Quick Answer
AI copyright law in 2026 is being shaped by dozens of pending lawsuits worldwide. Training on copyrighted works without a licence may fall within a few narrow exceptions (US fair use, the EU text and data mining exceptions, the UK's Section 29A for non-commercial research). How far those exceptions cover commercial model training is still being litigated, and outputs substantially similar to training works can still infringe.
- Training is NOT automatically infringement — and it is NOT automatically fair use
- Generative outputs can infringe if substantially similar to protected works
- The US Copyright Office holds that purely AI-generated material lacks the human authorship copyright requires, so it cannot be registered
What Is the AI Copyright Landscape?
Copyright questions in AI span three stages:
- Input (training data collection and use)
- Model (can a trained model itself infringe?)
- Output (is a generated work a derivative?)
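Key authorities are the US Copyright Office (Copyright and Artificial Intelligence report: Part 1 on digital replicas, July 2024; Part 2 on copyrightability, January 2025; Part 3 on generative AI training, pre-publication version May 2025), the UK IPO, Articles 3 and 4 of the EU Copyright in the Digital Single Market Directive (the TDM exceptions), and Article 30-4 of Japan's Copyright Act.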
Key Details / Requirements
Major Lawsuits (Selected)
| Case | Plaintiffs | Defendants | Filed | Core Issue |
|---|---|---|---|---|
| New York Times v. OpenAI & Microsoft | NYT | OpenAI, Microsoft | Dec 2023 | Training and verbatim memorisation |
| Andersen v. Stability AI | Artists | Stability AI, Midjourney, DeviantArt, Runway | 2023 | Training on artworks |
| Getty Images v. Stability AI (US + UK) | Getty | Stability AI | 2023 | Training on Getty library |
| Authors Guild v. OpenAI | Authors | OpenAI | 2023 | Novels in training data |
| Concord Music v. Anthropic | Publishers | Anthropic | 2023 | Song lyrics |
| Bartz v. Anthropic | Authors | Anthropic | 2024 | Books in training, including pirated copies (settled September 2025 for USD 1.5B) |
Global TDM and Fair-Use Regimes
| Jurisdiction | Rule | Opt-Out Allowed? |
|---|---|---|
| USA | Fair use (17 USC 107) | N/A |
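| EU | DSM Copyright Directive (2019/790) Art. 3 (scientific research) and Art. 4 (general, including commercial) | Art. 4 only: rightsholders may reserve rights, by machine-readable means for online content; no opt-out from Art. 3 |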
| UK | Sec 29A CDPA (TDM for non-commercial research only) | No (contract terms cannot override it) |
| Japan | Art. 30-4 Copyright Act (non-enjoyment exception) | No |
| Singapore | Computational Data Analysis (Sec 244 Copyright Act 2021) | No |
Real-World Examples / Case Studies
Bartz v. Anthropic (2025): In June 2025 Judge Alsup ruled that training on lawfully acquired books was transformative fair use, but let claims over pirated copies kept in Anthropic's central library proceed. The USD 1.5 billion class settlement announced in September 2025, the first major AI training settlement, resolved those piracy claims.
New York Times v. OpenAI (ongoing) — Federal complaint alleges GPT-4 reproduces Times articles verbatim and competes with the Times' own business.
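Getty Images v. Stability AI (UK, 2025): The High Court's November 2025 judgment went largely against Getty. Getty dropped its primary training and output claims during trial, the court held that the Stable Diffusion model is not an "infringing copy" of the training images, and Getty succeeded only on limited trade mark claims over watermarks reproduced by older model versions.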
US Copyright Office, Zarya of the Dawn (2023): For this comic by Kris Kashtanova, the Office protected the text and the selection and arrangement of elements but excluded the Midjourney-generated images from the registration.
What This Means for AI Teams
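In 2026, AI teams should:
- License training data whenever practical (Getty Images, Shutterstock and Reuters have all signed licensing deals with AI developers)
- Keep training-data provenance records, which support the public summary of training content that EU AI Act Art. 53(1)(d) requires from general-purpose AI model providers
- Respect robots.txt signals and TDM opt-outs (EU Copyright Directive Art. 4(3); EU AI Act Art. 53(1)(c) requires general-purpose AI model providers to have a policy for identifying and honouring them)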
- Add output filters for memorisation and near-duplicate generation
- Indemnify customers against third-party copyright claims (as Adobe, Microsoft, Google, OpenAI now do for enterprise customers)
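The output-filter item above can be sketched as a word n-gram overlap check. This is a minimal illustration, assuming the team holds the reference texts it wants to guard against; the function names, the 8-word n-gram size and the 0.5 threshold are hypothetical choices, not a legal test for substantial similarity.

```python
import re

# Hypothetical defaults: 8-word n-grams and a 50% overlap threshold are
# illustrative tuning choices, not a legal standard for substantial similarity.
DEFAULT_N = 8
DEFAULT_THRESHOLD = 0.5


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercased word n-grams in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, reference: str, n: int = DEFAULT_N) -> float:
    """Fraction of the output's n-grams that also appear in the reference."""
    out_grams = word_ngrams(output, n)
    if not out_grams:  # output shorter than n words: nothing to compare
        return 0.0
    return len(out_grams & word_ngrams(reference, n)) / len(out_grams)


def should_block(output: str, references: list[str],
                 n: int = DEFAULT_N, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Flag an output whose overlap with any reference text meets the threshold."""
    return any(overlap_ratio(output, ref, n) >= threshold for ref in references)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog near the river bank today"
    print(should_block(reference, [reference]))  # True: verbatim copy
    print(should_block("an entirely different sentence about weather in spring and autumn",
                       [reference]))  # False: no shared 8-grams
```

Outputs shorter than n words produce no n-grams and are never flagged, so a real filter would likely need a separate rule for short verbatim quotes. At scale, exact set intersection would typically give way to hashed or MinHash-style indexing over the reference corpus.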
Compliance Checklist
- Publish a training-data sources document
- Honour machine-readable opt-outs (robots.txt, TDM Reservation Protocol, C2PA)
- License copyrighted datasets where feasible
- Build memorisation tests into evaluation pipelines
- Offer customer IP indemnification where commercially appropriate
- For deployers: log prompts and outputs so that infringement claims can be investigated and answered
- Track ongoing cases and US Copyright Office guidance
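The opt-out item in the checklist can be sketched with Python's standard-library robots.txt parser plus a check for the TDMRep `tdm-reservation` HTTP header. This is a hedged sketch: it assumes the robots.txt body and response headers were already fetched, `ExampleTrainingBot` is a hypothetical crawler name, and it does not cover TDMRep's other channels (the `/.well-known/tdmrep.json` file and HTML meta tags) or C2PA metadata. Whether a given signal counts as an effective rights reservation is a legal question the code cannot answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler user agent used for illustration only.
USER_AGENT = "ExampleTrainingBot"


def robots_allows(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Evaluate an already-downloaded robots.txt body for this agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def tdm_reserved(headers: dict[str, str]) -> bool:
    """True if the response carries the TDMRep header `tdm-reservation: 1`."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    return lowered.get("tdm-reservation") == "1"


def may_collect(robots_txt: str, headers: dict[str, str], url: str,
                user_agent: str = USER_AGENT) -> bool:
    """Collect only if robots.txt allows the URL and no TDM reservation is declared."""
    return robots_allows(robots_txt, url, user_agent) and not tdm_reserved(headers)


if __name__ == "__main__":
    robots = "User-agent: ExampleTrainingBot\nDisallow: /private/\n"
    print(may_collect(robots, {}, "https://example.com/public/a.html"))   # True
    print(may_collect(robots, {}, "https://example.com/private/a.html"))  # False
    print(may_collect(robots, {"TDM-Reservation": "1"},
                      "https://example.com/public/a.html"))               # False
```

Treating a missing header as "no reservation" is a design assumption here; a more cautious crawler could also consult the site's TDMRep file and skip any URL whose signals cannot be retrieved.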
Conclusion
AI copyright is one of the most unsettled areas of AI law. Teams that document provenance, license data, and indemnify customers will be best placed to weather the litigation.
Audit your training data with Misar AI's copyright provenance toolkit.
