The rapid evolution of artificial intelligence (AI) technologies has transformed industries, from healthcare to creative arts, but it has also sparked intricate legal debates over the use of copyrighted materials to train AI models. At the core of these disputes is the doctrine of fair use under U.S. copyright law, which seeks to balance the rights of creators with the public’s interest in innovation. As AI systems rely on vast datasets, often including copyrighted works, courts have begun to clarify whether such use qualifies as fair use. Recent judicial decisions, including high-profile cases involving Anthropic, Meta Platforms, and Thomson Reuters, provide critical guidance while highlighting the complexities of data sourcing and market impact. This article explores the application of fair use in AI training, drawing on specific cases with detailed citations to illustrate judicial perspectives and their implications for the AI industry.
Fair use, as codified in Section 107 of the U.S. Copyright Act (17 U.S.C. § 107), permits limited use of copyrighted material without permission under specific conditions. Courts evaluate fair use through a four-factor test: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market for or value of the copyrighted work. In AI training cases, the transformative nature of the use—whether it produces novel outputs rather than replicating the original—is often central. The legality of data acquisition and the economic impact on the original work’s market are also pivotal. These factors have been rigorously tested in recent litigation, revealing the opportunities and challenges of applying fair use to AI development.
A landmark case, Bartz v. Anthropic PBC, No. 3:24-cv-05417-WHA (N.D. Cal., filed Aug. 2024), addressed Anthropic’s use of copyrighted books to train its language model, Claude. On June 24, 2025, Judge William H. Alsup ruled that Anthropic’s training process constituted fair use, emphasizing its transformative nature (Bartz v. Anthropic PBC, No. 3:24-cv-05417-WHA, slip op. at 12 (N.D. Cal. June 24, 2025)). The court compared AI training to a human learning from books to create new works, noting that Claude’s outputs did not reproduce the original texts. However, the court found that Anthropic’s downloading of over seven million pirated books from unauthorized sources, such as Library Genesis, was not protected by fair use, ordering a separate trial to assess potential damages for this infringement. This ruling underscores the distinction between transformative training and unlawful data sourcing, highlighting that illegal data acquisition can jeopardize a fair use defense.
In Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal., filed July 7, 2023), Judge Vince Chhabria issued a significant ruling on June 25, 2025, finding that Meta’s use of copyrighted books to train its Llama large language model constituted fair use (Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC, slip op. at 8 (N.D. Cal. June 25, 2025)). The plaintiffs, including authors Sarah Silverman, Ta-Nehisi Coates, and Richard Kadrey, alleged that Meta infringed their copyrights by using their works, often obtained through pirated sources like Library Genesis, without permission. The court determined that the plaintiffs failed to provide sufficient evidence that Meta’s AI outputs harmed the market for their original works, a key factor in the fair use analysis. The transformative nature of Llama’s outputs, which did not replicate the original texts but used them to generate novel content, supported Meta’s defense. However, Judge Chhabria cautioned that the ruling was specific to the case’s evidence, emphasizing that Meta’s reliance on pirated materials could lead to liability in other contexts. He noted, “This ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful,” indicating that fair use depends heavily on case-specific details. Earlier proceedings in the case, including a May 1, 2025, hearing, revealed judicial skepticism about Meta’s data practices, with Chhabria expressing concern that AI could “obliterate” markets for original works if not carefully regulated (Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC, hearing transcript, May 1, 2025). The court also allowed the plaintiffs to amend their complaint to include claims under the Digital Millennium Copyright Act (DMCA) for Meta’s alleged removal of copyright management information (CMI) from the training data, further highlighting the legal risks of unethical data practices.
In contrast, Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence, Inc., No. 1:20-cv-00613-SB (D. Del., filed May 6, 2020), resulted in a ruling against fair use. On February 11, 2025, Judge Stephanos Bibas, sitting by designation in the U.S. District Court for the District of Delaware, granted partial summary judgment to Thomson Reuters, finding that ROSS Intelligence’s use of copyrighted Westlaw headnotes to train its AI legal research tool was not transformative (Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence, Inc., No. 1:20-cv-00613-SB, slip op. at 18-20 (D. Del. Feb. 11, 2025)). The court determined that ROSS’s tool directly competed with Westlaw, harming its market, and rejected the fair use defense as a matter of law. This case, involving non-generative AI, illustrates that uses designed to substitute for the original work are less likely to qualify as fair use, particularly when they impact the copyright holder’s market.
The precedent set by Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015), decided on October 16, 2015, remains influential. The Second Circuit upheld Google’s scanning of millions of books for its Google Books project as fair use, citing the transformative nature of creating a searchable database that provided snippets rather than full texts. The court found no harm to the market for the original works, as the project enhanced their discoverability. This ruling has been cited in AI cases like Bartz v. Anthropic to support transformative uses that add new value.
Ongoing litigation, such as The New York Times Co. v. Microsoft Corp. and OpenAI, Inc., No. 1:23-cv-11195 (S.D.N.Y., filed Dec. 27, 2023), continues to test fair use boundaries. The New York Times alleges that OpenAI and Microsoft used its copyrighted articles to train ChatGPT, potentially competing with its journalism. As of June 25, 2025, no summary judgment has been issued, but the case raises concerns about data sourcing and market harm similar to those in Kadrey v. Meta and Bartz v. Anthropic. Similarly, Getty Images, Inc. v. Stability AI, Inc., No. 1:23-cv-00135-JLH (D. Del., filed Feb. 3, 2023), examines whether Stability AI’s use of over 12 million copyrighted images to train its image-generation model harmed Getty’s licensing market. Preliminary arguments suggest that direct competition with the original works could preclude fair use, but the case remains unresolved.
Ethical data acquisition remains a critical issue. Courts have consistently penalized the use of pirated materials, as evidenced in Bartz v. Anthropic and Kadrey v. Meta. In Concord Music Group, Inc. v. Anthropic PBC, No. 3:24-cv-03811-JSC (N.D. Cal., transferred June 26, 2024, from M.D. Tenn., No. 3:23-cv-01092), music publishers alleged that Anthropic scraped copyrighted song lyrics to train Claude, citing the Thomson Reuters decision to argue against fair use. These cases emphasize that ethical data practices are both a moral and legal necessity for asserting fair use.
The economic impact of AI training on copyright holders is increasingly significant. The Thomson Reuters case highlighted AI’s potential to disrupt licensing markets, a concern echoed in Getty Images v. Stability AI. Emerging licensing deals, such as those between Reuters and AI companies, suggest a growing market for training data, which courts may consider when evaluating market harm. AI developers can mitigate legal risks by pursuing licensing agreements or using public domain materials, as reliance on fair use alone may not suffice in cases of direct competition.
The legal framework governing AI training is evolving, with fair use doctrine struggling to address the scale of modern AI systems. Legislative proposals, such as amendments to the U.S. Copyright Act, aim to clarify data use in AI contexts, while international frameworks like the European Union’s Artificial Intelligence Act (Regulation (EU) 2024/1689, effective Aug. 1, 2024) impose transparency requirements for training data. The Bartz v. Anthropic and Kadrey v. Meta rulings, as the first major generative AI cases to address fair use, set precedents for transformative uses but warn against unethical data practices.
Navigating fair use in AI training requires balancing legal, ethical, and economic considerations. The Bartz v. Anthropic, Kadrey v. Meta, and Thomson Reuters v. ROSS Intelligence cases demonstrate that transformative uses are more likely to be protected, but illegal data acquisition and market harm can negate fair use defenses. Ongoing litigation, including The New York Times v. OpenAI and Getty Images v. Stability AI, will further refine these principles. AI developers must adopt transparent data practices and assess market impacts to strengthen fair use claims, while collaboration among technologists, policymakers, and copyright holders is essential to foster innovation while respecting intellectual property rights. As the legal landscape evolves, these cases serve as critical guideposts for ensuring AI development aligns with the principles of fairness and creativity.