Anthropic Secures Fair Use Win for AI Training, But Faces High-Stakes Piracy Trial
In the rapidly evolving landscape of artificial intelligence, legal battles over intellectual property rights are becoming increasingly common and consequential. At the heart of many of these disputes lies the question of whether training AI models on vast datasets, which often include copyrighted material, constitutes infringement. A recent ruling in a case involving AI startup Anthropic has delivered a complex answer, offering a significant victory on one front while leaving the company vulnerable to potentially crippling damages on another.
Anthropic, a prominent player in the generative AI space known for its Claude models, has been embroiled in a class action lawsuit brought by a group of authors. The core of the authors' complaint alleged that Anthropic violated their copyrights by using their books without permission to train its AI systems. The case, Bartz v. Anthropic, was filed in the US District Court for the Northern District of California in August 2024 and is one of many similar lawsuits currently navigating the US legal system, so its outcome is being closely watched across the industry.
A Landmark Fair Use Determination
In a summary judgment order released late Monday evening, Senior District Judge William Alsup delivered a pivotal ruling on the "fair use" defense raised by Anthropic. Fair use is a doctrine in US copyright law that permits limited use of copyrighted material without permission from the copyright holder for purposes such as criticism, comment, news reporting, teaching, scholarship, or research. Courts typically evaluate fair use based on four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market for or value of the copyrighted work.
Judge Alsup, who has notable experience with complex technology and copyright issues, having presided over the initial trial in the landmark Google v. Oracle case, found that Anthropic's use of copyrighted works for the purpose of training its AI models qualified as fair use. His reasoning hinged on the concept of "transformative use," which asks whether the new work adds something new, with a further purpose or different character, and does not substitute for the original work.
"The training use was a fair use," Judge Alsup wrote in his order. He emphasized the groundbreaking nature of the technology involved, stating, "The technology at issue was among the most transformative many of us will see in our lifetimes." This finding suggests that the process of an AI model learning patterns, relationships, and information from copyrighted text to build a predictive language model is fundamentally different in character and purpose from simply reproducing or distributing the original works.
Intellectual property law experts were quick to highlight the significance of this aspect of the ruling. Chris Mammen, a managing partner at Womble Bond Dickinson specializing in IP law, noted, "This is the first major ruling in a generative AI copyright case to address fair use in detail." He added that Judge Alsup's finding that training an LLM is transformative use, even when some memorization occurs, is particularly impactful. "He specifically rejected the argument that what humans do when reading and memorizing is different in kind from what computers do when training an LLM," Mammen explained.
This fair use determination for AI training stands in contrast to one of the few other summary judgment decisions in this nascent area of law. In Thomson Reuters v. Ross, a judge found that the AI startup Ross's training on materials from the legal research firm Westlaw (owned by Thomson Reuters) was not fair use. However, that case is already on appeal, and Judge Alsup's detailed reasoning in the Anthropic case provides a powerful counterpoint that fair use advocates are likely to leverage in other ongoing lawsuits.
Adam Eisgrau, senior director of AI, Creativity, and Copyright Policy at the tech trade group Chamber of Progress, lauded the decision, stating, "Judge Alsup’s ruling should be a model for other courts assessing whether Gen AI training on copyrighted material is fair use." He argued that the ruling correctly identifies AI training as transformative and aligns with the purpose of copyright law to promote creativity and competition, rather than protecting existing revenue streams from disruption by new technologies.
For AI companies facing numerous copyright challenges, this part of the ruling offers a potential legal shield for the fundamental process of training their models on large datasets, even those containing copyrighted works. It suggests that courts may view the *process* of learning and generating new outputs as distinct enough from the original input material to satisfy the transformative use requirement of fair use.
The Sword of Damocles: Piracy
Despite the significant win on the fair use argument for training, the ruling is far from a complete victory for Anthropic. Judge Alsup's order drew a sharp distinction between the act of training the AI model and the method by which Anthropic acquired and maintained the underlying dataset. While training might be fair use, the acquisition and possession of pirated copies of books were not.
During the discovery phase of the lawsuit, it came to light that Anthropic had built its initial training library largely by downloading vast quantities of books from well-known piracy databases. Judge Alsup's order provides specific details unearthed during discovery, including actions taken by Anthropic cofounder Ben Mann.
According to the ruling, Mann downloaded the entirety of a pirated database known as Books3 in the winter of 2021. Anthropic's acquisition of pirated materials didn't stop there. In June 2021, Mann downloaded at least five million copies of books from Library Genesis, or LibGen, a widely known source of pirated academic papers and books. Judge Alsup noted that Mann was aware these copies had been pirated. The following year, in July 2022, Anthropic further downloaded at least two million copies of books from the Pirate Library Mirror, or PiLiMi, again with Anthropic's knowledge that the material was pirated.
In total, Anthropic amassed a library containing over seven million pirated copies of books. Although Anthropic eventually shifted towards training on legitimately purchased copies, the court found that the initial act of downloading, collecting, and maintaining this enormous library of pirated materials was not protected by fair use. "The downloaded pirated copies used to build a central library were not justified by a fair use," Alsup wrote, adding that "Every factor points against fair use" for this specific activity.
This finding aligns with arguments made in other AI copyright lawsuits. For instance, in Kadrey v. Meta, another case brought by authors against a major AI company, plaintiffs have similarly focused on Meta's alleged acquisition of books from pirate libraries like LibGen, arguing that such actions are illegal and not covered by fair use.
The consequence of this finding is severe. Judge Alsup ruled that the authors could take Anthropic to trial specifically over the claims related to the pirated copies used to create the central library and the resulting damages. Under US copyright law, statutory damages for infringement can range from $750 to $30,000 per work infringed, and up to $150,000 per work for willful infringement. Given that Anthropic's pirated library consisted of at least seven million books, the company's potential financial exposure is enormous, potentially reaching into the billions of dollars.
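To put those statutory ranges in perspective, here is a rough back-of-the-envelope calculation using the figures cited above. This is an illustrative sketch only: the library size is approximate, not every downloaded copy would necessarily count as a separately infringed registered work, and a court or jury sets the actual per-work amount within the statutory range.

```python
# Illustrative statutory-damages arithmetic using the ranges cited in the article.
# Assumptions: ~7 million works at issue, each treated as a separate infringement.
STATUTORY_MIN_PER_WORK = 750        # statutory minimum per work infringed
STATUTORY_MAX_PER_WORK = 30_000     # ordinary statutory maximum per work
WILLFUL_MAX_PER_WORK = 150_000      # maximum per work for willful infringement

works_at_issue = 7_000_000          # approximate size of the pirated library

floor = works_at_issue * STATUTORY_MIN_PER_WORK
ceiling = works_at_issue * STATUTORY_MAX_PER_WORK
willful_ceiling = works_at_issue * WILLFUL_MAX_PER_WORK

print(f"Statutory floor:  ${floor:,}")            # $5,250,000,000
print(f"Ordinary ceiling: ${ceiling:,}")          # $210,000,000,000
print(f"Willful ceiling:  ${willful_ceiling:,}")  # $1,050,000,000,000
```

Even at the statutory minimum, the arithmetic lands above $5 billion, which is why the piracy trial, not the fair use ruling, poses the existential risk for Anthropic.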
"We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages," the order concluded. While no trial date has been set yet, the prospect of facing billions in potential penalties looms large over Anthropic.

Implications for the AI Industry and Creators
The split ruling in Bartz v. Anthropic sends a complex but crucial message to the artificial intelligence industry. On one hand, it provides the first significant judicial endorsement of the argument that training AI models on copyrighted data can be considered fair use, a position long advocated by AI developers. This could potentially simplify one aspect of the legal challenges facing companies building large language models and other generative AI systems.
However, the ruling simultaneously delivers a stark warning regarding the provenance of training data. The court's willingness to allow a trial specifically on the acquisition and possession of pirated materials makes it clear that how AI companies obtain their data is at least as critical as how they use it for training. Relying on or even possessing data known to be pirated exposes companies to significant legal and financial risks, regardless of whether the subsequent training is deemed fair use.
This puts immense pressure on AI companies to ensure their training datasets are sourced legally, either through licensing agreements, purchasing data, or using public domain materials. The potential for billions in statutory damages serves as a powerful deterrent against cutting corners on data acquisition.
For authors and other creators whose works are used in AI training data, the ruling presents a mixed outcome. The fair use finding for training itself might be disappointing to those hoping for a clear requirement for licensing or permission for every work used. However, the court's strong stance against the use of pirated libraries validates a key concern of creators and opens a path for accountability and potential compensation when AI companies have relied on illegally obtained materials.
The case highlights the ongoing tension between technological innovation and existing copyright frameworks. While courts are beginning to grapple with applying doctrines like fair use to novel AI processes, the fundamental principles against piracy remain robust. The distinction drawn by Judge Alsup between the transformative *use* in training and the illegal *acquisition* of data is likely to influence how future AI copyright cases are argued and decided.
The Road Ahead: Trial on Piracy and Damages
The focus of the Bartz v. Anthropic case now shifts to the upcoming trial. This phase will not revisit the fair use argument for training but will concentrate specifically on the claims related to the pirated library. Key issues at trial will likely include:
- The exact number of copyrighted works within the pirated library.
- Whether Anthropic's actions constituted willful infringement, which could significantly increase statutory damages.
- Arguments around the actual damages suffered by the authors, although statutory damages provide a path for recovery even if actual damages are hard to prove.
- Anthropic's defenses specifically related to the piracy claims, if any remain after the summary judgment.
The outcome of this trial will have significant ramifications, not just for Anthropic but for the entire AI industry. A large damages award could set a precedent for the cost of using illegally sourced data and reinforce the need for rigorous data provenance practices. Conversely, a lower-than-feared award might influence future settlement negotiations in similar cases.
Beyond this specific case, the legal landscape for AI and copyright is still being shaped. Lawsuits involving other AI companies like Meta and OpenAI continue to progress, each potentially adding new layers of interpretation to existing law. The Anthropic ruling provides a crucial piece of the puzzle, suggesting that while the *process* of training might find shelter under fair use, the *source* of the data used for that training will face intense scrutiny.
Ultimately, this split decision underscores the complex legal tightrope AI companies must walk. Innovating with transformative technology requires vast amounts of data, but acquiring that data must be done through legitimate channels. The Anthropic case serves as a powerful reminder that a victory on the technical use of data does not excuse illegal methods of obtaining it, and the cost of piracy in the age of AI could be astronomical.
Anthropic spokesperson Jennifer Martinez expressed satisfaction with the fair use aspect of the ruling, telling WIRED that the decision is "consistent with copyright's purpose in enabling creativity and fostering scientific progress." Lawyers for the plaintiffs declined to comment on the ruling's specifics ahead of the upcoming trial focused on the piracy claims.
As the case moves towards trial, the industry and creators alike will be watching closely to see how the court addresses the serious allegations of mass piracy and what financial consequences Anthropic may ultimately face for the origins of its training data library.
The intersection of AI and copyright law remains a dynamic and uncertain area, with each new ruling adding clarity while also raising new questions. The Anthropic decision is a significant step, affirming the potential for fair use in AI training while simultaneously drawing a firm line against the use of pirated materials, a distinction that will undoubtedly shape the future development and deployment of artificial intelligence technologies.