EPISODE · Jul 18, 2025 · 16 MIN
Anthropic, Copyright, and the Fair Use Divide
from The Briefing by Weintraub Tobin · host Weintraub Tobin
A federal judge has ruled that training Claude AI on copyrighted books, even without a license, was transformative and protected under fair use. But storing millions of pirated books in a permanent internal library? That crossed the line. In this episode of The Briefing, Scott Hervey and Tara Sattler break down this nuanced opinion and what the ruling means for AI developers and copyright owners going forward. Watch this episode on YouTube.

Show Notes:

Scott: What happens when an artificial intelligence company trains its models on millions of books, some purchased, some pirated? In a closely watched ruling, a federal judge held that training the AI was fair use, likening the process to how a human learns by reading. But keeping pirated copies of those books in a permanent digital library? Well, that crossed the line. I'm Scott Hervey, a partner with the law firm of Weintraub Tobin, and I'm joined today by my partner and frequent Briefing contributor, Tara Sattler. We are going to break down the recent fair use ruling in the lawsuit over Claude AI, that's Anthropic's AI, and explore what it means for the future of AI training on today's installment of The Briefing. Tara, welcome back to The Briefing. Good to have you.

Tara: Thanks, Scott. I always enjoy being here with you.

Scott: Always enjoy having you. This one is a much-awaited decision because we have a number of these cases swirling around, challenging the process by which AI companies train their large language models. One of these cases involved Anthropic's AI, Claude. Why don't we jump into this one? Tara, maybe you could give us some of the background of this particular case.

Tara: Absolutely. In 2021, Anthropic PBC, a startup founded by former OpenAI employees, set out to create a cutting-edge AI system, and that system would eventually become Claude. Like other large language models, Claude was trained on a vast amount of textual data: books, articles, websites, and more.
But unlike many of its competitors, Anthropic took a controversial shortcut.

Scott: Right. Instead of licensing books or building a clean data set, Anthropic downloaded millions of copyrighted works from pirate sites like Books3, Library Genesis, and the Pirate Library Mirror. In total, Anthropic downloaded over seven million pirated books, including works by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson. Anthropic also purchased millions of print books, scanned them, and then created a central digital library of searchable files.

Tara: So the plaintiffs sued, alleging that Anthropic infringed their copyrights by copying their works without permission: first, by downloading them from the pirate sites; then, by using them to train Claude; and finally, by keeping digital copies of the books in its internal library for potential future use.

Scott: All right. So as we know, the lawsuit was filed, and Anthropic eventually moved for summary judgment based on fair use only. In its ruling on Anthropic's motion, Judge Alsup of the Northern District of California issued a very detailed and nuanced opinion. The opinion splits Anthropic's conduct into three key uses: first, using the books to train the AI, or the large language model; second, scanning and digitizing legally purchased print books; and third, downloading and keeping pirated books in a permanent digital library. Each of these uses was evaluated under the Copyright Act's four-factor fair use test.

Tara: Right. Let's walk through how the judge applied the four fair use factors to each use. For anyone who needs a refresher, here are the statutory factors for fair use under Section 107 of the Copyright Act.

Scott: If you need a refresher, you're not listening to this podcast often enough. Go ahead, Tara.

Tara: Okay, so we'll refresh anyway. First is the purpose and character of the use, including whether it is commercial and whether it's transformative.
The second is the nature of the copyrighted work. The third is the amount and substantiality of the portion of the copyrighted work that's used. And the fourth is the effect of the use upon the potential market for the original work; that's the economic analysis.

Scott: And that one, as we know, has received more focus since the Warhol Supreme Court case. All right, let's turn to the first use, which was the training of the large language models. On the first use, training the Claude models using books, the court found fair use. It didn't matter whether the books were the purchased books or the pirated books; the court found the training on these books to be fair use, and it focused most heavily on the first factor. The court called this use spectacularly transformative. The court said, "The purpose and character of using works to train LLMs was transformative. Spectacularly so. Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different."

Tara: Right. So even if the AI memorized a lot of the underlying material, the court stressed that the training did not result in infringing outputs. Users weren't seeing verbatim excerpts from the plaintiffs' books.

Scott: The court rejected the plaintiffs' argument that merely memorizing expressive elements was itself infringement. The court asked, if somebody were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? And the court said, of course it would not.

Tara: So the court sided with Anthropic on the training issue, holding that using books to train Claude was spectacularly transformative.
And the judge drew a direct analogy to human learning. The judge said, "Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance, but to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways, would be unthinkable."

Scott: The second and third factors, the nature of the work and the amount used, were considered less significant because of the high degree of transformation. And on the fourth factor, market harm, the judge said there was no evidence of substitution or competitive damage from the training process. So the result was that training was fair use.

Tara: Okay, now turning to the second use the court analyzed, which is digitizing purchased books. Anthropic also purchased millions of print books and scanned them into searchable PDFs. The plaintiffs argued that changing the format from print to digital was itself infringement.

Scott: But the court disagreed. Because Anthropic had lawfully purchased these books, destroyed the physical copies, and retained one digital copy in their place without redistributing it, this was fair use. The judge wrote, "Here, every purchased print copy was copied in order to save storage space and to enable searchability as a digital copy. The print original was destroyed. One replaced the other. And there is no evidence that the new digital copy was shown, shared, or sold outside the company."

Tara: So this use was found to be narrowly transformative, not because of LLM training, but because the digitization made the library more efficient and searchable. And importantly, the court distinguished this from the large-scale copying involved in the Napster file-sharing case, noting this use was even more clearly transformative than those in Texaco, Google, and Sony Betamax.
It was certainly more transformative than the uses rejected in Napster. So the result: digitizing purchased books is fair use.

Scott: So let's talk about the third use the court analyzed, which was retaining the pirated copies of books in a permanent library. This is where Anthropic lost, falling short of establishing fair use. Anthropic, as we know, downloaded more than seven million books from pirate sites and kept them in its internal library, even when it had no intention of using many of those books to train its models. The company argued that because some of those books were later used in training, which was fair use, keeping the pirated books was also excusable.

Tara: The judge rejected that argument outright. "There is no carve-out, however, from the Copyright Act for AI companies," is what the judge said. According to internal emails cited by the court, Anthropic's founders were aware of the legal risks. The CEO described purchasing books as a "legal/practice/business slog" and expressed a preference for simply downloading pirated copies. In total, the company downloaded books from pirate sources even after it had the option of purchasing or licensing them.

Scott: Yeah, that "legal/practice/business slog" just fits the tech-company mantra of move fast, break things. But you had better be sure you're right; otherwise, you're going to end up on the wrong side of a decision like this one, right? And the judge was unambiguous in ruling on this point: "Building a central library of works to be available for any number of further uses was itself the use for which Anthropic acquired these pirated copies," and not a transformative one. He found that this use, building a centralized permanent library of pirated boo...