EPISODE · Jul 18, 2025 · 16 MIN
Anthropic, Copyright, and the Fair Use Divide
from The Briefing by Weintraub Tobin · host Weintraub Tobin
A federal judge has ruled that training Claude AI on copyrighted books, even without a license, was transformative and protected under fair use. But storing millions of pirated books in a permanent internal library? That crossed the line. In this episode of The Briefing, Scott Hervey and Tara Sattler break down this nuanced opinion and what the ruling means for AI developers and copyright owners going forward. Watch this episode on YouTube.

Show Notes:

Scott: What happens when an artificial intelligence company trains its models on millions of books, some purchased, some pirated? In a closely watched ruling, a federal judge held that training the AI was fair use, likening the process to how a human learns by reading. But keeping pirated copies of those books in a permanent digital library? Well, that crossed the line. I'm Scott Hervey, a partner with the law firm of Weintraub Tobin, and I'm joined today by my partner and frequent Briefing contributor, Tara Sattler. We are going to break down the recent fair use ruling in the lawsuit over Claude AI, that's Anthropic's AI, and explore what it means for the future of AI training on today's installment of The Briefing. Tara, welcome back to The Briefing. Good to have you.

Tara: Thanks, Scott. I always enjoy being here with you.

Scott: Always enjoy having you. This one is a much-awaited decision because we have a number of these cases swirling around, challenging the process by which AI companies train their large language models. One of these cases involved Anthropic's AI, Claude. Why don't we jump into this one? Tara, maybe you could give us some of the background of this particular case.

Tara: Absolutely. In 2021, Anthropic PBC, a startup founded by former OpenAI employees, set out to create a cutting-edge AI system, and that system would eventually become Claude. Like other large language models, Claude was trained on a vast amount of textual data: books, articles, websites, and more.
But unlike many of its competitors, Anthropic took a controversial shortcut.

Scott: Right. Instead of licensing books or building a clean data set, Anthropic downloaded millions of copyrighted works from pirate sites like Books3, Library Genesis, and the Pirate Library Mirror. In total, Anthropic downloaded over seven million pirated books, including works by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson. Anthropic also purchased millions of print books, scanned them, and then created a central digital library of searchable files.

Tara: So the plaintiffs sued, alleging that Anthropic infringed their copyrights by copying their works without permission: first, by downloading them from the pirate sites; then, by using them to train Claude; and finally, by keeping digital copies of the books in its internal library for potential future use.

Scott: All right. So as we know, the lawsuit was filed, and Anthropic eventually moved for summary judgment based on fair use only. In its ruling on Anthropic's motion, Judge Alsup of the Northern District of California issued a very detailed and nuanced opinion. The opinion splits Anthropic's conduct into three key uses: first, using the books to train the AI, or the large language model; second, scanning and digitizing legally purchased print books; and third, downloading and keeping pirated books in a permanent digital library. Each of these uses was evaluated under the Copyright Act's four-factor fair use test.

Tara: Right. Let's walk through how the judge applied the four fair use factors to each use. For anyone who needs a refresher, here are the statutory factors for fair use under Section 107 of the Copyright Act.

Scott: If you need a refresher, you're not listening to this podcast often enough. Go ahead, Tara.

Tara: Okay, so we'll refresh anyway. First is the purpose and character of the use, including whether it is commercial and whether it's transformative.
The second is the nature of the copyrighted work. The third is the amount and substantiality of the portion of the copyrighted work that's used. And the fourth is the effect of the use upon the potential market for the original work; that's the economic analysis.

Scott: And that one, as we know, has received more focus since the Warhol Supreme Court case. All right, let's turn to the first use, which was the training of the large language models. On the first use, training the Claude models using books, the court found fair use. It didn't matter whether the books were the purchased books or the pirated books; the court found the training on these books to be fair use, and it focused most heavily on the first factor. The court called this use spectacularly transformative. The court said, "The purpose and character of using works to train LLMs was transformative. Spectacularly so. Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different."

Tara: Right. So even if the AI memorized a lot of the underlying material, the court stressed that the training did not result in infringing outputs. Users weren't seeing verbatim excerpts from the plaintiffs' books.

Scott: The court rejected the plaintiffs' argument that merely memorizing expressive elements was itself infringement. The court asked, if somebody were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? And the court said, of course it would not.

Tara: So the court sided with Anthropic on the training issue, holding that using books to train Claude was spectacularly transformative.
And the judge drew a direct analogy to human learning. The judge said, "Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance, but to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways, would be unthinkable."

Scott: The second and third factors, the nature of the work and the amount used, were considered less significant because of the high degree of transformation. And on the fourth factor, market harm, the judge said there was no evidence of substitution or competitive damage from the training process. So the result was that training was fair use.

Tara: Okay, now turning to the second use the court analyzed, which is digitizing purchased books. Anthropic also purchased millions of print books and scanned them into searchable PDFs. The plaintiffs argued that changing the format from print to digital was itself infringement.

Scott: But the court disagreed. Because Anthropic had lawfully purchased these books, destroyed the physical copies, and retained one digital copy in their place without redistributing it, this was fair use. The judge wrote, "Here, every purchased print copy was copied in order to save storage space and to enable searchability as a digital copy. The print original was destroyed. One replaced the other. And there is no evidence that the new digital copy was shown, shared, or sold outside the company."

Tara: So this use was found to be narrowly transformative, not because of LLM training, but because the digitization made the library more efficient and searchable. And importantly, the court distinguished this from the large-scale copying involved in the Napster file-sharing case, noting this use was even more clearly transformative than those in Texaco, Google, and Sony Betamax.
It was certainly more transformative than the uses rejected in Napster. So the result: digitizing purchased books is fair use.

Scott: So let's talk about the third use the court analyzed, which was retaining the pirated copies of books in a permanent library. This is where Anthropic lost, falling short of establishing fair use. Anthropic, as we know, downloaded more than seven million books from pirate sites and kept them in its internal library, even when it had no intention of using many of those books to train its models. The company argued that because some of those books were later used in training, which was fair use, keeping the pirated books was also excusable.

Tara: The judge rejected that argument outright. "There is no carve-out, however, from the Copyright Act for AI companies," is what the judge said. According to internal emails cited by the court, Anthropic's founders were aware of the legal risks. The CEO described purchasing books as a "legal/practice/business slog" and expressed a preference for simply downloading pirated copies. In total, the company downloaded books from pirate sources even after it had the option of purchasing or licensing them.

Scott: Yeah, that "legal/practice/business slog" just fits the tech-company mantra of move fast, break things. But you had better be sure you're right; otherwise, you're going to end up on the wrong side of a decision like this one, right? And the judge was unambiguous in ruling on this point: "Building a central library of works to be available for any number of further uses was itself the use for which Anthropic acquired these pirated copies," and not a transformative one. He found that this use, building a centralized permanent library of pirated boo...