EPISODE · Sep 22, 2025 · 55 MIN

Should AI Read Without Permission?

from Justified Posteriors · hosts Andrey Fradkin and Seth Benzell

Many of today’s thinkers and journalists worry that AI models are eating their lunch: hoovering up these authors’ best ideas and giving them away for free or nearly free. Beyond fairness, there is a worry that these authors will stop producing valuable content if they can’t be compensated for their work. On the other hand, making lots of data freely accessible makes AI models better, potentially increasing the utility of everyone using them. Lawsuits over AI and property rights are working their way through the courts as we speak. Society needs a better understanding of the harms and benefits of different AI property rights regimes.

A useful first question is “How much is the AI actually remembering about specific books it is illicitly reading?” To find out, co-hosts Seth and Andrey read “Cloze Encounters: The Impact of Pirated Data Access on LLM Performance”. The paper cleverly measures this through how often the AI can recall proper names from the dubiously legal “Books3” darkweb data repository, although Andrey raises some experimental concerns. Listen in to hear more about what our AI models are learning from naughty books, and how Seth and Andrey think that should inform AI property rights moving forward.

Also mentioned in the podcast are:

* Joshua Gans’s paper on AI property rights, “Copyright Policy Options for Generative Artificial Intelligence”, accepted at the Journal of Law and Economics
* Fair Use
* The Anthropic lawsuit discussed in the podcast about illegal use of books, which reached a tentative settlement after the podcast was recorded. The headline summary: “Anthropic, the developer of the Claude AI system, has agreed to a proposed $1.5 billion settlement to resolve a class-action lawsuit, in which authors and publishers alleged that Anthropic used pirated copies of books (sourced from online repositories such as Books3, LibGen, and Pirate Library Mirror) to train its Large Language Models (LLMs). Approximately 500,000 works are covered, with compensation set at approximately $3,000 per book. As part of the settlement, Anthropic has also agreed to destroy the unlawfully obtained files.”
* Our previous Scaling Law episode

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit empiricrafting.substack.com
