On Data, feat. Shayne Longpre | TRACES Appendix 38
Episode 38 of the Rabbit Hole Research podcast, hosted by Cristian Cibils Bernardes and titled "On Data, feat. Shayne Longpre | TRACES Appendix 38", was published on July 12, 2024 and runs 50 minutes.
July 12, 2024 · 50m · Rabbit Hole Research
Episode Description
In this conversation, Cristian and Shayne discuss the foundational role of data in AI and the challenges associated with data provenance and curation. They explore the organization and sourcing of data sets, the complexities of filtering and balancing data, and the legal and ethical implications of data usage.
They also touch on the importance of transparency, accountability, and independent evaluation in the development of AI models. The conversation highlights the need for responsible data practices and the potential impact of AI on society, and explores the protocols and challenges surrounding AI research and the need for infrastructure in the field.
The discussion delves into the concept of safe harbor for good-faith research and the importance of distinguishing between good and bad researchers. The conversation also touches on the changing landscape of the web and its impact on data access and consent.
The enforceability of consent mechanisms and the complexities of copyright in the digital age are also discussed.
Find me at [email protected]
PRE-ORDER TRACES: A PSY-FI NOVEL NOW (https://ccblife.gumroad.com/l/traces)
Also, who are you? Get a draft of TRACES if you fill out this form (https://forms.gle/rFnVFrCNUAJz7Fvn7)
About the Guest:
Shayne Longpre is a PhD Candidate at MIT, where he works on training language models and understanding their broader social challenges. In particular, he investigates their risks, access, and transparency, with an emphasis on training data. He leads the Data Provenance Initiative and co-organized the AI safe harbor open letter (co-signed by 350+ researchers and journalists), which advocates for better independent research access to closed models. His work has been covered by the New York Times, the Washington Post, and VentureBeat.
Set-Up:
- Camera: https://amzn.to/3PZVscb (don't laugh)
- Microphone: https://amzn.to/46f3pB5
- Teleprompter Stand: https://amzn.to/3tgS98y
- Teleprompter App: https://amzn.to/46jdH31
- Teleprompter Screen: https://amzn.to/3PNfKFI (yup)
- Headphones: https://amzn.to/46gMSwo