
EPISODE · Jul 12, 2024 · 50 MIN

On Data, feat. Shayne Longpre | TRACES Appendix 38

from Rabbit Hole Research

In this conversation, Cristian and Shayne discuss the foundational role of data in AI and the challenges of data provenance and curation. They explore how data sets are organized and sourced, the complexities of filtering and balancing data, and the legal and ethical implications of data usage. They also touch on the importance of transparency, accountability, and independent evaluation in the development of AI models, the need for responsible data practices, and the potential impact of AI on society.

The conversation also explores the protocols and challenges surrounding AI research and the need for infrastructure in the field: the concept of a safe harbor for good-faith research, the importance of distinguishing between good and bad researchers, the changing landscape of the web and its impact on data access and consent, the enforceability of consent mechanisms, and the complexities of copyright in the digital age.

Find me at [email protected]

PRE-ORDER TRACES: A PSY-FI NOVEL NOW (https://ccblife.gumroad.com/l/traces)

Also, who are you? Get a draft of TRACES if you fill out this form (https://forms.gle/rFnVFrCNUAJz7Fvn7)

About the Guest: Shayne Longpre is a PhD Candidate at MIT, where he works on training language models and understanding their broader social challenges. In particular, he investigates their risks, access, and transparency, with an emphasis on training data. He leads the Data Provenance Initiative and co-organized the AI safe harbor open letter (co-signed by 350+ researchers and journalists), advocating for better independent research access to closed models. His work has been covered by the New York Times, the Washington Post, and VentureBeat.

Set-Up:
- Camera: https://amzn.to/3PZVscb (don't laugh)
- Microphone: https://amzn.to/46f3pB5
- Teleprompter Stand: https://amzn.to/3tgS98y
- Teleprompter App: https://amzn.to/46jdH31
- Teleprompter Screen: https://amzn.to/3PNfKFI (yup)
- Headphones: https://amzn.to/46gMSwo

Timestamps
00:00 Introduction and Background
02:25 The Foundational Role of Data in AI
08:57 Challenges in Data Provenance and Curation
15:36 Transparency and Accountability in AI Development
21:49 Legal and Ethical Implications of Data Usage
29:56 The Potential of Foundation Models and Best Practices
41:59 Protocols and Infrastructure for AI Research
44:11 Distinguishing Good and Bad Researchers in AI
48:25 The Changing Landscape of the Web and Data Access
01:10:55 Enforceability of Consent Mechanisms and Copyright in the Digital Age

Hashtags
#DataProvenance #DataCuration #AIEthics #AITransparency #DataSets #AIChallenges #DataBalance #LegalImplications #AIResearch #DataUsage #ResponsibleAI #AIModels #DataOrganization #AIRegulations #SafeHarbor #GoodFaithResearch #AIResponsibility #WebEvolution #DataAccess #UserConsent #CopyrightLaws #DigitalEthics #AIImpact #AIAccountability #IndependentEvaluation




