EPISODE · Mar 6, 2024 · 49 MIN
Sleeper Agents | Evan Hubinger | EA Global Bay Area: 2024
from EAG Talks · host Aaron Bergman
If an AI system learned a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? That's the question that Evan and his coauthors at Anthropic sought to answer in their work on "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", which Evan will be discussing. Evan Hubinger leads the new Alignment Stress-Testing team at Anthropic, which is tasked with red-teaming Anthropic's internal alignment techniques and evaluations. Prior to joining Anthropic, Evan was a Research Fellow at the Machine Intelligence Research Institute, where he worked on a variety of theoretical alignment research, including "Risks from Learned Optimization in Advanced Machine Learning Systems". Evan will be talking about the Anthropic Alignment Stress-Testing team's first paper, "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training". Watch on YouTube: https://www.youtube.com/watch?v=BgfT0AcosHw