
EPISODE · May 17, 2026 · 7 MIN

"Automated Alignment is Harder Than You Think" by Aleksandr Bowkis, Marie_DB, Jacob Pfau, Geoffrey Irving

from LessWrong (Curated & Popular)

Summary

This is a summary of a paper published by the alignment team at UK AISI. Read the full paper here.

AI research agents may help solve ASI alignment, for example via the following plan:

1. Build agents that can do empirical alignment work (e.g. writing code, running experiments, designing evaluations and red teaming) and confirm they are not scheming.[1]
2. Use these agents to build increasingly sophisticated empirical safety cases for each successive generation of agents, gradually automating more of the research process.
3. Hand over primary research responsibility once agents outperform humans at all relevant alignment tasks.

We argue that automating alignment research in this manner could produce catastrophically misleading safety assessments, causing researchers to believe that an egregiously misaligned AI is safe, even if AI agents are not scheming to deliberately sabotage alignment research.

Our core argument (Fig. 1) is as follows:

1. The goal of an automated alignment program is to produce an overall safety assessment (OSA) - an estimate of the probability that the next-generation agent is non-scheming - that is both calibrated and shows low risk.[2]
2. Producing an OSA involves several tasks that are difficult to check. We refer to these as hard-to-supervise fuzzy tasks: tasks [...]

---

Outline:
(00:13) Summary
(07:10) Acknowledgments

The original text contained 4 footnotes which were omitted from this narration.

---

First published: May 14th, 2026
Source: https://www.lesswrong.com/posts/gpuYFbMNH8PJXpmny/automated-alignment-is-harder-than-you-think-1

---

Narrated by TYPE III AUDIO.

---

Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.



