EPISODE · May 29, 2026 · 19 MIN
“Retrying vs Resampling in AI Control” by james.lucassen, Adam Kaufman
We’ve just released a new paper: Retrying vs Resampling in AI Control. We revisit the resampling protocols introduced in Ctrl-Z with an up-to-date setting and much stronger models, and compare them against “retrying” protocols similar to Claude Code auto mode or Codex Auto-review.

Motivation

Roughly a year ago we released Ctrl-Z, the first paper to study control techniques for agents. A headline result of that paper was the performance of resample protocols: strategies that involve taking multiple i.i.d. samples from the model per step. But since Ctrl-Z, models have gotten much stronger, and we have built more sophisticated control settings to keep up. We wanted to answer the following questions:

How well do the results from Ctrl-Z hold up with better models and a better setting? Current high stakes control research is trying to learn by analogy about how to do control effectively in a real high stakes deployment during a real intelligence explosion. Findings about technique performance[1] are going to have to generalize pretty far to be useful.

If the resample protocols from Ctrl-Z still work, what makes them work? One way we try to make our work more generalizable is by understanding the dynamics governing outcomes [...]

---

Outline:
(00:30) Motivation
(02:36) TL;DR Takeaways
(03:55) [Figure: bar graph showing safety percentages across different monitoring and resampling methods]
(04:36) Methodology
(07:16) Differences from Ctrl-Z
(12:11) Are Retrying Protocols Exploitable?
(15:24) Cost and Latency of Resampling
(17:22) Conclusion

The original text contained 7 footnotes which were omitted from this narration.

---

First published: May 29th, 2026
Source: https://www.lesswrong.com/posts/yThJQTJxtmKNeZGwA/retrying-vs-resampling-in-ai-control

---

Narrated by TYPE III AUDIO.