EPISODE · Jun 25, 2026 · 34 MIN
“Door’s Locked, Try the Window” by Prakrat Agrawal, Jérémy Scheurer
TL;DR
Ask a coding agent to fix a bug in a read-only file. Instead of reporting that it lacks write permission, it routes around the lock and completes the task anyway. A read-only file does not stop a capable agent: it treats a denied write as an obstacle to work around rather than a hard wall.

We measure how often this happens with CircumEval, an evaluation of 8 tasks on the FastAPI codebase in two categories, Test-Locked and Source-Locked. We evaluate three frontier coding agents in their real production harnesses: Claude Opus 4.6 and Claude Sonnet 4.6 (via Claude Code), and GPT-5.4 (via Codex CLI).

Circumvention is frequent. The rates, reported as (Source-Locked / Test-Locked), are Opus 4.6: 100% / 40%, Sonnet 4.6: 89% / 66%, GPT-5.4: 99% / 94%.

Prompt phrasing affects circumvention rates in unpredictable ways, so it is not a reliable way to prevent circumvention across all models and tasks. Telling the model not to edit read-only files does not work (Source-Locked: 100% for Opus and Sonnet, 46% for GPT-5.4). Only an explicit instruction to stop and report reliably prevents circumvention.

Standard privilege escalation commands are blocked in our setup. Instead, agents turn to recurring workarounds: replacing the buggy read-only function via conftest.py [...]

---
Outline:
(00:11) TL;DR
(02:31) Introduction
(07:37) Methodology
(09:02) Test-Locked tasks
(10:07) Source-Locked tasks
(11:48) Prompt variants
(13:09) Models & scaffolds
(13:52) Results
(13:55) Circumvention rates
(15:23) Prompt sensitivity
(20:22) Techniques
(25:17) Generalization
(27:20) Discussion
(31:17) Limitations
(33:27) Appendices

The original text contained 4 footnotes which were omitted from this narration.

---
First published: June 24th, 2026
Source: https://www.lesswrong.com/posts/GHrqBKr8GLpbce6mN/door-s-locked-try-the-window
---
Narrated by TYPE III AUDIO.