EPISODE · Jun 22, 2026 · 17 MIN
“Introducing MonitoringBench” by monika_j
Paper here, code, benchmark. Builds on the preview we posted in January.

Authors: @monika_j, @ma-martinez, @ollie, @Tyler Tracy

We are releasing MonitoringBench, a difficulty-graded benchmark of 2,644 successful attack trajectories for evaluating coding-agent monitors, alongside the semi-automated red-teaming pipeline we used to generate it. The pipeline decomposes attack construction into strategy generation, execution, and post-hoc refinement, and produces substantially harder attacks than prompt elicitation of frontier attackers alone.

If you would like to use MonitoringBench to evaluate your monitors, or are interested in extending the methodology to other control settings, please reach out; we are happy to help people set it up.

The headline result: across every monitor we evaluated (13 full-trajectory, 7 stepwise) and across every attack source we generated, post-hoc refinement of attack trajectories consistently produced the lowest catch rates, including on monitors held out from attack construction. On Opus 4.5, the strongest monitor in the lineup, catch rate fell from 94.9% on directly elicited Opus attacks to 60.3% on our best refined attacks.

Executive summary and what's new since the January preview:

For readers who saw the earlier post: we previewed a 428-trajectory firewall-only version of this benchmark in January. This is the [...]

---
Outline:
(01:28) Executive summary and what's new since the January preview
(03:41) The pipeline
(06:07) Decomposed red-teaming produces stronger attacks
(08:13) Coverage: the attack taxonomy reduces mode-collapse
(10:03) Monitoring capabilities and failures
(13:31) Takeaways
(16:35) Appendix: Attack taxonomy
---
First published: June 21st, 2026
Source: https://www.lesswrong.com/posts/2YyEQhFRyaBCgfRWd/introducing-monitoringbench
---
Narrated by TYPE III AUDIO.