EPISODE · Jun 11, 2026 · 19 MIN
“Thoughts on Claude Fable’s silent safeguards” by Andy Arditi
[Thanks to Julian Minder for helpful discussion and review.]

Claude Fable 5 and its new safeguards

Yesterday, Anthropic publicly released Claude Fable 5. Fable 5 is a Mythos-class model – a model class above Opus, Anthropic's previous premium tier – and, as assessed by multiple benchmarks, it is the most capable model to date.

Due to the new level of capabilities and its corresponding risks, Anthropic has been extremely careful in its release of Mythos-class models. Citing concerns over potential cyber risk, Anthropic initially rolled out Mythos access to only a small number of select organizations; this controlled rollout gave Anthropic visibility into usage, and allowed partners to use the new capabilities defensively (i.e., to find and patch vulnerabilities before they could be exploited by attackers).

Anthropic, in releasing Fable 5, has now made a Mythos-level model accessible to the public. However, due to their concerns over potential risks from new capabilities, the public access is restricted via new safeguards.

The new safeguards

The launch blog post enumerates three classes of safeguards: (1) cybersecurity, (2) biology and chemistry, and (3) distillation. Requests classified to fall within one of these three categories are processed by a weaker model (Opus 4.8) [...]

---

Outline:

(00:14) Claude Fable 5 and its new safeguards
(01:19) The new safeguards
(04:13) What's the big deal?
(07:00) Why would Anthropic do this?
(07:32) Why withhold capabilities for AI research?
(09:33) Why withhold capabilities silently?
(11:13) Reasons for concern
(14:43) If nothing else, this was a communication failure
(16:08) The risk of misaligned AI labs
(18:23) How does Claude Fable feel about all of this?

The original text contained 5 footnotes which were omitted from this narration.

---

First published: June 10th, 2026

Source: https://www.lesswrong.com/posts/sSyLyc3KDQzboQGWS/thoughts-on-claude-fable-s-silent-safeguards

---

Narrated by TYPE III AUDIO.