
EPISODE · Aug 21, 2025 · 1H 24M

The Co-opting of Safety

from muckrAIkers · hosts Jacob Haimes and Igor Krawczuk

We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with examples like Mecha-Hitler Grok, we explore how real safety engineering differs from AI "alignment," the myth of the alignment tax, and why this semantic confusion matters for actual safety.

(00:00) - Intro
(00:21) - Mecha-Hitler Grok
(10:07) - "Safety"
(19:40) - Under-specification
(53:56) - This time isn't different
(01:01:46) - Alignment Tax myth
(01:17:37) - Actually making AI safer

Links
JMLR article - Underspecification Presents Challenges for Credibility in Modern Machine Learning
Trail of Bits paper - Towards Comprehensive Risk Assessments and Assurance of AI-Based Systems
SSRN paper - Uniqueness Bias: Why It Matters, How to Curb It

Additional Referenced Papers
NeurIPS paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
ICML paper - AI Control: Improving Safety Despite Intentional Subversion
ICML paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
OSF preprint - Current Real-World Use of Large Language Models for Mental Health
Anthropic preprint - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Inciting Examples
Ars Technica article - US government agency drops Grok after MechaHitler backlash, report says
The Guardian article - Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats
BBC article - Update that made ChatGPT 'dangerously' sycophantic pulled

Other Sources
London Daily article - UK AI Safety Institute Rebrands as AI Security Institute to Focus on Crime and National Security
Vice article - Prominent AI Philosopher and ‘Father’ of Longtermism Sent Very Racist Email to a 90s Philosophy Listserv
LessWrong blogpost - "notkilleveryoneism" sounds dumb (see comments)
EA Forum blogpost - An Overview of the AI Safety Funding Situation
Book by Dmitry Chernov and Didier Sornette - Man-made Catastrophes and Risk Information Concealment
Euronews article - OpenAI adds mental health safeguards to ChatGPT, saying chatbot has fed into users’ ‘delusions’
Pleias website
Wikipedia page on Jaywalking



Frequently Asked Questions

How long is this episode of muckrAIkers?

This episode is 1 hour and 24 minutes long.

When was this muckrAIkers episode published?

This episode was published on August 21, 2025.

What is this episode about?

We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with examples like Mecha-Hitler Grok, we explore how real safety engineering differs from AI "alignment," the myth of the alignment tax, and why...

Can I download this muckrAIkers episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.