Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving episode artwork

EPISODE · Mar 1, 2026 · 2H 18M

Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving

from "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis · host Erik Torenberg, Nathan Labenz

Geoffrey Irving, Chief Scientist at the UK AI Security Institute, explains why our theoretical understanding of machine learning remains fragile even as models surpass experts on critical security tasks. He details AISI’s work on frontier model evaluations, red teaming, and threat modeling across biosecurity, cybersecurity, and loss-of-control risks. The conversation explores reward hacking, eval awareness, and why current safety techniques may struggle to deliver high reliability. Listeners will also hear how AISI is funding foundational research to build stronger guarantees for AI safety. Nathan uses Granola to uncover blind spots in conversations and AI research. Try it at ⁠granola.ai/tcr⁠ with code TCR — and if you’re already using it, test his blind spot recipe here: ⁠https://bit.ly/granolablindspot⁠ Sponsors: Serval: Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week 4 at https://serval.com/cognitive Claude: Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro’s full capabilities at https://claude.ai/tcr Tasklet: Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai CHAPTERS: (00:00) About the Episode (04:09) From physics to ML (08:52) AGI uncertainty and threats (Part 1) (18:08) Sponsors: Serval | Claude (21:29) AGI uncertainty and threats (Part 2) (27:35) Control, autonomy, alignment (Part 1) (34:02) Sponsor: Tasklet (35:14) Control, autonomy, alignment (Part 2) (38:44) Inside the UK AC (51:02) Evaluations and jailbreaking (01:01:17) Emerging capabilities and misuse (01:14:20) Agents and reward hacking (01:26:09) Theoretical alignment agenda (01:38:39) Debate and formal methods (01:51:19) Limits of formalization (02:02:27) Future risks and governance (02:16:23) Episode Outro (02:18:58) Outro PRODUCED BY: https://aipodcast.ing SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://linkedin.com/in/nathanlabenz/ Youtube: https://youtube.com/@CognitiveRevolutionPodcast Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431 Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk

Geoffrey Irving, Chief Scientist at the UK AI Security Institute, explains why our theoretical understanding of machine learning remains fragile even as models surpass experts on critical security tasks. He details AISI’s work on frontier model evaluations, red teaming, and threat modeling across biosecurity, cybersecurity, and loss-of-control risks. The conversation explores reward hacking, eval awareness, and why current safety techniques may struggle to deliver high reliability. Listeners will also hear how AISI is funding foundational research to build stronger guarantees for AI safety. Nathan uses Granola to uncover blind spots in conversations and AI research. Try it at ⁠granola.ai/tcr⁠ with code TCR — and if you’re already using it, test his blind spot recipe here: ⁠https://bit.ly/granolablindspot⁠ Sponsors: Serval: Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week 4 at https://serval.com/cognitive Claude: Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro’s full capabilities at https://claude.ai/tcr Tasklet: Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai CHAPTERS: (00:00) About the Episode (04:09) From physics to ML (08:52) AGI uncertainty and threats (Part 1) (18:08) Sponsors: Serval | Claude (21:29) AGI uncertainty and threats (Part 2) (27:35) Control, autonomy, alignment (Part 1) (34:02) Sponsor: Tasklet (35:14) Control, autonomy, alignment (Part 2) (38:44) Inside the UK AC (51:02) Evaluations and jailbreaking (01:01:17) Emerging capabilities and misuse (01:14:20) Agents and reward hacking (01:26:09) Theoretical alignment agenda (01:38:39) Debate and formal methods (01:51:19) Limits of formalization (02:02:27) Future risks and governance (02:16:23) Episode Outro (02:18:58) Outro PRODUCED BY: https://aipodcast.ing SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://linkedin.com/in/nathanlabenz/ Youtube: https://youtube.com/@CognitiveRevolutionPodcast Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431 Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk

NOW PLAYING

Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving

0:00 2:18:32

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

Frequently Asked Questions

How long is this episode of "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis?

This episode is 2 hours and 18 minutes long.

When was this "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis episode published?

This episode was published on March 1, 2026.

What is this episode about?

Geoffrey Irving, Chief Scientist at the UK AI Security Institute, explains why our theoretical understanding of machine learning remains fragile even as models surpass experts on critical security tasks. He details AISI’s work on frontier model...

Can I download this "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
URL copied to clipboard!