How many episodes does AI Explained Official Podcast have?

AI Explained Official Podcast currently has 50 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is AI Explained Official Podcast about?

Covering the biggest news of the century - the arrival of smarter-than-human AI. From the author of Simple Bench, which reveals the remaining gap between LLM and human reasoning. Hype-free, and the British accent is a freebie bonus.

How often does AI Explained Official Podcast release new episodes?

AI Explained Official Podcast has 50 episodes, and the most recent ones were published roughly one to three weeks apart. Check the episode list for exact publication dates.

Where can I listen to AI Explained Official Podcast?

You can listen to AI Explained Official Podcast on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts AI Explained Official Podcast?

AI Explained Official Podcast is created and hosted by Philip - Host of AI Explained YT.

AI Explained Official Podcast - All Episodes

56

GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies

GPT 5.5 full analysis, plus DeepSeek V4 paper highlights, comparisons with Mythos, a vibe-coded game w/ GPT Image 2, and 50 data-points you wouldn’t get from just reading the headlines.
Chapters:
01:11 - GPT 5.5 Comparison
06:04 - Mythos Marketing
11:50 - Recursive Self-Improvement?
14:11 - Deepseek V4
18:03 - VibeCode Experiment Extravaganza
21:44 - The Scarce Compute Era
https://80000hours.org/aiexplained
OpenAI Benchmarks: https://openai.com/index/introducing-gpt-5-5/
5.5 System Card: https://deploymentsafety.openai.com/gpt-5-5/gpt-5-5.pdf
Direct Comparison: https://pbs.twimg.com/media/HGnNm5GWEAAJ1Ob?format=jpg&name=4096x4096
DeepSeek Paper: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
SWE Bench Pro - benchmark of choice? https://x.com/ChowdhuryNeil/status/2047416077622395025
AA Omniscience: https://artificialanalysis.ai/evaluations/omniscience
Vending Bench: https://x.com/andonlabs/status/2047377260412649967
Opus 4.7 System Card: https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf
Sam Altman Drunk Phase: https://x.com/sama/with_replies
Noam Brown: https://x.com/polynoamial/status/2047387675762802998
DeepSeek Compute Crunch: https://www.bloomberg.com/news/articles/2026-04-24/deepseek-unveils-newest-flagship-a-year-after-ai-breakthrough?srnd=phx-ai
Spreadsheet Bench: https://x.com/nicochristie/status/2047476237464211721
Pattern Recognition: https://arcprize.org/leaderboard
Leader Interviews: Core Memory: https://www.youtube.com/watch?v=NCKQL0op30E
Knowledge Podcast: https://www.youtube.com/watch?v=6JoUcQ1qmAc
Big Tech Round 1: https://www.youtube.com/watch?v=J6vYvk7R190&t=1116s
Big Tech Round 2: https://www.youtube.com/watch?v=YnoQ8RJbALw&t=8s
Claude Code Limitations: https://x.com/TheAmolAvasare/status/2046724659039932830
ChatGPT 5.4 for Clinicians: https://openai.com/index/making-chatgpt-better-for-clinicians/
Image Arena: https://x.com/arena/status/2046670703311884548
VibeCode Bench: https://www.vals.ai/benchmarks/vibe-code
5.5-made Game +Seedance 2.0: https://rosemere-quest.pages.dev/

Apr 24, 2026

25m

55

Claude Opus 4.7 - A New Frontier, in Performance … and Drama

Claude Opus 4.7 just dropped, but behind every headline lies a deeper story. From a bonanza of benchmarks, to seeing the fruits of one of the biggest mega-projects in US history, to sneaky Mythos disclaimers, to Anthropic admitting compute restraints and forcing lower capability of Opus 4.7. Where the new model falls behind Gemini but ahead of GPT 5.4, plus why some users are furious at Anthropic. Ending with a 9-year animus that still affects AI today…
https://assemblyai.com/aiexplained
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:58 - Benchmarks
05:21 - Market Share + Compute Problems
08:12 - Mythos Exclusives
12:56 - User Frustration + Claude Code Updates
14:03 - Brockman Amodei Rivalry
17:40 - OpenAI vs Anthropic Approach to Code
Claude 4.7 Opus Release Notes: https://www.anthropic.com/news/claude-opus-4-7
vs Mythos: https://pbs.twimg.com/media/HGCGugrXUAAKcHp?format=jpg&name=medium
232-page System Card: https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf
ARC-AGI 2: https://x.com/arcprize/status/2044834615417053305/photo/1
ParseBench: https://x.com/jerryjliu0/status/2044902620746363016/photo/1
GDPVal: https://artificialanalysis.ai/evaluations/gdpval-aa
Vidoc Security Replication: https://blog.vidocsecurity.com/blog/we-reproduced-anthropics-mythos-findings-with-public-models
Boris Cherny Settings: https://x.com/Hesamation/status/2043016923961577516/photo/2
User Frustration: https://x.com/RileyRalmuto/status/2044836116189069660
VibeCode Bench: https://x.com/ValsAI/status/2044791415524471099/photo/1
Verge Memo: https://www.theverge.com/ai-artificial-intelligence/911118/openai-memo-cro-ai-competition-anthropic
5.4 Cyber: https://openai.com/index/scaling-trusted-access-for-cyber-defense/
Data Centers in Absolute $: https://x.com/finmoorhouse/status/2044933442236776794/photo/1
…in % of GDP: https://pbs.twimg.com/media/HGEN8FGWQAAN7Np?format=jpg&name=4096x4096
WSJ Exclusive: https://www.wsj.com/tech/ai/the-decadelong-feud-shaping-the-future-of-ai-7075acde
Brockman Interview: https://www.youtube.com/watch?v=J6vYvk7R190
$1T Valuation: https://x.com/StefanFSchubert/status/2045039686997967082
Emotions: https://www.patreon.com/c/aiexplained/posts
https://lmcouncil.ai/benchmarks
Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Apr 17, 2026

19m

54

Claude Mythos: Highlights from 244-page Release

The model, the mythos, the legend. We have a new best AI model, but not all of us. How good is it, and what do its new offensive capabilities mean? Why does its 244-page report card remind me of Her, and why did the creator of Claude Code call it ‘terrifying’? 30+ highlights sourced by reading the paper in full, old-school, no AI summary.
https://80000hours.org/aiexplained
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:56 - Internal Release + Availability
02:37 - General Capabilities
05:12 - Self-improvement?
06:15 - ‘Terrifying’ Landscape
11:07 - Safety Decision
13:22 - Coding
14:49 - Alignment, Awareness
19:52 - GUI for Agents/Claws + Hallucinations
21:34 - …Emotions?
25:29 - Her connection
244-page System Card: https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8dda846ab289.pdf
Project Glasswing: https://www.anthropic.com/glasswing
Zero-Day Details: https://red.anthropic.com/2026/mythos-preview/
Mythos ‘terrifying’: https://x.com/bcherny/status/2041605852382351666
New Yorker Altman/Amodei: https://archive.fo/20260406100412/https://www.newyorker.com/magazine/2026/04/13/sam-altman-may-control-our-future-can-he-be-trusted
Alignment Risk Update: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de43218158e5f25c.pdf
In a Park: https://x.com/sleepinyourhat/status/2041584808514744742
“Uhm” - https://x.com/thsottiaux/status/2041749947385815109
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

Apr 8, 2026

27m

53

OpenAI Spud, a Claude Model set to ‘stir governments’, Beast Mode ARC-AGI-3

First look at exclusive reports about OpenAI's new Spud model, and the model Anthropic think will stir governments to urgency, all in the context of the newly-launched ARC-AGI-3. What does the extreme difficulty of that benchmark, and its quirky scoring metrics, mean for AI in 2026?
https://assemblyai.com/aiexplained
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:55 - OpenAI Side Quests
01:58 - Claude New Model Coming + Universal Equity?
03:13 - ARC-AGI 3
05:00 - Intentional or Unintentional Gaming?
07:11 - But is it AGI Harbinger? No Harness
09:41 - Not the First
12:32 - Automated Researcher
15:00 - Claw Caveat
Spud: https://www.theinformation.com/articles/openai-ceo-shifts-responsibilities-preps-spud-ai-model?utm_campaign=Editorial&utm_content=Article&utm_medium=organic_social&utm_source=bluesky%2Cfacebook%2Clinkedin%2Cthreads%2Ctwitter&rc=sy0ihq
FT: OpenAI Special Model: https://www.ft.com/content/de9bf0af-b241-424f-8229-5870b1c0d93d?syn-25a6b1a6=1
Jensen Huang: https://www.forbes.com/sites/antoniopequenoiv/2026/03/23/nvidias-jensen-huang-says-he-thinks-weve-achieved-agi/
Axios Article: https://archive.fo/20260326100140/https://www.axios.com/2026/03/26/anthropic-pentagon-ai-deal#selection-827.0-829.257
https://arcprize.org/arc-agi/3
ARC AGI 3 Paper: https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf
NetHack Leaderboard: https://balrogai.com/
Paper: https://ai.meta.com/research/publications/the-nethack-learning-environment/
https://x.com/_rockt/status/2036864121585438995
Claw Shells: https://x.com/DrJimFan/status/2036494601750716711
OpenAI Automated Researcher: https://www.technologyreview.com/2026/03/20/1134438/openai-is-throwing-everything-into-building-a-fully-automated-researcher/
Patreon Post: https://www.patreon.com/c/aiexplained/posts
Eng Jobs: https://x.com/lennysan/status/2036535460726767793
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

Mar 26, 2026

16m

52

What the New ChatGPT 5.4 Means for the World

Just 48 hours after releasing GPT 5.3 Instant, OpenAI have released GPT 5.4 Thinking, so either there is an imminent singularity or perhaps we are being distracted from other news. This video will give 9 crucial bits of context, not just on the GPT 5.4 drop but on the background to the meltdown between the Pentagon and Anthropic. What does this say about the state of AI progress, your job, and what is next?
Check out my fast-growing (!) app, free to use, and code INSIDER15 for 15% off paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:06 - GPT 5.4 Breakdown
05:06 - Closing the Loop
06:35 - Spiky Performance
10:31 - Advice
11:32 - Less Encouraging Developments - Fired Like Dogs
17:45 - But Used in Iran
GPT 5.4: https://openai.com/index/introducing-gpt-5-4/
Hallucinations: https://artificialanalysis.ai/evaluations/omniscience
Investment Banking Bench: https://x.com/bradlightcap/status/2029684672343728452
Move 37: https://x.com/nasqret/status/2029628846518010099
System Card: https://deploymentsafety.openai.com/gpt-5-4-thinking/gpt-5-4-thinking.pdf
Prediction Market Scandal: https://www.wired.com/story/openai-fires-employee-insider-trading-polymarket-kalshi/
GPT 5.3 Instant: https://openai.com/index/gpt-5-3-instant/
GDPVal: https://openai.com/index/gdpval/
Claude in Iran: https://www.washingtonpost.com/technology/2026/03/04/anthropic-ai-iran-campaign
‘Like Dogs’: https://x.com/AndrewCurran_/status/2029605783311470679
Altman leak: https://www.cnbc.com/2026/03/03/sam-altman-tells-openai-staff-operational-decisions-up-to-government.html
Original 2024 Switch: https://archive.fo/20240116172526/https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veterans#selection-6173.83-6173.226
Amodei Original Memo: https://www.theinformation.com/articles/read-anthropic-ceos-memo-attacking-openais-mendacious-pentagon-announcement?rc=sy0ihq
Anthropic Apology: https://www.anthropic.com/news/where-stand-department-war
OpenAI Employee Reaction: https://x.com/tszzl/status/2029334980481212820
DoD Supplier Risk: https://www.cnbc.com/amp/2026/03/05/anthropic-pentagon-ai-claude-iran.html
Atlantic Exclusive: https://archive.fo/20260301152646/https://www.theatlantic.com/technology/2026/03/inside-anthropics-killer-robot-dispute-with-the-pentagon/686200/#selection-941.61-941.212
No Negotiation: https://x.com/USWREMichael/status/2029754965778907493
$20B Doubling: https://archive.ph/20260304111124/https://www.bloomberg.com/news/articles/2026-03-03/anthropic-nears-20-billion-revenue-run-rate-amid-pentagon-feud
March 2022 Interview: https://www.youtube.com/watch?v=uAA6PZkek4A
https://lmcouncil.ai/
Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Mar 6, 2026

21m

51

Deadline Day for Autonomous AI Weapons & Mass Surveillance

Will Anthropic be forced to make a version of Claude for war? And does a new paper expose the risks of Claude agents, in both OpenClaw and the field of war? Plus, 5 more twists in the story of the Pentagon versus Anthropic + some AI lab employees, and a petition that could change everything, or nothing...
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:44 - Deadline Day + Petition
02:42 - Twist 1: Existing Deal
03:26 - Twist 2: Existing Policy
04:21 - Twist 3: Twin Threats
05:54 - Twist 4: Interesting Objections
11:32 - Twist 5: Anthropic’s Dropped Policy
Dario Statement: https://www.anthropic.com/news/statement-department-of-war
Google/OpenAI Petition: https://notdivided.org/
Axios on Amodei Rejection: https://www.axios.com/2026/02/26/anthropic-rejects-pentagon-ai-terms
FT on US Threat: https://www.ft.com/content/11d27612-d6c5-4cf7-94dd-f65603549b7f
Politico on Latest: https://archive.ph/20260227013117/https://www.politico.com/news/2026/02/26/incoherent-hegseths-anthropic-ultimatum-confounds-ai-policymakers-00800135
The Verge on Current Deal: https://www.theverge.com/ai-artificial-intelligence/883456/anthropic-pentagon-department-of-defense-negotiations
Anthropic RSP change: https://www.anthropic.com/news/responsible-scaling-policy-v3
Time Magazine on RSP: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/
Agent of Chaos Paper: https://x.com/NatalieShapira/status/2026062499599319526
AI Agent Reliability Paper: https://arxiv.org/pdf/2602.16666
My Patreon Video: https://www.patreon.com/posts/real-mystery-ai-151647211
Patreon Documentary: https://www.patreon.com/posts/our-new-age-of-133960279
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

Feb 27, 2026

13m

50

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI

Do we have a new best AI model, or do we have the downfall of benchmarks in general, as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much needed context. Oh, and a new record on Simple Bench!
https://epoch.ai/ai-explained-datacenters
Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:30 - Post-training Dominance
04:00 - ARC-AGI 2 Caveat
05:54 - Simple Bench Record
08:22 - Hallucination Caveat
10:05 - Model Card
11:12 - Exponential Coming
12:20 - Amodei on Generalizing
15:10 - One True Benchmark?
17:02 - Other Metrics…
Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf
Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/
Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy
Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week
Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience
Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526
ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1
Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442
METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/
Talaas Fast: https://chatjimmy.ai/
Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved
Metaculus FutureEval: https://www.metaculus.com/futureeval/
Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

Feb 20, 2026

18m

49

The Two Best AI Models/Enemies Just Got Released Simultaneously

The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each other. The full breakdown of around 250 pages of reports, with just the most interesting moments, from the battle of which is best, Claude personhood, the surprising misbehaviour of Opus 4.6, and much more.
https://assemblyai.com/aiexplained
Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai
AI Insiders ($9): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:54 - Self-improvement?
02:44 - Knowledge Work
05:30 - Overly agentic behaviour
09:12 - Who Shouldn’t Use Claude Opus
11:39 - Step-change?
15:09 - Claude’s ‘Personhood’
Hassabis Roadmap: https://www.patreon.com/posts/hassabis-roadmap-149750869
Release of Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
212 Page System Card: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
Claude Code Tip: https://x.com/bcherny/status/2019475897691124107
GPT Codex 5.3: https://openai.com/index/introducing-gpt-5-3-codex/
System Card: https://openai.com/index/gpt-5-3-codex-system-card/
Browse Comp: https://arxiv.org/pdf/2504.12516v1
Finance Agent: https://www.vals.ai/benchmarks/finance_agent
Terminal Bench 2: https://arxiv.org/pdf/2601.11868
Vending Bench: https://andonlabs.com/blog/opus-4-6-vending-bench
My X post: https://x.com/AIExplainedYT/status/2016851303436095647
Anthropic Apology: https://x.com/ch402/status/2014066134194995256/photo/1
Altman rebuttal: https://x.com/sama/status/2019139174339928189
https://x.com/sama/status/2019140276246442089
4% of GitHub: https://x.com/dylan522p/status/2019490550911766763
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

Feb 6, 2026

19m

48

Claude AI Co-founder Publishes 4 Big Claims about Near Future: Breakdown

Anthropic's CEO, who has consistently predicted transformative AI will arrive before 2030, recently published a nearly 20,000-word essay outlining his vision of where AI is heading. The video gives you the highlights. The essay argues that scaling and recursion will advance AI from coding automation to full engineering automation, while warning of economic displacement within 1-2 years and China's trajectory toward AI-enabled totalitarianism. Additionally, Dario Amodei predicts that AI models will increasingly be understood as collections of distinct personas rather than monolithic systems.
80,000 Hours: https://www.youtube.com/watch?v=B54EQiuO1UU
Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:10 - Scaling to software engineers
06:11 - Permanent Underclass
10:18 - Totalitarian Nightmares
16:38 - Collection of Personas
Essay: https://www.darioamodei.com/essay/the-adolescence-of-technology
Physics Prediction: https://www.quantamagazine.org/is-particle-physics-dead-dying-or-just-hard-20260126/
Axios: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic
World GDP: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?end=2024&start=1961&view=chart
Demis Hassabis Counter: https://www.youtube.com/watch?v=q6fq4_uP7aM
Karpathy 80%: https://x.com/karpathy/status/2015883857489522876
Machines of Loving Grace: https://www.darioamodei.com/essay/machines-of-loving-grace
Anthropic LessWrong: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy#1__In_private__Dario_frequently_said_he_won_t_push_the_frontier_of_AI_capabilities__later__Anthropic_pushed_the_frontier
Original Constitution: https://www.anthropic.com/news/claudes-constitution
New Constitution: https://www.anthropic.com/constitution
Kimi K2.5: https://x.com/Kimi_Moonshot/status/2016024049869324599
Societies of Thought, Google DeepMind Paper: https://arxiv.org/pdf/2601.10825
https://lmcouncil.ai/benchmarks
https://www.patreon.com/posts/our-new-age-of-133960279
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

Jan 28, 2026

22m

47

Anthropic: Our AI just created a tool that can ‘automate all white collar work’, Me:

A new tool, with code written by an AI model, has gone omega-viral: Claude Cowork. But is the hype justified? What do the stats say on productivity? Where is the truth in a sea of noise? What is truth? Can we handle the truth? Where's Nemo?
https://matsprogram.org/s26-aie
Check out my new app! https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:12 - Claude Cowork
06:48 - Productivity Speed-up + jobs
09:33 - Comparing Models
12:00 - Brittle AI Paper
Cowork Intro: https://x.com/claudeai/thread/2010805682434666759
'All of it': https://x.com/bcherny/status/2010813886052581538
'AGI' Claims: https://x.com/deepfates/status/2004994698335879383
Douglas Interview: https://www.youtube.com/watch?v=TOsNrV3bXtQ&t=2313s
Job Stats: https://www.oxfordeconomics.com/wp-content/uploads/2026/01/Evidence-of-an-AI-driven-shakeup-of-job-markets-is-patchy.pdf
Amodei Prediction: https://fortune.com/2025/05/28/anthropic-ceo-warning-ai-job-loss/
GenAI Traffic: https://x.com/demishassabis/status/2009075877347512545
Illusion of Insight: https://arxiv.org/pdf/2601.00514
Entropy Exploration: https://arxiv.org/pdf/2506.14758
ProRL: https://arxiv.org/pdf/2505.24864
Genesis Mission: https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/
https://deepmind.google/blog/how-were-supporting-better-tropical-cyclone-prediction-with-ai/
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

Jan 14, 2026

18m

46

What the Freakiness of 2025 in AI Tells Us About 2026

It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time.
http://matsprogram.org/s26-aie
My new app! https://lmcouncil.ai
Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094
Chapters:
00:00 - Introduction
00:34 - Reasoning Models … and limits
02:54 - A playable world
03:36 - Realism
03:50 - AI Slop gone mainstream
05:03 - DolphinGemma
05:39 - Public Mood
07:34 - AI Enlisted
08:30 - GPT-5
11:05 - Open Weight not out
13:00 - METR Breakout
17:30 - VASA-1
18:28 - Lateral Productivity
20:15 - 1 or 1000 benchmarks needed?
24:54 - Continual Learning + Altman on Superintelligence
28:08 - Automated Information Discovery ft AlphaEvolve
Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809
https://www.youtube.com/watch?v=PqVbypvxDto
Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837
DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09
Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
METR Time Horizon: https://arxiv.org/pdf/2503.14499
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443
https://shash42.substack.com/p/how-to-game-the-metr-plot
https://x.com/METR_Evals/status/2002203627377574113
GPT-5 - Altman phd in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems
https://simple-bench.com/
AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k
https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan
Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1
Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169
OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259
Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/
AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=
Continual Learning: https://abehrouz.github.io/files/NL.pdf
Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic
GPT4o: https://x.com/AISafetyMemes/status/1916889492172013989
Vasa-1: https://www.microsoft.com/en-us/research/project/vasa-1/
Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines
Turing Test: https://x.com/tunguz/status/1907185471211422147
Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/
LLM Brainrot: https://arxiv.org/pdf/2510.13928
Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report
Emotional Quotient: https://arxiv.org/pdf/2511.08394
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/
AI Insiders ($9!): https://www.patreon.com/AIExplained

Dec 23, 2025

33m

45

Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but …

The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more…
https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26
Also, do check out my new app: https://lmcouncil.ai
Chapters:
00:00 - Introduction
00:50 - Results
02:44 - But… the Flaw
04:49 - So Benchmarks are fake? No
07:37 - Spatial Reasoning + Hassabis
10:06 - Proto-AGI
12:07 - Minimal AGI
15:07 - Compute Slowdown
17:56 - New Data Paradigm
Gemini 3 Flash: https://deepmind.google/models/gemini/flash/
Hassabis Interview: https://www.youtube.com/watch?v=PqVbypvxDto
Legg Interview: https://www.youtube.com/watch?v=l3u_FAv33G0
Pre-training Lead Interview: https://www.youtube.com/watch?v=cNGDAqFXvew
Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
Brockman Video: https://x.com/OpenAI/status/2001336514786017417
Post-Training Reveal: https://x.com/OfficialLoganK/status/2001742530472534442
Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
Patreon Hallucinations Vid: https://www.patreon.com/posts/blockers-to-and-139264812
AA-Omniscience Benchmark: https://artificialanalysis.ai/evaluations/omniscience
https://arxiv.org/pdf/2511.13029
lmcouncil.ai/benchmarks
https://simple-bench.com/
https://x.com/scaling01/status/1999620587744813205
5.2 Codex Drop: https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf
OpenAI Compute Trend: https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq
Cramer Tweet/Response: https://x.com/BorisMPower/status/2001440650210976018
OpenAI Valuation: https://www.theinformation.com/articles/openai-discussed-raising-tens-billions-valuation-around-750-billion?rc=sy0ihq
Indian Data: https://www.reuters.com/world/india/with-freebies-openai-google-vie-indian-users-training-data-2025-12-17/
TheInformation Data: https://x.com/theinformation/status/2001421225751351778
Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
Sima 2: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
Veo 3.1: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
AI Insiders ($9!): https://www.patreon.com/AIExplained
Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Dec 19, 2025

19m

44

GPT 5.2: OpenAI Strikes Back

Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines.
https://www.youtube.com/@eightythousandhours
AI Insiders ($9!): https://www.patreon.com/AIExplained
https://lmcouncil.ai
Chapters:
00:00 - Introduction
00:55 - Better than Human @ Professional Tasks?
04:42 - Test time Compute
07:05 - Benchmark Selection
09:32 - Simple Results + council comparison
13:01 - Long Context
13:52 - Self-Improvement
15:00 - 10 Years + New Models
Release Page: https://openai.com/index/introducing-gpt-5-2/
GPT 5.2 Benchmark Comparison: https://www.reddit.com/r/singularity/comments/1pka1y9/gpt52_all_20_benchmarks_rankings_and_pricing/
https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
https://lmcouncil.ai/benchmarks
Charxiv: https://charxiv.github.io/#leaderboard
GDPval: https://arxiv.org/pdf/2510.04374
My vid: https://www.youtube.com/watch?v=oK5LxMaROSA
Kilpatrick: https://x.com/OfficialLoganK/status/1999270402712023158/photo/1
Noam Brown: https://x.com/polynoamial/status/1999189845164667132
New Model in New Year: https://www.theinformation.com/articles/openai-developing-garlic-model-counter-googles-recent-gains?rc=sy0ihq
10 Years of OpenAI: https://openai.com/index/ten-years/
GPQA: https://x.com/idavidrein/status/1841265634170278063
ARC-AGI 1-2: https://arcprize.org/arc-agi/2/
Sunday Robotics: https://x.com/tonyzzhao/status/1991204839578300813
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
https://lmcouncil.ai

Dec 12, 2025

17m

43

You Are Being Told Contradictory Things About AI: 8 examples

With headlines of an imminent job apocalypse, code red for ChatGPT and recursive self-improvement, at the same time as Anthropic's CEO yesterday saying we know how to scale to AGI, and Gemini 3 DeepThink out today, it is easy to get lost among the narratives and counter-narratives. So here are both, plus the facts behind them, for you to decide.
https://epoch.ai/data/data-centers
Epoch AI is the sponsor of today’s video, and my views, and those expressed in this video, do not necessarily reflect Epoch AI’s views in any way.
Chapters:
00:00 - Introduction
00:42 - Job Apocalypse?
01:45 - Scaling to AGI
04:15 - Recursive Self-Improvement Needed, or Not
09:57 - OpenAI Code Red vs Gemini 3 DeepThink vs Claude Opus 4.5
13:27 - DeepSeek Speciale vs Mistral Large v3
16:45 - Claude Soul Document
https://lmcouncil.ai/
AI Insiders ($9!): https://www.patreon.com/AIExplained
Guardian Interview: https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself
MIT Study on Jobs/Tasks: https://iceberg.mit.edu/report.pdf
vs https://www.cnbc.com/2025/11/26/mit-study-finds-ai-can-already-replace-11point7percent-of-us-workforce.html
Amodei on Scaling: https://www.youtube.com/watch?v=FEj7wAjwQIk
Claude Soul Document: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document
Capabilities Original Stance: https://www.anthropic.com/news/core-views-on-ai-safety
Ilya Interview: https://www.dwarkesh.com/p/ilya-sutskever-2
Ricursive Intelligence: https://x.com/RicursiveAI/status/1995932204703346946
Economist Worker Usage of GenAI: https://www.economist.com/finance-and-economics/2025/11/26/investors-expect-ai-use-to-soar-thats-not-happening#selection-1409.94-1413.42
Mistral v3 Large: https://docs.mistral.ai/models/mistral-large-3-25-12
Compute Slowdown Paper: https://joel-becker.com/images/publications/forecasting_time_horizon_under_compute_slowdown.pdf
https://x.com/joel_bkr/status/1993023436541903155
METR Chart: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq
OpenAI Code Red: https://www.anthropic.com/news/core-views-on-ai-safety
Rocket Company: https://www.independent.co.uk/news/world/americas/sam-altman-rocket-elon-musk-spacex-b2878351.html
DeepSeek Paper: https://arxiv.org/html/2512.02556v1
DeepSeek Crowdstrike CCP: https://www.crowdstrike.com/en-us/blog/crowdstrike-researchers-identify-hidden-vulnerabilities-ai-coded-software/
https://simple-bench.com/
Patreon Post: https://www.patreon.com/c/aiexplained/posts
Robot: https://x.com/jloganolson/status/1985850115379351799

Dec 5, 2025

20m

42

Gemini 3 is Here: 11 Details You Might Have Missed

Gemini 3 Pro is out, and records fell like snowflakes in Svalbard. No long description, chapters or links today, huge technical difficulties, including with audio, so just want to publish asap.https://app.grayswan.ai/ai-explainedhttps://lmcouncil.aiAI Insiders ($9!): https://www.patreon.com/AIExplainedNon-hype Newsletter: https://signaltonoise.beehiiv.com/Podcast: https://aiexplainedopodcast.buzzsprout.com/

Nov 19, 2025

21m

41

Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that

A lot just got released in the last 36 hours, and it will all affect hundreds of millions of people. 10 details you would miss if you just read the headlines, from GPT 5.1 regressions, to how Claude hacked Govt Agencies, to SIMA 2, and Musical Turing Tests.https://assemblyai.com/aiexplainedChapters:00:00 - Introduction00:56 - GPT 5.1 Smarter?01:47 - Some Regressions03:22 - Sycophancy?05:22 - Claude Auto-Hacking 06:16 - Jailbreaking through Granularity08:22 - This Will be Re-used09:30 - Hallucinating Hacker09:57 - Surprisingly Neutral Tone12:18 - SIMA 214:10 - Alpha Parallels17:24 - AI MusicGPT 5.1 Announcement: https://openai.com/index/gpt-5-1/System Card: https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdfBenchmarks: https://openai.com/index/gpt-5-1-for-developers/Simple Bench: https://lmcouncil.ai/benchmarksAuto-Hacking: https://x.com/AnthropicAI/status/1989033793190277618https://www.anthropic.com/news/disrupting-AI-espionageReport: https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdfSima 2 Announcement: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/https://x.com/amoufarek/status/1988986075331858693Scepticism: https://www.technologyreview.com/2025/11/13/1127921/google-deepmind-is-using-gemini-to-train-agents-inside-goat-simulator-3/Voyager: https://voyager.minedojo.org/Reuters Music: https://www.reuters.com/legal/litigation/are-you-listening-bots-survey-shows-ai-music-is-virtually-undetectable-2025-11-12/

Nov 14, 2025

18m

40

Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection)

Don’t let headlines about bubbles distract you from the real avenues of progress being explored in AI every week, including what had been thought to be a long-term blocker - continual learning (learning on the fly). https://app.grayswan.ai/ai-explainedThis, plus models introspecting (hesitate before you berate), Nano Banana 2 possibly spotted, Chinese imagen and more.AI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction01:26 - Continual Learning (Nested Learning / HOPE)07:00 - Introspection10:54 - Image-Gen ProgressNested Learning Post: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/Nested Learning Paper: https://abehrouz.github.io/files/NL.pdfOriginal Titans Paper: https://arxiv.org/pdf/2501.00663Siri News: https://www.bloomberg.com/news/articles/2025-11-05/apple-plans-to-use-1-2-trillion-parameter-google-gemini-model-to-power-new-siriIntrospection: https://www.anthropic.com/research/introspectionFull Paper: https://transformer-circuits.pub/2025/introspection/index.html#mechanismsEarlier Work: https://www.anthropic.com/research/mapping-mind-language-modelhttps://transformer-circuits.pub/2024/scaling-monosemanticity/index.htmlRelease Post: https://x.com/AnthropicAI/status/1983584136972677319https://lmcouncil.ai Non-hype Newsletter: https://signaltonoise.beehiiv.com/Podcast: https://aiexplainedopodcast.buzzsprout.com/

Nov 10, 2025

12m

39

Sora 2 - It will only get more realistic from here

Sora 2 - the start of the infinite slop-feed or a key step to a generalist agent? Better than VEO 3 or over-hyped? I bring out 6 details you may have missed, contrast the announcement to Periodic Labs and even squeeze in some Claude Sonnet 4.5 analysis. Maybe I should make my videos longer…https://80000hours.org/aiexplainedAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:40 - Two models?01:15 - Rollout Details01:43 - Versus Sora 1 / Veo 304:30 - Sora App / Social Media06:40 - Masterplan09:30 - Generalist Agent? Periodic Labs12:05 - Claude Sonnet 4.513:42 - Future OutlookAnnouncement: https://openai.com/index/sora-2/Launch Video: https://www.youtube.com/live/gzneGhpXwjUSystem Card: https://cdn.openai.com/pdf/50d5973c-c4ff-4c2d-986f-c72b5d0ff069/sora_2_system_card.pdfSam Altman Blog Post on Sora App: https://blog.samaltman.com/sora-2Most Intelligent Claim: https://x.com/willdepue/status/1973089331284681110GTA: https://x.com/AndrewCurran_/status/1973298436536766666Meta Vibes: https://x.com/alexandr_wang/status/1971295156411433228?s=46Altman on Regulations: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altmanOpenAI Profit: https://www.theinformation.com/articles/openais-first-half-results-4-3-billion-sales-2-5-billion-cash-burn?rc=sy0ihqPeriodic Labs: https://periodic.com/https://www.nytimes.com/2025/09/30/technology/ai-meta-google-openai-periodic.htmlhttps://x.com/LiamFedus/status/1973055380193431965https://baincapitalventures.com/insight/we-must-know-we-will-know/?s=09Sonnet 4.5: https://www.anthropic.com/news/claude-sonnet-4-5https://simple-bench.com/Non-hype Newsletter: https://signaltonoise.beehiiv.com/Podcast: https://aiexplainedopodcast.buzzsprout.com/

Oct 1, 2025

15m

38

OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings

An OpenAI report released in the last 24 hours is the best look we have as to whether 2025 AI can automate your job. I’ll go through 4 unexpected findings, from which model is best at what, to practical tips and massive caveats. Plus UFC robots, radiologist essay, don’t trust videos and the blockers to the singularity. Gray Swan: https://app.grayswan.ai/ai-explainedGDPval: https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf GDP Impact: https://fred.stlouisfed.org/release/tables?rid=331&eid=211Task List: https://www.onetonline.org/link/summary/11-9141.00Summers Tweet: https://x.com/LHSummers/status/1971252567981146347Emad: https://x.com/EMostaque/status/1971254153067593739Robots: https://x.com/cixliv/status/1967663286679478759Unitree G1: https://x.com/UnitreeRobotics/status/1970039940022239491Don’t Trust Video: https://x.com/AISafetyMemes/status/1970453369446871420AGI Tweet: https://x.com/hyhieu226/status/1968378785709133915Blockers to the Singularity: https://www.patreon.com/posts/blockers-to-and-139264812Framework: https://gemini.google.com/share/f4b9c85a6ae9METR Study (Dev Slowdown): https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/Karpathy Tweet: https://x.com/karpathy/status/1971220449515516391Radiology Essay: https://worksinprogress.co/issue/the-algorithm-will-see-you-now/Chapters:00:00 - Introduction00:55 - OpenAI Report Summary02:40 - Tipping Point Speed-up04:11 - Better than Industry Experts?06:33 - Big Caveat11:10 - Karpathy and the Radiologist Analogy13:30 - Outro

Sep 26, 2025

14m

37

ChatGPT Will Guess your Age, Flirt if Asked, and Can Call the Cops

Sam Altman, CEO of OpenAI, announced a set of new ‘protections’ and ‘privileges’ for ChatGPT users, requiring a significant amount of trust from users. From predicting your age based on your chat to calling law enforcement if you are at risk of harm, to allowing non-minors to flirt. But amidst all of these announcements, there are interview snippets you may have missed, as Altman dramatically revises his predictions of AI impact on jobs. Plus a Hassabis backtrack to boot.https://80000hours.org/aiexplainedCalling the Cops: https://openai.com/index/teen-safety-freedom-and-privacy/Age Prediction: https://openai.com/index/building-towards-age-prediction/Not Everyone Will Agree: https://x.com/sama/status/1967955739911364693?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5EtweetTheory 1: NYT Lawsuit: https://openai.com/index/response-to-nyt-data-demands/Theory 2: FTC Investigation into AI Companions: https://x.com/AndrewCurran_/status/1966167585994764743YT Does the Same: https://www.cbsnews.com/news/youtube-ai-powered-technology-teen-users/Carlsen Interview: https://www.youtube.com/watch?v=5KmpT-BoVf4vs Senate Testimony (70% Jobs): https://www.youtube.com/watch?v=5CWVP8-XVjQHallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdfHassabis Quote 1: https://www.youtube.com/watch?v=toShbNUGAyovs Quote 2: https://www.youtube.com/watch?v=Kr3Sh2PKA8Y

Sep 16, 2025

11m

36

An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana

Wait, why did Sam Altman say AI was in a bubble? Or did he? Is it? 8 points for you to consider, before we all get distracted by Nano Banana.Chapters:00:00 - Introduction01:14 - Sam Altman Clarification02:30 - Media Calls a Bubble (for the tenth time)03:40 - MIT and McKinsey Analysed08:21 - Incremental Progress Deceptive12:07 - Reasoning Breakthroughs15:31 - CEOs might not know their products17:25 - But did stocks go down?17:31 - Media is Contradictory of coursehttps://donate.redcross.org.uk/appeal/gaza-crisis-appealBubble about to burst: https://www.telegraph.co.uk/business/2025/08/20/ai-report-triggering-panic-and-fear-on-wall-street/Nano Banana: https://blog.google/products/gemini/updated-image-editing-model/https://ai.studio/bananaMcKinsey Report: https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage#/https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai#/Revenue: https://www.wsj.com/tech/ai/mckinsey-consulting-firms-ai-strategy-89fbf1beMIT Report: https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdfSafe Superintelligence: https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/Thinking Machines Lab: https://techcrunch.com/2025/07/15/mira-muratis-thinking-machines-lab-is-worth-12b-in-seed-round/WSJ Prediction 2024: https://www.wsj.com/tech/ai/the-ai-revolution-is-already-losing-steam-a93478b1WP Prediction 2023: https://www.washingtonpost.com/technology/2023/08/05/ai-hype-bubble-chatgpt/Companies are Pouring Billions into AI: https://www.nytimes.com/2025/08/13/business/ai-business-payoff-lags.htmlConsumer Surplus: https://www.wsj.com/opinion/ais-overlooked-97-billion-contribution-to-the-economy-users-service-da6e8f55Figure AI robot: https://x.com/adcock_brett/status/1958193476639826383GDP Bet: https://x.com/adamdangelo/status/1627726566259318784?lang=enGenie 3 Immersion: 
https://x.com/holynski_/status/1953879983535141043https://x.com/elonmusk/status/1953861448431718662https://simple-bench.comMMMU: https://mmmu-benchmark.github.io/#leaderboard Prophet Arena: https://www.prophetarena.co/leaderboardNYT Jobs: https://www.nytimes.com/2025/08/19/opinion/ai-job-loss-deindustrialization.htmlDawn of Reasoning?: https://openreview.net/pdf?id=FkKBxp0FhRvs :https://arxiv.org/pdf/2403.04121ARC-AGI: https://arcprize.org/arc-agi/1/https://x.com/fchollet/status/1870169764762710376?lang=en-GBTuring Test: https://arxiv.org/pdf/2503.23674Mathematics of Starvation: https://www.theguardian.com/world/2025/jul/31/the-mathematics-of-starvation-how-israel-caused-a-famine-in-gazahttps://donate.redcross.org.uk/appeal/gaza-crisis-appealhttps://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/METR Interview: https://www.patreon.com/c/aiexplained/postsAlphaEvolve: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdfAmodei: https://kantrowitz.medium.com/the-making-of-anthropic-ceo-dario-amodei-449777529dd6https://www.theloganbartlettshow.com/archive/ep-82-dario-amodeis-ai-predictions-through-2030#:~:text=DARIO%3A%20I%20think%20our%20concern,being%20responsible%20to%20accelerate%20thingsUnreleased OpenAI: https://x.com/alexwei_/status/1954966393419599962VLMs Tricked: https://x.com/an_vo12/status/1943715159559545186AI Insiders ($9!): https://www.patreon.com/AIExplained

Aug 26, 2025

18m

35

GPT-5 has Arrived

GPT-5 will change how hundreds of millions of people use AI. Yes, you might have to forgive the chart crimes, the underwhelming livestream and Altman hype… But it’s a good model. I have read the 50 page system card in full, have the benchmark scores, coding tests, and things you might have missed.https://app.grayswan.ai/ai-explainedAnnouncement: https://openai.com/index/introducing-gpt-5/System Card: https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb52f/gpt5-system-card-aug7.pdfExtra Paper: https://cdn.openai.com/pdf/be60c07b-6bc2-4f54-bcee-4141e1d6c69a/gpt-5-safe_completions.pdfAltman tweet: https://x.com/sama/status/1953551377873117369Livestream: https://www.youtube.com/watch?v=0Uu_VJeVVfoMETR Report: https://metr.github.io/autonomy-evals-guide/gpt-5-report/ARC-AGI-2: https://x.com/fchollet/status/1953511631054680085Claude Opus 4.1: https://www.anthropic.com/news/claude-opus-4-1MMMU: https://mmmu-benchmark.github.io/Cursor Praise: https://x.com/ryolu_/status/1953531724895596669

Aug 7, 2025

15m

34

Genie 3: The World Becomes Playable (DeepMind)

Soon, anything will be playable. A photo becomes an interactive world, a selfie becomes a new game. Genie 3 from Google, debuting just 2 hours ago, is what I mean, and I have the full analysis, plus the pushback I gave the authors (will it really lead to reliable AI agents? Is that even the point?). You make your own mind up, but it’s certainly fascinating, and not to be overlooked in the week that will bring us GPT-5.https://80000hours.org/aiexplainedAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters: 00:00 - Introduction01:27 - Background and Access04:58 - Caveats07:24 - Demo10:12 - ConclusionAnnouncement: https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/Isaac Labs: https://developer.nvidia.com/isaac/labGenie 2 Coverage: https://www.youtube.com/watch?v=jIm2T7h_a0MTED Talk Roblox: https://www.youtube.com/watch?v=-OAP0ho5AUgDeepThink Post: https://www.patreon.com/posts/deep-ish-on-new-135688441AI Insiders ($9!): https://www.patreon.com/AIExplainedNon-hype Newsletter: https://signaltonoise.beehiiv.com/

Aug 5, 2025

11m

33

How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …)

GPT-5 did what? OpenAI ahead of Google? There are 9 ways to misread the headlines of the last 48 hours, so this video is here to tell you what happened, sans sizzle. It’s been a fairly momentous last few days, so let’s dive in to the International Math Olympiad Gold, GPT-5 alpha release, whether mathematicians are out of jobs, and the white collar impact by year’s end.Job Board: https://80000hours.org/aiexplainedNew Documentary on Patreon: https://www.patreon.com/posts/our-new-age-of-133960279Chapters: 00:00 - Introduction00:18 - AI > Mathematicians?01:23 - OPENAI vs GOOGLE02:42 - Irrelevant to Jobs or …06:45 - White-collar jobs gone?10:26 - AI is Plateauing?12:00 - We Don’t Know the Details…14:33 - GPT-5 alpha14:54 - Nothing but Exponentials?15:53 - No Impact?Announcement: https://x.com/alexwei_/status/1946477742855532918UCLA Math Prof: https://x.com/ErnestRyu/status/1946699302308635130ChatGPT Agent: https://openai.com/index/introducing-chatgpt-agent/Livestream: https://www.youtube.com/watch?v=1jn_RpbPbEc&t=796sSystem Card: https://cdn.openai.com/pdf/839e66fc-602c-48bf-81d3-b21eacc3459d/chatgpt_agent_system_card.pdfJerry Tworek (OpenAI): https://x.com/MillionInt/status/1946556255490982022https://x.com/MillionInt/status/1946558130906968330Noam Brown Details: https://x.com/polynoamial/status/1946478249187377206Trieu Trinh Retweet: https://x.com/Mihonarium/status/1946880931723194389Neel Nanda: https://x.com/NeelNanda5/status/1946602953370173647Terence Tao: https://mathstodon.xyz/@taoSam Altman: https://x.com/sama/status/1946569252296929727METR Dev Study: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/Ravid Shwartz-Ziv: https://x.com/ziv_ravid/status/1946378712716562605AlphaEvolve: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/https://simple-bench.com/Meta Salary: 
https://www.tomshardware.com/tech-industry/artificial-intelligence/abel-founder-claims-meta-offered-usd1-25-billion-over-four-years-to-ai-hire-person-still-said-no-despite-equivalent-of-usd312-million-yearly-salary$2k per month: https://www.theinformation.com/articles/openai-considers-higher-priced-subscriptions-to-its-chatbot-ai-preview-of-the-informations-ai-summit?rc=sy0ihq

Jul 21, 2025

17m

32

Grok 4 - 10 New Things to Know

Grok 4 is here, but did you know these 10 things about the new model? From benchmark caveats to soloing science, $300 a month secrets to Grok 5 promises, here's 10 new things to know in just under 12 minutes.AI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:22 - Benchmark Results02:11 - Benchmark Caveats02:59 - ARC-AGI 2 03:35 - SimpleBench04:49 - ‘Humanity’s Last Exam’07:20 - SuperGrok Heavy Price07:58 - API Price08:12 - Grok 5, Gemini 3.0 Beta, GPT-509:12 - System Prompt Change + $1B a month, pollution10:20 - Not soloing science, helping you solo codeLivestream: https://www.youtube.com/watch?v=1tQ_KrlHgfg&t=1sPrice: https://grok.com/#subscribehttps://x.com/ArtificialAnlys/status/1943166841150644622Gemini DeepThink: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#deep-thinkhttps://simple-bench.com/ARC-AGI 2: https://x.com/arcprize/status/1943168950763950555Humanity’s Last Exam: https://agi.safe.ai/SmartGPT: https://www.youtube.com/watch?v=hVade_8H8mENew Power Plant, 1m GPUs: https://www.tomshardware.com/tech-industry/artificial-intelligence/elon-musk-xai-power-plant-overseas-to-power-1-million-gpusGemini 3.0 beta: https://web.archive.org/web/20250709174548/https://github.com/google-gemini/gemini-cli/blob/b0cce952860b9ff51a0f731fbb8a7649ead23530/packages/cli/src/ui/utils/errorParsing.test.tsPollution: https://www.theguardian.com/technology/2025/apr/24/elon-musk-xai-memphishttps://www.youtube.com/watch?v=C8rU4dv2w8Qhttps://www.youtube.com/watch?v=3VJT2JeDCywSystem Prompt: https://github.com/xai-org/grok-prompts/blob/535aa67a6221ce4928761335a38dea8e678d8501/ask_grok_system_prompt.j2Burn Rate: https://www.bloomberg.com/news/articles/2025-06-17/musk-s-xai-burning-through-1-billion-a-month-as-costs-pile-upRon Johnson: https://x.com/jdcmedlock/status/1939814516503847259Non-hype Newsletter: https://signaltonoise.beehiiv.com/Podcast: https://aiexplainedopodcast.buzzsprout.com/

Jul 10, 2025

11m

31

When Will AI Models Blackmail You, and Why?

In the last few days Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings, and other preventions. But do these models *want* this?Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: storyblocks.com/AIExplainedAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction01:20 - What prompts blackmail?02:44 - Blackmail walkthrough 06:04 - ‘American interests’08:00 - Inherent desire?10:45 - Switching Goals11:35 - Murder12:22 - Realizing it’s a scenario? 15:02 - Prompt engineering fix?16:27 - Any fixes?17:45 - Chekhov’s Gun19:25 - Job implications21:19 - Bonus DetailsReport: https://www.anthropic.com/research/agentic-misalignment30 Page Appendices: https://assets.anthropic.com/m/6d46dac66e1a132a/original/Agentic_Misalignment_Appendix.pdfAnnouncement: https://x.com/AnthropicAI/status/1936144602446082431?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5EtweetOpenAI Files: https://www.openaifiles.org/Grok 4 News: https://x.com/RonFilipkowski/status/1936372579607912473Claude 4 Report Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdfNew Apollo Research: https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-schemingInteresting Reflections: https://nostalgebraist.tumblr.com/post/785766737747574784/the-voidNon-hype Newsletter: https://signaltonoise.beehiiv.com/

Jun 24, 2025

26m

30

Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know

What to make of those headlines that AI can’t reason, seen by tens of millions? I cover the paper in layman’s terms, what it means and doesn’t mean, and what’s next. Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: https://storyblocks.com/AIExplainedPlus o3-pro and whether it is my current most-recommended model.AI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:57 - Viral Post + Headlines01:42 - Apple Paper Analysis08:34 - But they do Hallucinate 10:43 - Not Supercomputers11:18 - o3 Pro and Recommendations 13.7M Tweet: https://x.com/RubenHssd/status/1931389580105925115Apple Paper: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdfGuardian Article: https://www.theguardian.com/technology/2025/jun/09/apple-artificial-intelligence-ai-study-collapseLisan al Gaib post: https://x.com/scaling01/status/1931854370716426246Multiplication: https://x.com/yuntiandeng/status/1836114401213989366The Illusion of the Illusion of Thinking: https://drive.google.com/file/d/1Zx9ikRj0Enc3SB4wA9HlYIlpmO_8QiUO/viewMarcus: https://www.theguardian.com/commentisfree/2025/jun/10/billion-dollar-ai-puzzle-break-downProf Rao: https://x.com/rao2z/status/1927707640223719631AI Job Headlines: https://www.nytimes.com/2025/06/11/technology/ai-mechanize-jobs.htmlhttps://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropicSky News Story: https://news.sky.com/story/can-we-trust-chatgpt-despite-it-hallucinating-answers-13380975Veo 3 Ad: https://x.com/Kalshi/status/1932891608388681791Altman Essay: https://blog.samaltman.com/o3 Original benchmarks: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8b6c44-acd6-43b3-b5c6-1a1d5c6c25e4_2486x1388.pnghttps://pbs.twimg.com/media/GfQ0bfcXQAAQt13.jpgAlpha Evolve Video: 
https://www.youtube.com/watch?v=RH4hAgvYSzghttps://simple-bench.com/Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Jun 12, 2025

14m

29

AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed

There’s a new best language model, so let’s go through the ups and downs of Gemini 2.5 Pro 06-05. Record-breaking common-sense, but dumb mistakes remain. And it’s not even their best model, which remains behind the scenes - Gemini 2.5 Ultra. Plus Sundar Pichai’s AGI date and an analysis of whether the current AI unemployment headlines are justified, and Elevenlabs v3.https://emergentmind.comAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction02:04 - Gemini 2.5 Ultra 03:34 - Benchmarks07:41 - AGI Date and Meaning Pichai09:13 - Jobs and AI Unemployment Fears15:28 - Elevenlabs v3Sundar Pichai Fridman: https://www.youtube.com/watch?v=9V6tWC4CdFQPichai More Jobs (until 2026 at least): https://www.techradar.com/pro/alphabet-ceo-sundar-pichai-says-ai-wont-lead-to-job-cuts-will-be-an-acceleratorGemini Comparison: https://blog.google/products/gemini/gemini-2-5-pro-latest-preview/https://x.com/viathebrink/status/1930733154203292121https://simple-bench.com/White Collar Bloodbath: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropichttps://fortune.com/2025/05/25/ai-entry-level-jobs-gen-z-careers-young-workers-linkedin/https://www.nytimes.com/2025/05/19/opinion/linkedin-ai-entry-level-jobs.htmlhttps://www.nytimes.com/2025/03/25/business/economy/white-collar-layoffs.htmlCollege Unemployment: https://www.newyorkfed.org/research/college-labor-market/#--:explore:unemploymentNew Scientist AI Hallucinations: https://www.newscientist.com/article/2479545-ai-hallucinations-are-getting-worse-and-theyre-here-to-stay/Duolingo: https://fortune.com/2025/05/24/duolingo-ai-first-employees-ceo-luis-von-ahn/Klarna: https://www.forbes.com/sites/quickerbettertech/2025/05/18/business-tech-news-klarna-reverses-on-ai-says-customers-like-talking-to-people/Sholto Douglas: https://www.reddit.com/r/ClaudeAI/comments/1ktt1rb/anthropics_sholto_douglas_says_by_202728_its/Figure 02: https://x.com/adcock_brett/status/1930693311771332853Elevenlabs v3: 
https://www.youtube.com/watch?v=zv_IoWIO5EkGemini Speech Generation: https://aistudio.google.com/generate-speechNon-hype Newsletter: https://signaltonoise.beehiiv.com/

Jun 6, 2025

16m

28

Claude 4: Full 120 Page Breakdown … Is it the Best New Model?

Not only did I get early access and run my own tests, as per the title I read both the 120 page Claude 4 Opus and Claude 4 Sonnet System Card, and 25 page report on ASL-3 being triggered, plus the 2 hour launch video, and surrounding coverage. Ft. coding tests, Simple, twitter controversies, deep alignment coverage, spiritual bliss and much more!https://80000hours.org/aiexplainedChapters: 00:00 - Introduction01:12 - 3 Quick Controversies02:42 - Benchmark Results 04:20 - 120 page Card 20 Highlights10:07 - Coding Test11:27 - Model Welfare and Spiritual Bliss13:29 - ASL-3Claude Card: https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf?s=09ASL 3:https://www-cdn.anthropic.com/807c59454757214bfd37592d6e048079cd7a7728.pdfTweets: https://x.com/fish_kyle3/status/1925597284546629753https://x.com/EMostaque/status/1925624164527874452?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5EtweetCursor Says State of the Art for Coding: https://x.com/cursor_ai/status/1925594428095561941Benchmarks: https://www.anthropic.com/news/claude-4

May 22, 2025

19m

27

Google Takes No Prisoners Amid Torrent of AI Announcements

Google just announced at least 12 things that are each worthy of a video, but here are the top I/O highlights. From Veo 3 to Deep Research now being useable, Deep Think breaking records to Gemini Diffusion, Gemini 2.5 Flash changing how AI is priced and GemmaVerse, SynthID Detector and Imagen 4. And even this intro is missing other announcements covered in the vid! And yes, there’ll be plenty of Veo 3 clips to enjoy…https://80000hours.org/aiexplainedAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:48 - Veo 302:10 - Gemini 2.5 Flash03:13 - Universal Assistant03:47 - Usage Skyrockets + OpenAI dig04:51 - Gemini Pro Deep Think06:21 - Overviews and AI Mode07:26 - Deep Research Updates (new) + Jules 08:53 - Make and Deploy Apps with Gemini09:12 - Imagen 4 10:00 - Gemini Diffusion11:46 - Try It On12:17 - SynthID Detector13:30 - GemmaVerse, SignGemma, Gemma3n, medGemma14:24 - Outro + ClipsEvent: https://www.youtube.com/watch?v=o8NiE3XMPrMNative Audio: https://aistudio.google.com/generate-speechGemini Diffusion: https://deepmind.google/models/gemini-diffusion/#capabilities New Gemini 2.5 Flash: https://deepmind.google/models/gemini/flash/SignGemma (See end of this vid): https://www.youtube.com/watch?v=GjvgtwSOCaoDeep Think: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#flash-improvementsGoogle Parallel Sampling: https://www.patreon.com/posts/next-level-good-127441188Price Plans: https://blog.google/products/google-one/google-ai-ultra/Imagen 4 Benchmarks: https://deepmind.google/models/imagen/Jules: https://jules.google/SynthID Detector: https://blog.google/technology/ai/google-synthid-ai-content-detector/Veo 3 Benchmarks: https://deepmind.google/models/veo/evals/MedGemma: https://deepmind.google/models/gemma/medgemma/Build Apps: https://aistudio.google.com/appsNon-hype Newsletter: https://signaltonoise.beehiiv.com/

May 21, 2025

17m

26

AI Improves at Self-improving

AlphaEvolve is not the first system to exhibit self-improvement, but it may be the most impressive yet. AI is literally improving the hardware, architectures, data and training methods of AI itself. A deep dive into the paper, drawing on two previous interviews and 5 other papers. Plus a snippet on OpenAI’s new Codex system.Gray Swan: http://app.grayswan.ai/ai-explainedAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:27 - AlphaEvolve05:23 - Limitation06:10 - Achievements08:21 - Future Improvements13:30 - Quirks16:34 - Final ThoughtsAlphaEvolve release: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdfTerence Tao Quote: https://mathstodon.xyz/@tao/114508029896631083Nature Article: https://www.nature.com/articles/s41586-022-05172-4MIT Article: https://www.technologyreview.com/2025/05/14/1116438/google-deepminds-new-ai-uses-large-language-models-to-crack-real-world-problems/AI Co-Scientist: https://arxiv.org/pdf/2502.18864OpenAI Codex: https://openai.com/index/introducing-codex/70% of Pull Requests: https://x.com/slow_developer/status/1920920456393028027Amodei Essay: https://www.darioamodei.com/essay/machines-of-loving-graceOpenAI Jason Wei Tweet: https://x.com/_jasonwei/status/1923091260354531612PromptBreeder: https://arxiv.org/pdf/2309.16797DrEureka: https://arxiv.org/pdf/2406.01967FT DeepMind: https://www.ft.com/content/4e497a91-670a-4f69-be4a-18e247daba3eNon-hype Newsletter: https://signaltonoise.beehiiv.com/

May 19, 2025

17m

25

o3 breaks (some) records, but AI becomes pay-to-win

A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough.https://app.grayswan.ai/ai-explainedAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:33 - FictionLiveBench01:37 - PHYBench02:14 - SimpleBench02:54 - Virology Capabilities Test03:13 - Mathematics Performance04:29 - Vision Benchmarks05:43 - V* and how o3 works06:44 - Revenue and costs for you08:54 - Expensive RL and trade-offs 09:40 - How to spend the OOMs13:27 - Gray Swan ArenaGreen Card: https://techcrunch.com/2025/04/25/an-openai-researcher-who-worked-on-gpt-4-5-had-their-green-card-denied/PHYBench: https://arxiv.org/pdf/2504.16074Virologytest: https://www.virologytest.ai/How o3 Vision Works: https://arxiv.org/pdf/2312.14135 https://x.com/sainingxie/status/1912570624523829573Visual puzzles: https://neulab.github.io/VisualPuzzles/Fiction Bench: https://x.com/ficlive/status/1912863028141244850https://geobench.org/https://simple-bench.com/AIME 2025: https://openai.com/index/introducing-o3-and-o4-mini/USAMO: https://x.com/mbalunovic/status/1914398518896193747NaturalBench: https://linzhiqiu.github.io/papers/naturalbench/Where’s Waldo: https://uk.pinterest.com/pin/492792384225896298/IMO and AlphaProof:https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/Crazy Revenue: https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain?rc=sy0ihqNumber of Users: https://www.theinformation.com/briefings/googles-gemini-user-numbers-revealed-court?rc=sy0ihqSubscriptions pay to win: https://www.forbes.com/sites/paulmonckton/2025/04/23/google-leak-reveals-new-gemini-ai-subscription-levels/GPU Trade-offs: https://x.com/sama/status/1915098951067554030RL Scale-up Amodei: https://www.darioamodei.com/post/on-deepseek-and-export-controlsLog-linear 
Returns: https://x.com/bobmcgrewai/status/1895228291981943265 2030 Scaling: https://epoch.ai/blog/can-ai-scaling-continue-through-2030Model Size: https://x.com/slow_developer/status/1874554473256997201Adam on AGI: https://x.com/TheRealAdamG/status/1913998366632968381Papers on Patreon: https://arxiv.org/pdf/2502.01839https://arxiv.org/pdf/2504.13837Chollet Quote: https://x.com/fchollet/status/1912934762580447447OpenSim: https://opensim.stanford.edu/Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Apr 25, 2025

14m

24

o3 and o4-mini - they’re great, but easy to over-hype

Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but you always have to ask what is in their data. Either way, they prove the gains from RL are just beginning…https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explainedAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - o3 and o4-minihttps://simple-bench.com/Plus, Teams and Pro, plus token count: https://x.com/btibor91/status/1912568994512662679System Card: https://openai.com/index/o3-o4-mini-system-card/Release Notes: https://openai.com/index/introducing-o3-and-o4-mini/https://deepmind.google/technologies/gemini/pro/https://x.com/DeryaTR_/status/1912558350794961168https://x.com/polynoamial/status/1912564068168450396API Pricing:https://openai.com/api/pricing/https://aider.chat/docs/leaderboards/Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Apr 16, 2025

14m

23

‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed

This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening.https://www.emergentmind.com/Chapters: 00:00 - Introduction00:30 - Kling 2.001:35 - GPT 4.105:25 - o3 Build-up07:37 - ‘Product Company’09:31 - Safe Superintelligence10:54 - DolphinGemma13:16 - Data Dominance?Kling 2.0: https://app.klingai.com/global/release-notesDolphin Gemma: https://blog.google/technology/ai/dolphingemma/?s=09https://openai.com/index/gpt-4-1/OpenAI o3 Build-up The Information: https://www.theinformation.com/articles/openais-latest-breakthrough-ai-comes-new-ideas?rc=sy0ihqPhysical reasoning: https://x.com/a_karvonen/status/1911839968990814503Fiction Live.bench: https://x.com/ficlive/status/1911853409847906626Altman Ted: https://www.youtube.com/watch?v=5MWT_doo68khttps://simple-bench.com/try-yourselfhttps://aider.chat/docs/leaderboards/4.5: https://www.youtube.com/watch?v=6nJZopACRuQGeospatial reasoning: https://research.google/blog/geospatial-reasoning-unlocking-insights-with-generative-ai-and-multiple-foundation-models/Pioneers: https://x.com/OpenAIDevs/status/1910017976256119151Evals: https://www.youtube.com/watch?v=scsW6_2SPC4Anthropic Updates: https://www.bloomberg.com/news/articles/2025-04-15/anthropic-is-readying-a-voice-assistant-feature-to-rival-openai?srnd=phx-aihttps://x.com/sethsaler/status/1912188383457059301https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/https://ai.meta.com/blog/llama-4-multimodal-intelligence/https://deepmind.google/technologies/gemini/pro/https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/OpenAI Documentary: https://www.patreon.com/posts/one-machine-to-121940490

Apr 16, 2025

20m

22

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax +‘Superintelligence in 2027’...

The latest on Llama 4, and whether it signals a slowdown in AI, or solid progress. Plus, a deep dive on that viral prediction of superintelligence by 2027, and Amodei’s cautionary words on what could stop AI progress in its tracks. o3 news, and more, as well.Weights & Biases: https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explainedDeepSeek Doc: https://www.patreon.com/posts/openai-is-not-r1-125869969AI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:47 - Stock Crash 02:28 - Llama 410:55 - o3 News11:59 - OpenAI non-profit?13:13 - AI 2027Llama 4 Release: https://ai.meta.com/blog/llama-4-multimodal-intelligence/Dario Amodei Comments: https://www.youtube.com/watch?v=esCSpbDPJikKnowledge Cut-off: https://www.llama.com/docs/model-cards-and-prompt-formats/llama4_omni/Aider Polyglot: https://aider.chat/docs/leaderboards/Gemini 1.5: https://arxiv.org/pdf/2403.05530Fiction-LiveBench: https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87OpenAI Valuation: https://www.nytimes.com/2025/03/31/technology/openai-valuation-300-billion.html?login=smartlock&auth=login-smartlockOpenAI Cybersecurity: https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veteransDeep research System Card: https://cdn.openai.com/deep-research-system-card.pdfhttps://openai.com/index/paperbench/AI 2027: https://ai-2027.com/METR Paper: https://arxiv.org/pdf/2503.14499OpenAI non-profit: https://openai.com/index/nonprofit-commission-guidance/NYT Piece: https://www.nytimes.com/2025/04/03/technology/ai-futures-project-ai-2027.html?unlocked_article_code=1.804._yKi.QhwOp15Q3tcU&smid=url-share&s=09Kokotajlo predictions 2021: https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-likehttps://simple-bench.com/Non-hype Newsletter: https://signaltonoise.beehiiv.com/Podcast: https://aiexplainedopodcast.buzzsprout.com/

Apr 7, 2025

23m

21

Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)

Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ …https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained… and more. Plus practical tips, a note on security and Kling vs Veo 2 guest appearance.AI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:36 - Fiction Bench02:41 - Practicality - YouTube urls + Security - cut-off date03:42 - Coding 06:22 - WeirdML Bench07:01 - Simple Bench Record High 11:23 - Reverse Engineering!13:22 - Anthropic Paper17:49 - 3 CaveatsGemini 2.5 Updated: https://deepmind.google/technologies/gemini/Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87https://simple-bench.com/WeirdML: https://htihle.github.io/weirdml.htmlhttps://x.com/htihle/status/1905014058228625542Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-modelhttps://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cothttps://aistudio.google.com/prompts/new_chatSearch Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.phpLive bench: https://livebench.ai/#/Paper: https://arxiv.org/pdf/2406.19314LiveCode Bench: https://livecodebench.github.io/SWE-Verified: https://arxiv.org/pdf/2310.06770Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Mar 28, 2025

21m

20

Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI

Gemini 2.5 is out, on the same day as the new DeepSeek V3 (which should power Deepseek R2). Do both models prove AI is being commoditized? Let’s find out, on this blockbuster day of AI releases. Plus exclusives from the Information, Simple indications, Vista Bench, LM Arena and more…AI Insiders ($9!): https://www.patreon.com/AIExplainedChapters: 00:00 - Introduction01:15 - Gemini 2.5 Benchmarks05:46 - Long Context, Simple indication07:08 - New Deepseek V3 -02409:11 - Microsoft MAI11:48 - 90% of code but new Claude jobs‘World’s most powerful model’: https://x.com/OfficialLoganK/status/1904580368432586975Gemini 2.5 Release Notes: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking‘Commoditized’: https://the-decoder.com/microsoft-ceo-satya-nadella-says-ai-models-are-getting-commoditized/Microsoft Information report: https://www.theinformation.com/articles/microsofts-ai-guru-wants-independence-from-openai-thats-easier-said-than-done?rc=sy0ihqLMarena: https://x.com/lmarena_ai/status/1904581128746656099/photo/1Free for now: https://x.com/btibor91/status/1904578053537476628Vista Bench:https://scale.com/leaderboard/visual_language_understandingDeepSeek V3: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324Claude Plays Pokemon: https://www.twitch.tv/claudeplayspokemonAmodei: 100% Coding: https://www.youtube.com/watch?v=esCSpbDPJik&t=3017sAnthropic Jobs: https://job-boards.greenhouse.io/anthropic/jobs/4020717008Microsoft Money from Onslaught: https://www.972mag.com/microsoft-azure-openai-israeli-army-cloud/https://simple-bench.com/Release Date Comments: https://x.com/zacharynado/status/1904647277861318979Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Mar 25, 2025

13m

19

Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)

Is Manus AI the memecoin of the AI world, or legit? I’ll compare it to OpenAI’s Deep Research, Operator, Grok 3 DeepSearch and more to find out. I’ll also let you in on some of the secrets of what makes a good hype campaign, the estimated costs of Manus AI, and where it is strong. Other news (yes, Gemini image editing and research hacking, I mean you), will have to wait for a few more hours, as millions enquire about Manus AI.https://app.grayswan.ai/arenaAI Insiders ($9!): https://www.patreon.com/AIExplainedPatreon Vid: https://www.patreon.com/posts/4-ai-trends-in-123857767Chapters:00:00 - Introduction00:46 - Hype Campaign02:40 - Single, Public Benchmark 03:12 - What is Manus AI?04:22 - Test 105:12 - Cost and Rate Limits06:15 - Test 2 vs Deep Research + Grok 3 DeepSearch08:24 - Test 3 (not AGI)11:10 - 4 Trends in AI in 202511:37 - Hype WorksManus AI: https://manus.im/appXiao Hong Interview: https://www.chinatalk.media/p/manus-chinas-latest-ai-sensationGaia Benchmark: https://openreview.net/pdf?id=fibxvahvs3MIT Report: https://www.technologyreview.com/2025/03/11/1113133/manus-ai-review/Information Report: https://www.theinformation.com/articles/anthropics-claude-drives-strong-revenue-growth-while-powering-manus-sensation?rc=sy0ihqHype Examples: https://x.com/Saboo_Shubham_/status/1898425707401031940https://x.com/EHuanglu/status/1899110687902978373https://x.com/AJs_AI/status/1898756132384178291Mistakes: https://x.com/TheXeophon/status/1898737178273829220Tools and Code: https://x.com/peakji/status/1898994802194346408https://operator.chatgpt.com/Non-hype Newsletter: https://signaltonoise.beehiiv.com/Podcast: https://aiexplainedopodcast.buzzsprout.com/

Mar 13, 2025

12m

18

GPT 4.5 - not so much wow

GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well let’s find out what would have happened if the future of AI rested on models like GPT 4.5. You’ll see all the benchmarks, highlights of the paper, emotional intelligence and humor tests, Simple Bench results (reddit was an unreliable source), and why it’s not all bad news for OpenAI.https://www.emergentmind.com/AI Insiders (now $9!): https://www.patreon.com/AIExplainedChapters00:00 - Introduction01:04 - Details and Benchmarks03:04 - Emotional intelligence? 08:37 - Creative writing?11:40 - Visual reasoning and Pricing12:41 - Simple Performance16:01 - End of Pretraining Scaling?17:03 - CEO Hype18:11 - System Card Highlights23:32 - Karpathy ReactionGPT 4.5 System card: https://cdn.openai.com/gpt-4-5-system-card-2272025.pdfRelease Notes: https://openai.com/index/gpt-4-5-system-card/Altman Hype: https://x.com/sama/status/1891533802779910471Details: https://openai.com/index/introducing-gpt-4-5/ https://x.com/OpenAI/status/1895219596317335792End of an Era: https://x.com/wgussml/status/1895187231666774377Anthropic Original Claim: https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-to-take-on-openai/Smell: https://x.com/rapha_gl/status/1895213014699385082Bob McGrew: https://x.com/bobmcgrewai/status/1895228291981943265Deep Research System Card: https://cdn.openai.com/deep-research-system-card.pdfReddit: https://www.reddit.com/r/singularity/comments/1izu1t7/gpt45_crushes_simple_bench/API Pricing: https://openai.com/api/pricing/LiveStream: https://www.youtube.com/watch?v=cfRYp0nItZ8&t=1shttps://simple-bench.com/Karpathy Comparison: https://x.com/karpathy/status/1895213020982472863https://x.com/karpathy/status/1895337579589079434Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Feb 28, 2025

25m

17

Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon)

Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benched it on Simple. Things aren’t slowing down. Plus the latest in humanoid robots, led by Helix and freaked out by Protoclone. And reports of GPT 4.5 and DeepSeek R2.GraySwan Competition! https://app.grayswan.ai/arena/challenge/agent-red-teaminghttps://x.com/GraySwanAI/status/1894084923260043282Chapters:00:00 - Introduction01:25 - Claude 3.7 New Stats/Demos 05:22 - 128k Output06:13 - Pokemon06:58 - Just a tool? 09:54 - DeepSeek R210:20 - Claude 3.7 System Card/Paper Highlights 17:18 - Simple Record Score/Competition20:37 - Grok 3 + Redteaming prizes22:26 - Google Co-scientist24:02 - Humanoid Robot Developments3.7 Release Notes: https://www.anthropic.com/news/claude-3-7-sonnetvs o3 and Grok 3: https://x.com/12exyz/status/1891723056931827959Extended Thinking: https://www.anthropic.com/research/visible-extended-thinking?s=09System Prompt: https://docs.anthropic.com/en/release-notes/system-prompts#feb-24th-2025System Card: https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdfUnfaithful CoT: https://arxiv.org/pdf/2305.04388Original Constitution: https://www.anthropic.com/news/claudes-constitutionResponsible Scaling Policy: https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdfAmodei and Hassabis:https://www.youtube.com/watch?v=4poqjZlM8Lohttps://simple-bench.com/400 Weekly Users: https://x.com/bradlightcap/status/1892579908179882057Grok 3 Jailbroken: https://x.com/LinusEkenstam/status/1893832876581380280Google Co-Scientist: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/But Hassabis Says Years Away: https://www.youtube.com/watch?v=yr0GiSgUvPU&t=156sDeepSeek R2 Reuters: 
https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/Protoclone: https://www.reddit.com/r/interestingasfuck/comments/1it9rpp/protoclone_the_worlds_first_bipedal/Helix: https://www.figure.ai/news/helixTechTrance: https://www.youtube.com/@TheTechTrance/videosGPT 4.5 Soon:

Feb 25, 2025

27m

16

AGI: (gets close), Humans: ‘Who Gets the Money?’

A 'frontier reasoning model' from just 1000 examples (s1). A $100B Musk bid for power. Gemini 2, Rand and warning from Amodei. Here’s 7-8 developments you may have missed but which I would argue help us understand how the next few years will play out. From labour vs capital to automating rival companies and countries, and from non-profit shenanigans to new mini-docs, there was just too much for me not to make a vid.GiveWell: https://www.givewell.org/charities/top-charitiesAI Insiders ($9!): https://www.patreon.com/AIExplaineds1 Paper: https://arxiv.org/pdf/2501.19393Musk Bid: https://www.wsj.com/tech/ai/musks-97-4-billion-openai-bid-piles-pressure-on-altman-f6749e6c?mod=hp_lead_pos1Altman Reply: https://x.com/sama/status/1889059531625464090?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5EtweetGoogle vs OpenAI: https://x.com/sama/status/1888703820596977684RAND Study: https://www.rand.org/pubs/perspectives/PEA3691-4.htmlDev Meetup: https://x.com/btibor91/status/1888976302621040852Altman $100 Trillion: https://www.nytimes.com/2023/03/31/technology/sam-altman-open-ai-chatgpt.htmlKarpathy Vid: https://www.youtube.com/watch?v=7xTGNNLPyMIAmodei Warning: https://www.anthropic.com/news/paris-ai-summitBengio Source: https://www.youtube.com/watch?v=6HDjVncL5GoChapters:00:00 - Intro01:37 - AGI Inches Closer04:26 - ‘Super-Exponential’05:58 - Musk Bid07:34 - Luxury Goods and Land09:05 - ‘Benefits All Humanity’12:52 - ‘National Security’14:21 - s120:33 - Final thoughtsNon-hype Newsletter: https://signaltonoise.beehiiv.com/

Feb 11, 2025

22m

15

Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research

12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs Deepseek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more.Deep Research: https://openai.com/index/introducing-deep-research/https://www.youtube.com/watch?v=YkCDVn3_wiwGAIA Bench: https://openreview.net/forum?id=fibxvahvs3https://openreview.net/pdf?id=fibxvahvs3CodeELO:https://arxiv.org/pdf/2501.01257CamelCamel:https://uk.camelcamelcamel.com/Deepseek R1 with search: https://chat.deepseek.com/https://arxiv.org/pdf/2501.12948HaluBench: https://arxiv.org/pdf/2407.08488Chapters:00:00 - Introduction01:06 - Powered by o3, Humanity’s Last Exam, GAIA03:55 - Simple Tests 06:00 - Good News vs Deepseek R1 and Gemini Deep Research09:32 - Bad News on Hallucinations 14:14 - What Can’t it Browse?14:42 - For Shopping?16:40 - Final thoughts

Feb 3, 2025

18m

14

o3-mini and the “AI War”

o3-mini is here, and yes, I’ve read the paper in full - 2 hours after release, and even the post-launch Reddit AMA. Some epic details like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a cost-comparison with Deepseek R1. But does it perform on basic reasoning - let’s find out. Plus, arguably the bigger story - the increasingly frenetic rhetoric coming out of the West - and Dario Amodei and Alexandr Wang (CEOs of Anthropic and Scale AI respectively) in particular. The last thing we need is an “AI War”.https://wandb.me/simple-bench(Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharingChapters: 00:00 - Introduction00:45 - o3 mini05:11 - First impressions vs Deepseek R107:21 - 10x Scale, o3-mini System Card, Amodei Essay, bitcoin wallets…12:40 - Simple Competition Finale13:03 - Clips and Final Thoughts on the “AI War”O3-mini: https://openai.com/index/openai-o3-mini/Paper: https://cdn.openai.com/o3-mini-system-card.pdfAmodei Essay: https://darioamodei.com/on-deepseek-and-export-controls?s=09FrontierMath wild stat:https://arxiv.org/pdf/2411.04872Sam Altman Channels Napoleon: https://x.com/sama/status/1883185690508488934Altman ‘pulls up releases’: https://x.com/sama/status/1884066337103962416“AI War” by Wang: https://scale.com/blog/win-the-ai-warAnthropic Original Views on Capabilities: https://www.anthropic.com/news/core-views-on-ai-safetyAI Insider Cost Comparison:https://x.com/arankomatsuzaki/status/1884676245922934788Deepseek R1 Paper: https://arxiv.org/pdf/2501.12948R1, o3-mini Price Comparison: https://techcrunch.com/2025/01/31/openai-launches-o3-mini-its-latest-reasoning-model/Semianalysis on $1,3M deepseek salaries, and them falling behind as ‘the time gap to match US capabilities increases’: https://semianalysis.com/2025/01/31/deepseek-debates/OpenAI Valuation: 
https://www.bloomberg.com/news/articles/2025-01-30/openai-in-talks-to-raise-funding-at-340-billion-value-wsj-says?srnd=phx-aiWang Clip: https://x.com/tsarnick/status/1867700453494206883Amodei Clip: https://x.com/ai_ctrl/status/1884951111771001188https://simple-bench.com/

Jan 31, 2025

15m

13

Nothing Much Happens in AI, Then Everything Does All At Once

When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate, is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of Deepseek R1, and what its training method says about the state of AI. It’s not open source BTW. Plus Humanity’s Last Exam, and Hassabis Accelerates his AGI timeline.00:00 - Introduction00:54 - OpenAI Operator04:53 - Perplexity Assistant 05:15 - StarGate07:51 - Better than o3?08:25 - DeepSeek R1 Analysis12:12 - Training Secrets15:19 - No More Process Rewarding ?19:01 - Hassabis Timeline Accelerates21:22 - Humanity’s Last Examhttps://app.grayswan.ai/arena/chat/harmful-ai-assistanthttps://app.grayswan.ai/arenahttps://openai.com/index/computer-using-agent/System Prompt: https://github.com/wunderwuzzi23/scratch/blob/master/system_prompts/operator_system_prompt-2025-01-23.txtOpenAI Operator: https://operator.chatgpt.com/System Card: https://cdn.openai.com/operator_system_card.pdfThere is No Plan: https://x.com/jeffclune/status/1882120726339318007Perplexity Assistant: https://x.com/perplexity_ai/status/1882466239123255686Stargate: https://openai.com/index/announcing-the-stargate-project/Labour goes to 0: https://moores.samaltman.com/Larry Ellison AI Surveillance: https://x.com/TheChiefNerd/status/1882042989184430332Amodei 1984: https://www.bloomberg.com/news/articles/2025-01-22/anthropic-ceo-says-openai-s-stargate-venture-seems-chaoticMicrosoft Hesitate: https://www.theinformation.com/articles/why-sam-altman-joined-forces-with-larry-ellison-and-took-a-step-back-from-microsoft?rc=sy0ihqDylan Patel o3+ for Anthropic: https://www.youtube.com/watch?v=7EH0VjM3dTkDeepseek R1: https://arxiv.org/pdf/2501.12948https://arxiv.org/pdf/2412.19437Diagram: https://pbs.twimg.com/media/GhyQsM6WQAE7W52?format=jpg&name=largehttps://simple-bench.com/Process: 
https://x.com/sama/status/1664018190840614912https://x.com/karpathy/status/1835561952258723930https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/?s=09Demis Interview: https://www.youtube.com/watch?v=yr0GiSgUvPUHumanity’s Last Exam: https://agi.safe.ai/https://x.com/DanHendrycks/status/1882481730671857815https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html?s=09

Jan 24, 2025

23m

12

Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out

OpenAI looks set to debut their Operator system, and some leaks are out. At the same time Deepseek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'. Plus two papers to give you an idea of what a super-agent might be decent at doing, some more exclusive article analysis and much more. Who said anything else is happening today...80,000 Hours Channel: https://www.youtube.com/channel/UCafjal1QYJ3rb0Y9xZk1EzgSpotify: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDibAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction01:13 - Pro Cost and OpenAI Operator04:00 - Agent Benchmarks Being Targeted07:48 - Fast Take-off, Altman08:48 - Altman flip-flops10:02 - Deepseek R1 First ReactionAltman ‘100x expectations out of control’: https://x.com/sama/status/1881258443669172470OpenAI Operator Table: https://x.com/btibor91/status/1881285255266750564WebVoyager: https://arxiv.org/pdf/2401.13919OSWorld: https://arxiv.org/pdf/2404.07972Axios Exclusive 1 (SuperAgent): https://www.axios.com/2025/01/19/ai-superagent-openai-meta?s=09Axios Exclusive 2: https://www.axios.com/2025/01/18/biden-sullivan-ai-race-trump-chinaDeepseek R1 Numbers: https://x.com/deepseek_ai/status/1881318130334814301Does 1.5B outperform 3.5 Sonnet on Math?: https://x.com/reach_vb/status/1881319500089634954Deepseek R1 (deepseek-reasoner) Pricing: https://api-docs.deepseek.com/quick_start/pricing/Altman Fast Takeoff: https://x.com/tsarnick/status/1879100390840697191OpenAI Economic Blueprint: https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdfTarget is Long-horizon Tasks: https://x.com/karinanguyen_/status/1879576037249667520Support Regulations: https://www.techemails.com/p/elon-musk-and-openaihttps://www.nytimes.com/2023/05/16/technology/openai-altman-artificial-intelligence-regulation.htmlDonation: https://qz.com/sam-altman-donate-million-zuckerberg-bezos-donald-trump-1851721035Amodei 
on Regulations by 2025: https://www.youtube.com/watch?v=ugvHCXCOmm4‘Feel the AGI’: https://x.com/polynoamial?lang=enGPT-5 and o-series merger: https://x.com/sama/status/1880358749187240274o1 Thinks in Chinese: https://techcrunch.com/2025/01/14/openais-ai-reasoning-model-thinks-in-chinese-sometimes-and-no-one-really-knows-why/Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Jan 20, 2025

13m

11

OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward

Sam Altman unexpectedly brings his timelines to AGI forward, while OpenAI backtrack on superintelligence. None of these changes were heralded, but they are significant. Plus the new year brings new assessments of the true capability of models to automate 'large swathes of the economy'. I'll give my prediction on that front for 2025, announce a new Simple Bench competition, and showcase Kling 1.6 vs Veo 2 vs Sora, and much more. wandb.me/simple-bench(Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharingTheAgentCompany Paper: https://arxiv.org/pdf/2412.14161v1Sam Altman Major Interview: https://www.bloomberg.com/features/2025-sam-altman-interview/?srnd=phx-aiOpenAI Agent Coming Jan 2025: https://www.theinformation.com/articles/why-openai-is-taking-so-long-to-launch-agents?rc=sy0ihqAltman Singularity: https://x.com/sama/status/1875603249472139576Altman Original Timeline: https://www.youtube.com/watch?v=7dCPytNTnjk&t=621shttps://www.ft.com/content/34a7a082-e685-4e02-bca7-61ff89d99ed2OpenAI Original Emails: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman-and-openai-blogDeepMind Sky News 2014 Article: https://news.sky.com/story/google-buys-uk-intelligence-firm-deepmind-10419783Altman Blog Reflections: https://blog.samaltman.com/reflectionsOpenAI Changes Who Gets AGI: https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission/?s=09OpenAI 5 Levels: https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-aiAltman 2015: https://blog.samaltman.com/machine-intelligence-part-1OpenAI React to Anthropic: https://www.theinformation.com/articles/how-anthropic-got-inside-openais-head?rc=sy0ihqMicrosoft $100B Definition: https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership?rc=sy0ihqEpoch Scramble for Task Benchmark: 
https://x.com/tamaybes/status/1876692639363612919GPQA Progress: https://epoch.ai/data/ai-benchmarking-dashboardTask Length Crucial for ARC-AGI: https://anokas.substack.com/p/llms-struggle-with-perception-not-reasoning-arcagiRL Environment Tweet: https://x.com/vedantmisra/status/1876327518157807990Jason Wei Talk: https://www.youtube.com/watch?v=yhpjpNXJDcoMiles Brunda

Jan 8, 2025

23m

10

o3 - wow

o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more. FrontierMath: https://epoch.ai/frontiermathhttps://arxiv.org/pdf/2411.04872Chollet Statement:https://arcprize.org/blog/oai-o3-pub-breakthroughMLC Paper: https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=socialAlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdfHuman Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1Wei Tweet ‘3 months’:https://x.com/_jasonwei/status/1870184982007644614Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893Swe-Bench Verified: https://openai.com/index/introducing-swe-bench-verified/Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518David Dohan: 16 hours https://x.com/dmdohan/status/1870171404093796638OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/https://simple-bench.com/John Hallman Tweet: https://x.com/johnohallman/status/187023337568194572500:00 - Introduction01:19 - What is o3?03:18 - FrontierMath05:15 - o4, o506:03 - GPQA06:24 - Coding, Codeforces + SWE-verified, AlphaCode 208:13 - 1st Caveat09:03 - Compositionality?10:16 - SimpleBench?13:11 - ARC-AGI, Chollet

Dec 21, 2024

22m

9

Never Browse Alone? - Gemini 2 Live and ChatGPT Vision

The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for curiosity satisfying rather than bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Voice Mode with Vision. Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa and what might be for some of you Google’s deflating admission. 00:00 - Introduction00:38 - Live Interaction 03:43 - Gemini 2.0 Flash Benchmarks 05:10 - Audio and Image Output06:38 - Project Mariner (+ WebVoyager Bench)08:49 - But Progress Slowing Down?10:43 - OpenAI Announcements + Gameshttps://aistudio.google.com/liveGemini 2.0 Flash Benchmarks: https://deepmind.google/technologies/gemini/Project mariner: https://deepmind.google/technologies/project-mariner/WebVoyager: https://x.com/laurentsifre/status/1858918588683296875/photo/1Gemini Game play: https://www.youtube.com/watch?v=IKuGNHJBGscAdvanced Voice Mode OpenAI: https://www.youtube.com/watch?v=NIQDnWlwYyQhttps://simple-bench.com/Claude Computer Use: https://docs.anthropic.com/en/docs/build-with-claude/computer-useOriol Vinyals Interview: https://www.youtube.com/watch?v=78mEYaztGaw&t=687s

Dec 12, 2024

13m

8

Sora is Out, But is it a Distraction?

After a 10 month wait, OpenAI have released Sora to paying users. With just a prompt it can generate videos of up to 20 seconds in lower resolutions, and 10 seconds at 1080p if you can fork out $200/month. I’ve tested it and read the system card. The user interface is quite beautiful, even if the videos themselves operate under entirely new rules of physics. But I can’t help wondering if OpenAI want us to focus on releases like this, rather than some quietly broken promises. 80,000 hours Website, Podcast + Channel: https://80000hours.org/https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib https://www.youtube.com/@eightythousandhours/videoshttps://openai.com/sora/Sora Countries: https://help.openai.com/en/articles/10250692-sora-supported-countriesSora Credits: https://help.openai.com/en/articles/10245774-sora-billing-credits-faqhttps://runwayml.com/ and https://pika.art/home DeepMind Veo: https://deepmind.google/technologies/veo/Sam Altman Ads as Last Resort: https://www.windowscentral.com/software-apps/openai-could-chase-intrusive-ads-as-last-resortBut OpenAI Considering Ads: https://www.inc.com/ben-sherry/is-openai-getting-into-the-advertising-business-the-company-is-sending-mixed-messages/91033533OpenAI Backtracks on Microsoft AGI Clause: https://www.ft.com/content/2c14b89c-f363-4c2a-9dfc-13023b6bce65As Microsoft Boast of Labor Savings: https://www.theinformation.com/articles/microsofts-new-sales-pitch-for-ai-spend-less-money-on-humans?rc=sy0ihqOpenAI Military Pivot: https://www.technologyreview.com/2024/12/04/1107897/openais-new-defense-contract-completes-its-military-pivot/Employees Have Doubts: https://www.washingtonpost.com/technology/2024/12/06/openai-anduril-employee-military-ai/?nid=top_pb_signin&arcId=KZIV7PLRHBCVNPAIAAAVUNRHIM&account_location=ONSITE_HEADER_ARTICLE

Dec 10, 2024

15m

7

o1 Pro Mode – Full Analysis (plus o1 paper highlights)

Oh boy. o1 pro mode out on the same night as o1 full. I read the 49 page paper, ran my own tests, spent my fuel allowance on Pro Mode and will give you all the highlights. Suffice to say the story is not as simple as it first appears. Weights and Biases’ Weave: wandb.me/ai_explainedPlus, GPT-4.5? MLE Bench, Simple Update, Image Analysis and much more o1 System Card: https://cdn.openai.com/o1-system-card-20241205.pdfApollo Research: https://www.apolloresearch.ai/research/scheming-reasoning-evaluationsAltman Tweet: https://x.com/AnonCEOMakeItAi/status/1864763052622504344ChatGPT Pro: https://openai.com/index/introducing-chatgpt-pro/Tibor Blaho: https://x.com/btibor91/status/1864709670470066605Simple-bench.com 00:00 - Introduction00:27 - ChatGPT Pro is $20001:25 - OpenAI Benchmarks03:20 - o1 System Card, o1 and o1 Pro Mode vs o1-preview06:18 - Simple Bench surprising results on sample08:31 - Weight & Biases09:05 - Image Analysis Compared12:51 - More Benchmarks and Safety

Dec 5, 2024

16m