All Episodes
LessWrong (30+ Karma) — 296 episodes
“Predicting Rare LLM Failures with 30× Fewer Rollouts” by Santiago Aranguri, Francisco Pernice
[Linkpost] “Claude is Now Alignment Pretrained” by RogerDearnaley
“The primary sources of near-term cybersecurity risk” by lc
“Most “inner work” looks like entertainment.” by Chris Lakin
[Linkpost] “Apollo Update May 2026” by Marius Hobbhahn
“Voters are surprisingly open to talking about AI risk” by less_raichu
“Childhood and Education #18: Do The Math” by Zvi
“The Owned Ones” by Eliezer Yudkowsky
“Optimisation: Selective versus Predictive” by Raymond Douglas
“AI companies are already profitable (in the way that matters)” by Yair Halberstadt
“The Iliad Intensive Course Materials” by Leon Lang, David Udell, Alexander Gietelink Oldenziel
“Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)” by Steven Byrnes
“How useful is the information you get from working inside an AI company?” by Buck, Anders Cairns Woodruff
“Who Got Breasts First and How We Got Them” by rba
“Anthropic’s strange fixation on “hyperstition”” by Simon Lermen
“How the AI Labs Make Profit (Maybe, Eventually)” by mabramov
“Sawtooth Problems” by Alexander Slugworth
“The Darwinian Honeymoon - Why I am not as impressed by human progress as I used to be” by Elias Schmied
“International Law Cannot Prevent Extinction Either” by Sausage Vector Machine
“Neural Networks learn Bloom Filters” by Alex Gibson
“If digital computers are conscious, they are conscious at the hardware level” by cube_flipper
“Why You Can’t Use Your Right to Try” by Stephen Martin
“A benchmark is a sensor” by Håvard Tveit Ihle, mabynke
“Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis” by Linch
“Write Cause You Have Something to Say” by Logan Riggs
“AI is Breaking Two Vulnerability Cultures” by jefftk
“Is ProgramBench Impossible?” by frmsaul
“Bringing More Expertise to Bear on Alignment” by Edmund Lau, Geoffrey Irving, Cameron Holmes, David Africa
[Linkpost] “How to prevent AI’s 2008 moment (We’re hiring)” by felixgaston
“AI #167: The Prior Restraint Era Begins” by Zvi
“Mechanistic estimation for wide random MLPs” by Jacob_Hilton
“Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations” by Subhash Kantamneni, kitft, Euan Ong, Sam Marks
“Try, even if they have you cold” by WalterL
“A review of “Investigating the consequences of accidentally grading CoT during RL”” by Buck
“There is no evidence you should reapply sunscreen every 2 hours.” by Hide
“Many individual CEVs are probably quite bad” by Viliam
“x-risk-themed” by kave
“What is Anthropic?” by Zvi
“What if LLMs are mostly crystallized intelligence?” by deep
“Your rights when flying to Europe” by Yair Halberstadt
“Model Spec Midtraining: Improving How Alignment Training Generalizes” by Chloe Li, saraprice, Sam Marks, Jonathan Kutasov
“The AI Ad-Hoc Prior Restraint Era Begins” by Zvi
“Motivated reasoning, confirmation bias, and AI risk theory” by Seth Herd
“Are you looking up?” by Craig Green
[Linkpost] “Interpreting Language Model Parameters” by Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors, Lee Sharkey
“Housing Roundup #15: The War Against Renters” by Zvi
“It’s nice of you to worry about me, but I really do have a life” by Viliam
“Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI” by Eliezer Yudkowsky
“AI Industrial Takeoff — Part 1: Maximum growth rates with current technology” by djbinder
“Taking woo seriously but not literally” by Kaj_Sotala
“Dairy cows make their misery expensive (but their calves can’t)” by Elizabeth
“Measuring the ability of Opus 4.5 to fool narrow classifiers” by Fabien Roger, John Hughes
“A new rationalist self-improvement book: the 12 Levers” by spencerg
“OpenAI’s red line for AI self-improvement is fundamentally flawed” by Charbel-Raphaël
“You Are Not Immune To Mode Collapse” by J Bostock
“Primary Care Physicians are Incompetent. We Need More of Them.” by Hide
“How Go Players Disempower Themselves to AI” by Ashe Vazquez Nuñez
“How much should the ideal person cry wolf?” by KatjaGrace
“Conditional misalignment: Mitigations can hide EM behind contextual cues” by Jan Dubiński, Owain_Evans
“Risk from fitness-seeking AIs: mechanisms and mitigations” by Alex Mallen
“Sanity-checking “Incompressible Knowledge Probes”” by Sturb, LawrenceC
“AI unemployment and AI extinction are often the same” by KatjaGrace
“AI risk was not invented by AI CEOs to hype their companies” by KatjaGrace
“Cyborg evals” by Eye You, frmsaul
“To what extent is Qwen3-32B predicting its persona?” by Arjun Khandelwal, ryan_greenblatt, Alex Mallen
“Research Sabotage in ML Codebases” by egan
“Maybe I was too harsh on deep learning theory (three days ago)” by LawrenceC
“Notes on Transformer Consciousness” by slavachalnev
“On today’s panel with Bernie Sanders” by David Scott Krueger
“No Strong Orthogonality From Selection Pressure” by lumpenspace
“Learning zero, and what SLT gets wrong about it” by Dmitry Vaintrob
“The Most Important Charts In The World” by Zvi
“LLM Style Slop is Absolutely Everywhere” by silentbob
“Goblin Mode, 24 Hours Later” by Dylan Bowman
“Let Kids Keep More Productivity Gains” by jefftk
“llm assistant personas seem increasingly incoherent (some subjective observations)” by nostalgebraist
“Not a Paper: “Frontier Lab CEOs are Capable of In-Context Scheming”” by LawrenceC
“The Problem in the “Nerd Sniping” xkcd Comic” by peralice
“Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers” by Jozdien, Alex Mallen
“Contra Binder on far-UVC and filtration” by jefftk
“Takes from two months as an aspiring LLM naturalist” by AnnaSalamon
“Forecasting is Not Overrated and It’s Probably Funded Appropriately” by Ben S.
“On the political feasibility of stopping AI” by David Scott Krueger
“Sleeper Agent Backdoor Results Are Messy” by Sebastian Prasanna, Alek Westover, Dylan Xu, Vivek Hebbar, Julian Stastny
“LessWrong Shows You Social Signals Before the Comment” by TurnTrout
“Fail safe(r) at alignment by channeling reward-hacking into a “spillway” motivation” by Anders Cairns Woodruff, Alex Mallen
“Curious cases of financial engineering in biotech” by Abhishaike Mahajan
“Update on the Alex Bores campaign” by Eric Neyman
“AI companies should publish security assessments” by ryan_greenblatt
“In defense of parents” by Yair Halberstadt
“The other paper that killed deep learning theory” by LawrenceC
“What holds AI safety together? Co-authorship networks from 200 papers” by Anna Thieser
“‘Bad faith’ means intentionally misrepresenting your beliefs” by TFD
“Retrospective on my unsupervised elicitation challenge” by DanielFilan
“Control protocols don’t always need to know which models are scheming” by Fabien Roger
“Anthropic spent too much don’t-be-annoying capital on Mythos” by draganover
“The paper that killed deep learning theory” by LawrenceC
“Forecasting is Way Overrated, and We Should Stop Funding It” by mabramov
“Thinkhaven” by Raemon
“Is the Cat Out of the Bag?: Who knows how to make AGI?” by Oliver Sourbut
“Against the “Permanent” Underclass” by Marcus Plutowski
“Quick Paper Review: “There Will Be a Scientific Theory of Deep Learning”” by LawrenceC
“Protecting Cognitive Integrity: Our internal AI use policy (V1)” by Tom DAVID
“Methodology for inferring propensities of LLMs” by Olli Järviniemi
“vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models” by Alan Cooney, Sid Black
“What Happens When a Model Thinks It Is AGI?” by josh :), David Africa
“Should We Train Against (CoT) Monitors?” by RohanS
“If Everyone Reads It, Nobody Dies - Course Launch” by Luc Brinkman, Chris-Lons
“Does your AI perform badly because you — you, specifically — are a bad person” by Natalie Cargill
“A “Lay” Introduction to “On the Complexity of Neural Computation in Superposition”” by LawrenceC
“An Angry Review of Greg Egan’s “Didicosm”” by LawrenceC
“Evil is bad, actually (Vassar and Olivia Schaefer)” by plex
“Your Supplies Probably Won’t Be Stolen in a Disaster” by jefftk
“Community misconduct disputes are not about facts” by mingyuan
“Why no new notations since 1960?” by Carl Feynman
“Narrow Secret Loyalty Dodges Black-Box Audits” by Alfie Lamerton, Fabien Roger
“10 posts I don’t have time to write” by habryka
“A taxonomy of barriers to trading with early misaligned AIs” by Alexa Pan
“$50 million a year for a 10% chance to ban ASI” by Andrea_Miotti, Alex Amadori, Gabriel Alfour
“Automated Deanonymization is Here” by jefftk
“Evil is bad, actually (Vassar and Olivia Schaefer callout post)” by plex
“10 non-boring ways I’ve used AI in the last month” by habryka
“Introducing LinuxArena” by Tyler Tracy, Ram Potham, Nick Kuhn, Myles H
“The “Budgeting” Skill Has The Most Betweenness Centrality (Probably)” by JenniferRM
“Finetuning Borges” by Linch
“9 kinds of hard-to-verify tasks” by Cleo Nardo
“How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?” by dx26, Alek Westover, Vivek Hebbar, Sebastian Prasanna, Buck, Julian Stastny
“Automating philosophy if Timothy Williamson is correct” by Cleo Nardo
“CLR’s Safe Pareto Improvements Research Agenda” by Anthony DiGiovanni
“LLMs are about to disrupt algorithmic media feeds” by lsusr
“Resources for starting and growing an AI safety org” by Bryce Robertson, Søren Elverlin, Melissa Samworth, jakkdl
“Quality Matters Most When Stakes are Highest” by LawrenceC
“Feel like a room has bad vibes? The lighting is probably too “spiky” or too blue” by habryka
“I did a jhana meditation retreat (in 2024) with Jhourney and it was okay.” by Jules
“R1 CoT illegibility revisited” by nostalgebraist
“Reevaluating AGI Ruin in 2026” by lc
“If It’s Worth Arguing, It’s Worth Arguing With Whiteboards” by Drake Morrison
“There are only four skills: design, technical, management and physical” by habryka
“Having OCD is like living in North Korea (Here’s how I escaped)” by Declan Molony
“Claude knows who you are” by Smaug123
“Vladimir Putin’s CEV is probably pretty good” by habryka
“Post-mortem’ing my earliest ML research paper, 7 years later” by LawrenceC
“If You’ve Never Bought a Tool You Didn’t Need, You’re Not Buying Enough Tools” by Drake Morrison
“3” by AnnaJo
“Consent-Based RL: Letting Models Endorse Their Own Training Updates” by Logan Riggs
“Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability” by Elle Najt, Asa Cooper Stickland, Xander Davies
“Let goodness conquer all that it can defend” by habryka
“Specialization is a Driver of Natural Ontology” by johnswentworth
[Linkpost] “You can only build safe ASI if ASI is globally banned” by Connor Leahy
“Beware of Well-Written Posts” by alseph
“You Aren’t in Charge of the Overton Window; Politics Is Not Interior Design” by Davidmanheim
“Carpathia Day” by Drake Morrison
“Do not conquer what you cannot defend” by habryka
“What is the Iliad Intensive?” by Leon Lang, Alexander Gietelink Oldenziel, David Udell
“The Mirror Test Is Complicated” by J Bostock
“Contra Leicht on AI Pauses” by David Scott Krueger (formerly: capybaralet)
“Nectome: All That I Know” by Raelifin
“Effective Altruism, Seen From Slytherin” by Xylix
“Majority Report” by peralice
“Current AIs seem pretty misaligned to me” by ryan_greenblatt
“Contra Byrnes on UV & Cancer” by HedonicEscalator
“Everyone Has a Plan Until They Get Social Pressure To the Face” by Czynski
“Mechanisms of Introspective Awareness” by Uzay Macar
“Load-Bearing Sincerity: On the Motive Reinforcement Thesis” by Fiora Starlight
“Diary of a “Doomer”: 12+ years arguing about AI risk (part 1)” by David Scott Krueger (formerly: capybaralet)
“A Retrospective of Richard Ngo’s 2022 List of Conceptual Alignment Projects” by LawrenceC
“From personas to intentions: towards a science of motivations for AI models” by David Africa, Jacob Pfau
“The Shapley Share of Responsibility?” by Raemon
“Who Killed Common Law?” by Benquo
“Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes” by Alex Mallen, ryan_greenblatt
“Meaningful Questions Have Return Types” by Drake Morrison
“Only Law Can Prevent Extinction” by Eliezer Yudkowsky
“AI Safety’s Biggest Talent Gap Isn’t Researchers. It’s Generalists.” by Topaz, agucova, Alexandra Bates, Parv Mahajan
“Tomas Bjartur: The Last Prodigy” by Linch
“Annoyingly Principled People, and what befalls them” by Raemon
“TAPs or it didn’t happen” by Raemon
“Returns to intelligence” by RobertM
“Daycare illnesses” by Nina Panickssery
“The policy surrounding Mythos marks an irreversible power shift” by sil
“Talk English, Think Something Else” by J Bostock
“Sparse Autoencoders for Single-Cell Models” by Ihor Kendiukhov
“Eggs, rooms, puzzles, and talking about AI” by KatjaGrace
“Morale” by J Bostock
“Your Mom is a Chimera” by michaelwaves
“The Blast Radius Principle” by Martin Sustrik
“How to make good tea” by RobertM
“Catching illicit distributed training operations during an AI pause” by Robi Rahman
[Linkpost] “Scott Alexander gentrified my meetup” by dominicq
“Pausing AI Is the Best Answer to Post-Alignment Problems” by MichaelDickens
“Some thoughts on Nectome’s risk and resilience” by Aurelia
“Chocolate Sloths, Tinder, and Moral Backstops” by J Bostock
“Dario probably doesn’t believe in superintelligence” by RobertM
“The Unintelligibility is Ours: Notes on Chain-of-Thought” by 1a3orn
“If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines” by ryan_greenblatt
“Why Control Creates Conflict, and When to Open Instead” by plex
“Reproducing steering against evaluation awareness in a large open-weight model” by Thomas Read, Bronson Schoen, Joseph Bloom
“Have we already lost? Part 2: Reasons for Doom” by LawrenceC
“Model organisms researchers should check whether high LRs defeat their model organisms” by dx26, Sebastian Prasanna, Alek Westover, Vivek Hebbar, Julian Stastny
“Anthropic did not publish a “risk discussion” of Mythos when required by their RSP” by RobertM
“Some takes on UV & cancer” by Steven Byrnes
“Help me launch Obsolete: a book aimed at building a new movement for AI reform” by garrison
“Slightly-Super Persuasion Will Do” by Tomás B.
“Have we already lost? Part 1: The Plan in 2024” by LawrenceC
“Do not be surprised if LessWrong gets hacked” by RobertM
“One Week in the Rat Farm” by Philip Harker
“101 Humans of New York on the Risks of AI” by Corm
“Baking tips” by RobertM
“An easy coordination problem?” by KatjaGrace
“Excerpts and Notes on Mythos Model Card” by williawa
“The effects of caffeine consumption do not decay with a ~5 hour half-life” by kman
“You don’t know what you are made of till you’ve been stalked across three countries” by Shoshannah Tekofsky
“Why is Flesh So Weak?” by J Bostock
“The hard part isn’t noticing when papers are bad, it’s deciding what to do afterwards” by LawrenceC
“We can prevent progress! Conceptual clarity, and inspiration from the FDA” by KatjaGrace
“AI as a Trojan horse race” by KatjaGrace
“My unsupervised elicitation challenge” by DanielFilan
“Role-playing vs Self-modelling” by Jan_Kulveit
“Elementary Condensation” by Jan
“Hedging and Survival-Weighted Planning” by Vaniver
“Opus’s Schelling Steganography Has Amplifiable Secrecy Against Weaker Eavesdroppers” by Elle Najt
“An Alignment Journal: Features and policies” by JessRiedel, Dan MacKinlay, Luca, Daniel Murfet, david reinstein
“Fantasy ideology” by Ninety-Three
[Linkpost] “Questions raised about OpenAI leaders’ trustworthiness by the New Yorker” by Remmelt
“Claude Mythos System Card Preview” by anaguma
“My picture of the present in AI” by ryan_greenblatt
[Linkpost] “[Paper] Stringological sequence prediction I” by Vanessa Kosoy
“We’re actually running out of benchmarks to upper bound AI capabilities” by LawrenceC
“Don’t write for LLMs, just record everything” by RobertM
“Contra Nina Panickssery on advice for children” by Sean Herrington
“By Strong Default, ASI Will End Liberal Democracy” by MichaelDickens
“AIs can now often do massive easy-to-verify SWE tasks and I’ve updated towards shorter timelines” by ryan_greenblatt
“Paper close reading: “Why Language Models Hallucinate”” by LawrenceC
“Ten different ways of thinking about Gradual Disempowerment” by David Scott Krueger (formerly: capybaralet)
“11 pieces of advice for children” by Nina Panickssery
“Steering Might Stop Working Soon” by J Bostock
“Am I the baddie?” by Ustice
“Academic Proof-of-Work in the Age of LLMs” by LawrenceC
“Positive sum does not mean “win-win”” by loops
“Considerations for growing the pie” by Zach Stein-Perlman
“Following the incentives” by David Scott Krueger (formerly: capybaralet)
“Chicken-Free Egg Whites” by jefftk
“dark ilan” by ozymandias
“Mean field sequence: an introduction” by Dmitry Vaintrob, Lauren Greenspan
“Democracy Dies With The Rifleman” by Vaniver
“The bar is lower than you think” by XelaP
“Did Anyone Predict the Industrial Revolution?” by Lost Futures
“Why do I believe preserving structure is enough?” by Aurelia
“There should be $100M grants to automate AI safety” by Marius Hobbhahn
“Sadly, The Whispering Earring” by Dentosal
“Common research advice #2: say precisely what you want to say” by LawrenceC
“2026: The year of throwing my agency at my health (now with added cyborgism)” by Ruby
[Linkpost] “Q1 2026 Timelines Update” by Daniel Kokotajlo, elifland, bhalstead
“How social ideas get corrupt” by Kaj_Sotala
“The Indestructible Future” by WillPetillo
“My most common advice for junior researchers” by LawrenceC
“The Practical Guide to Superbabies” by GeneSmith
“The Corner-Stone” by Benquo
“Systematically dismantle the AI compute supply chain.” by David Scott Krueger (formerly: capybaralet)
“The quest for general intelligence is hitting a wall” by Sean Herrington
“Intelligence Dissolves Privacy” by Vaniver
“Anthropic’s Pause is the Most Expensive Alarm in Corporate History” by Ruby
“I’m Suing Anthropic for Unauthorized Use of My Personality” by Linch
“Orders of magnitude: use semitones, not decibels” by Oliver Sourbut
“Dying with Whimsy” by NickyP
“AI for AI for Epistemics” by owencb, Lukas Finnveden
“Announcing Doublehaven with Reflections on Humour” by J Bostock
“Save the Sun Shrimp!” by Jack
“LIMBO: Who We Are, What We Do, and an Exciting High-Impact Funding Opportunity” by faul_sname
“Chat, is this sus?” by Tyler Tracy
“You Have Not Been a Good User” (LessWrong’s second album) by habryka
“Lesswrong Liberated” by Ronny Fernandez
“The Claude Code Source Leak” by Error
“Experiments With Opus 4.6’s Fiction” by Tomás B.
“Product Alignment is not Superintelligence Alignment (and we need the latter to survive)” by plex
“Co-Found Lens Academy With Me. (We have early users and funding)” by Luc Brinkman
“Slack in Cells, Slack in Brains” by Mateusz Bagiński
“I am definitely missing the pre-AI writing era” by N. Cailie
“The state of AI safety in four fake graphs” by Boaz Barak
“AI should be a good citizen, not just a good assistant” by Tom Davidson, wdmacaskill
“(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL” by 7vik, Sid Black, Joseph Bloom
[Linkpost] “Parkinson’s Law of Worry” by Jakub Halmeš
“Folie à Machine: LLMs and Epistemic Capture” by DaystarEld
“Stop asking “how good is this” to decide between donation opportunities I recommend” by Zach Stein-Perlman
“Nick Bostrom: How big is the cosmic endowment?” by Zach Stein-Perlman
“Don’t Overdose Locally Beneficial Changes” by Mateusz Bagiński
“Stanley Milgram wasn’t pessimistic enough about human nature?” by David Gross
[Linkpost] “What if superintelligence is just weak?” by Simon Lermen
“Pray for Casanova” by Tomás B.
“ControlAI 2025 Impact Report” by Andrea_Miotti, Alex Amadori
“AI’s capability improvements haven’t come from it getting less affordable” by Anders Woodruff
“Scaffolded Reproducers, Scaffolded Agents” by Mateusz Bagiński
“My hobby: running deranged surveys” by leogao
“The Terrarium” by Caleb Biddulph
“Sen. Sanders (I-VT) and Rep. Ocasio-Cortez (D-NY) propose AI Data Center Moratorium Act” by Matrice Jacobine
“Test your best methods on our hard CoT interp tasks” by daria, Riya Tyagi, Josh Engels, Neel Nanda
″“What Exactly Would An International AI Treaty Say?” Is a Bad Objection” by Davidmanheim