

LessWrong (30+ Karma) — 296 episodes

1. “Predicting Rare LLM Failures with 30× Fewer Rollouts” by Santiago Aranguri, Francisco Pernice
2. [Linkpost] “Claude is Now Alignment Pretrained” by RogerDearnaley
3. “The primary sources of near-term cybersecurity risk” by lc
4. “Most “inner work” looks like entertainment.” by Chris Lakin
5. [Linkpost] “Apollo Update May 2026” by Marius Hobbhahn
6. “Voters are surprisingly open to talking about AI risk” by less_raichu
7. “Childhood and Education #18: Do The Math” by Zvi
8. “The Owned Ones” by Eliezer Yudkowsky
9. “Optimisation: Selective versus Predictive” by Raymond Douglas
10. “AI companies are already profitable (in the way that matters)” by Yair Halberstadt
11. “The Iliad Intensive Course Materials” by Leon Lang, David Udell, Alexander Gietelink Oldenziel
12. “Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)” by Steven Byrnes
13. “How useful is the information you get from working inside an AI company?” by Buck, Anders Cairns Woodruff
14. “Who Got Breasts First and How We Got Them” by rba
15. “Anthropic’s strange fixation on “hyperstition”” by Simon Lermen
16. “How the AI Labs Make Profit (Maybe, Eventually)” by mabramov
17. “Sawtooth Problems” by Alexander Slugworth
18. “The Darwinian Honeymoon - Why I am not as impressed by human progress as I used to be” by Elias Schmied
19. “International Law Cannot Prevent Extinction Either” by Sausage Vector Machine
20. “Neural Networks learn Bloom Filters” by Alex Gibson
21. “If digital computers are conscious, they are conscious at the hardware level” by cube_flipper
22. “Why You Can’t Use Your Right to Try” by Stephen Martin
23. “A benchmark is a sensor” by Håvard Tveit Ihle, mabynke
24. “Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis” by Linch
25. “Write Cause You Have Something to Say” by Logan Riggs
26. “AI is Breaking Two Vulnerability Cultures” by jefftk
27. “Is ProgramBench Impossible?” by frmsaul
28. “Bringing More Expertise to Bear on Alignment” by Edmund Lau, Geoffrey Irving, Cameron Holmes, David Africa
29. [Linkpost] “How to prevent AI’s 2008 moment (We’re hiring)” by felixgaston
30. “AI #167: The Prior Restraint Era Begins” by Zvi
31. “Mechanistic estimation for wide random MLPs” by Jacob_Hilton
32. “Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations” by Subhash Kantamneni, kitft, Euan Ong, Sam Marks
33. “Try, even if they have you cold” by WalterL
34. “A review of “Investigating the consequences of accidentally grading CoT during RL”” by Buck
35. “There is no evidence you should reapply sunscreen every 2 hours.” by Hide
36. “Many individual CEVs are probably quite bad” by Viliam
37. “x-risk-themed” by kave
38. “What is Anthropic?” by Zvi
39. “What if LLMs are mostly crystallized intelligence?” by deep
40. “Your rights when flying to Europe” by Yair Halberstadt
41. “Model Spec Midtraining: Improving How Alignment Training Generalizes” by Chloe Li, saraprice, Sam Marks, Jonathan Kutasov
42. “The AI Ad-Hoc Prior Restraint Era Begins” by Zvi
43. “Motivated reasoning, confirmation bias, and AI risk theory” by Seth Herd
44. “Are you looking up?” by Craig Green
45. [Linkpost] “Interpreting Language Model Parameters” by Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors, Lee Sharkey
46. “Housing Roundup #15: The War Against Renters” by Zvi
47. “It’s nice of you to worry about me, but I really do have a life” by Viliam
48. “Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI” by Eliezer Yudkowsky
49. “AI Industrial Takeoff — Part 1: Maximum growth rates with current technology” by djbinder
50. “Taking woo seriously but not literally” by Kaj_Sotala
51. “Dairy cows make their misery expensive (but their calves can’t)” by Elizabeth
52. “Measuring the ability of Opus 4.5 to fool narrow classifiers” by Fabien Roger, John Hughes
53. “A new rationalist self-improvement book: the 12 Levers” by spencerg
54. “OpenAI’s red line for AI self-improvement is fundamentally flawed” by Charbel-Raphaël
55. “You Are Not Immune To Mode Collapse” by J Bostock
56. “Primary Care Physicians are Incompetent. We Need More of Them.” by Hide
57. “How Go Players Disempower Themselves to AI” by Ashe Vazquez Nuñez
58. “How much should the ideal person cry wolf?” by KatjaGrace
59. “Conditional misalignment: Mitigations can hide EM behind contextual cues” by Jan Dubiński, Owain_Evans
60. “Risk from fitness-seeking AIs: mechanisms and mitigations” by Alex Mallen
61. “Sanity-checking “Incompressible Knowledge Probes”” by Sturb, LawrenceC
62. “AI unemployment and AI extinction are often the same” by KatjaGrace
63. “AI risk was not invented by AI CEOs to hype their companies” by KatjaGrace
64. “Cyborg evals” by Eye You, frmsaul
65. “To what extent is Qwen3-32B predicting its persona?” by Arjun Khandelwal, ryan_greenblatt, Alex Mallen
66. “Research Sabotage in ML Codebases” by egan
67. “Maybe I was too harsh on deep learning theory (three days ago)” by LawrenceC
68. “Notes on Transformer Consciousness” by slavachalnev
69. “On today’s panel with Bernie Sanders” by David Scott Krueger
70. “No Strong Orthogonality From Selection Pressure” by lumpenspace
71. “Learning zero, and what SLT gets wrong about it” by Dmitry Vaintrob
72. “The Most Important Charts In The World” by Zvi
73. “LLM Style Slop is Absolutely Everywhere” by silentbob
74. “Goblin Mode, 24 Hours Later” by Dylan Bowman
75. “Let Kids Keep More Productivity Gains” by jefftk
76. “llm assistant personas seem increasingly incoherent (some subjective observations)” by nostalgebraist
77. “Not a Paper: “Frontier Lab CEOs are Capable of In-Context Scheming”” by LawrenceC
78. “The Problem in the “Nerd Sniping” xkcd Comic” by peralice
79. “Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers” by Jozdien, Alex Mallen
80. “Contra Binder on far-UVC and filtration” by jefftk
81. “Takes from two months as an aspiring LLM naturalist” by AnnaSalamon
82. “Forecasting is Not Overrated and It’s Probably Funded Appropriately” by Ben S.
83. “On the political feasibility of stopping AI” by David Scott Krueger
84. “Sleeper Agent Backdoor Results Are Messy” by Sebastian Prasanna, Alek Westover, Dylan Xu, Vivek Hebbar, Julian Stastny
85. “LessWrong Shows You Social Signals Before the Comment” by TurnTrout
86. “Fail safe(r) at alignment by channeling reward-hacking into a “spillway” motivation” by Anders Cairns Woodruff, Alex Mallen
87. “Curious cases of financial engineering in biotech” by Abhishaike Mahajan
88. “Update on the Alex Bores campaign” by Eric Neyman
89. “AI companies should publish security assessments” by ryan_greenblatt
90. “In defense of parents” by Yair Halberstadt
91. “The other paper that killed deep learning theory” by LawrenceC
92. “What holds AI safety together? Co-authorship networks from 200 papers” by Anna Thieser
93. ““Bad faith” means intentionally misrepresenting your beliefs” by TFD
94. “Retrospective on my unsupervised elicitation challenge” by DanielFilan
95. “Control protocols don’t always need to know which models are scheming” by Fabien Roger
96. “Anthropic spent too much don’t-be-annoying capital on Mythos” by draganover
97. “The paper that killed deep learning theory” by LawrenceC
98. “Forecasting is Way Overrated, and We Should Stop Funding It” by mabramov
99. ““Thinkhaven”” by Raemon
100. “Is the Cat Out of the Bag?: Who knows how to make AGI?” by Oliver Sourbut
101. “Against the “Permanent” Underclass” by Marcus Plutowski
102. “Quick Paper Review: “There Will Be a Scientific Theory of Deep Learning”” by LawrenceC
103. “Protecting Cognitive Integrity: Our internal AI use policy (V1)” by Tom DAVID
104. “Methodology for inferring propensities of LLMs” by Olli Järviniemi
105. “vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models” by Alan Cooney, Sid Black
106. “What Happens When a Model Thinks It Is AGI?” by josh :), David Africa
107. “Should We Train Against (CoT) Monitors?” by RohanS
108. “If Everyone Reads It, Nobody Dies - Course Launch” by Luc Brinkman, Chris-Lons
109. “Does your AI perform badly because you — you, specifically — are a bad person” by Natalie Cargill
110. “A “Lay” Introduction to “On the Complexity of Neural Computation in Superposition”” by LawrenceC
111. “An Angry Review of Greg Egan’s “Didicosm”” by LawrenceC
112. “Evil is bad, actually (Vassar and Olivia Schaefer)” by plex
113. “Your Supplies Probably Won’t Be Stolen in a Disaster” by jefftk
114. “Community misconduct disputes are not about facts” by mingyuan
115. “Why no new notations since 1960?” by Carl Feynman
116. “Narrow Secret Loyalty Dodges Black-Box Audits” by Alfie Lamerton, Fabien Roger
117. “10 posts I don’t have time to write” by habryka
118. “A taxonomy of barriers to trading with early misaligned AIs” by Alexa Pan
119. “$50 million a year for a 10% chance to ban ASI” by Andrea_Miotti, Alex Amadori, Gabriel Alfour
120. “Automated Deanonymization is Here” by jefftk
121. “Evil is bad, actually (Vassar and Olivia Schaefer callout post)” by plex
122. “10 non-boring ways I’ve used AI in the last month” by habryka
123. “Introducing LinuxArena” by Tyler Tracy, Ram Potham, Nick Kuhn, Myles H
124. “The “Budgeting” Skill Has The Most Betweenness Centrality (Probably)” by JenniferRM
125. “Finetuning Borges” by Linch
126. “9 kinds of hard-to-verify tasks” by Cleo Nardo
127. “How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?” by dx26, Alek Westover, Vivek Hebbar, Sebastian Prasanna, Buck, Julian Stastny
128. “Automating philosophy if Timothy Williamson is correct” by Cleo Nardo
129. “CLR’s Safe Pareto Improvements Research Agenda” by Anthony DiGiovanni
130. “LLMs are about to disrupt algorithmic media feeds” by lsusr
131. “Resources for starting and growing an AI safety org” by Bryce Robertson, Søren Elverlin, Melissa Samworth, jakkdl
132. “Quality Matters Most When Stakes are Highest” by LawrenceC
133. “Feel like a room has bad vibes? The lighting is probably too “spiky” or too blue” by habryka
134. “I did a jhana meditation retreat (in 2024) with Jhourney and it was okay.” by Jules
135. “R1 CoT illegibility revisited” by nostalgebraist
136. “Reevaluating AGI Ruin in 2026” by lc
137. “If It’s Worth Arguing, It’s Worth Arguing With Whiteboards” by Drake Morrison
138. “There are only four skills: design, technical, management and physical” by habryka
139. “Having OCD is like living in North Korea (Here’s how I escaped)” by Declan Molony
140. “Claude knows who you are” by Smaug123
141. “Vladimir Putin’s CEV is probably pretty good” by habryka
142. “Post-mortem’ing my earliest ML research paper, 7 years later” by LawrenceC
143. “If You’ve Never Bought a Tool You Didn’t Need, You’re Not Buying Enough Tools” by Drake Morrison
144. “3” by AnnaJo
145. “Consent-Based RL: Letting Models Endorse Their Own Training Updates” by Logan Riggs
146. “Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability” by Elle Najt, Asa Cooper Stickland, Xander Davies
147. “Let goodness conquer all that it can defend” by habryka
148. “Specialization is a Driver of Natural Ontology” by johnswentworth
149. [Linkpost] “You can only build safe ASI if ASI is globally banned” by Connor Leahy
150. “Beware of Well-Written Posts” by alseph
151. “You Aren’t in Charge of the Overton Window; Politics Is Not Interior Design” by Davidmanheim
152. “Carpathia Day” by Drake Morrison
153. “Do not conquer what you cannot defend” by habryka
154. “What is the Iliad Intensive?” by Leon Lang, Alexander Gietelink Oldenziel, David Udell
155. “The Mirror Test Is Complicated” by J Bostock
156. “Contra Leicht on AI Pauses” by David Scott Krueger (formerly: capybaralet)
157. “Nectome: All That I Know” by Raelifin
158. “Effective Altruism, Seen From Slytherin” by Xylix
159. “Majority Report” by peralice
160. “Current AIs seem pretty misaligned to me” by ryan_greenblatt
161. “Contra Byrnes on UV & Cancer” by HedonicEscalator
162. “Everyone Has a Plan Until They Get Social Pressure To the Face” by Czynski
163. “Mechanisms of Introspective Awareness” by Uzay Macar
164. “Load-Bearing Sincerity: On the Motive Reinforcement Thesis” by Fiora Starlight
165. “Diary of a “Doomer”: 12+ years arguing about AI risk (part 1)” by David Scott Krueger (formerly: capybaralet)
166. “A Retrospective of Richard Ngo’s 2022 List of Conceptual Alignment Projects” by LawrenceC
167. “From personas to intentions: towards a science of motivations for AI models” by David Africa, Jacob Pfau
168. “The Shapley Share of Responsibility?” by Raemon
169. “Who Killed Common Law?” by Benquo
170. “Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes” by Alex Mallen, ryan_greenblatt
171. “Meaningful Questions Have Return Types” by Drake Morrison
172. “Only Law Can Prevent Extinction” by Eliezer Yudkowsky
173. “AI Safety’s Biggest Talent Gap Isn’t Researchers. It’s Generalists.” by Topaz, agucova, Alexandra Bates, Parv Mahajan
174. “Tomas Bjartur: The Last Prodigy” by Linch
175. “Annoyingly Principled People, and what befalls them” by Raemon
176. “TAPs or it didn’t happen” by Raemon
177. “Returns to intelligence” by RobertM
178. “Daycare illnesses” by Nina Panickssery
179. “The policy surrounding Mythos marks an irreversible power shift” by sil
180. “Talk English, Think Something Else” by J Bostock
181. “Sparse Autoencoders for Single-Cell Models” by Ihor Kendiukhov
182. “Eggs, rooms, puzzles, and talking about AI” by KatjaGrace
183. “Morale” by J Bostock
184. “Your Mom is a Chimera” by michaelwaves
185. “The Blast Radius Principle” by Martin Sustrik
186. “How to make good tea” by RobertM
187. “Catching illicit distributed training operations during an AI pause” by Robi Rahman
188. [Linkpost] “Scott Alexander gentrified my meetup” by dominicq
189. “Pausing AI Is the Best Answer to Post-Alignment Problems” by MichaelDickens
190. “Some thoughts on Nectome’s risk and resilience” by Aurelia
191. “Chocolate Sloths, Tinder, and Moral Backstops” by J Bostock
192. “Dario probably doesn’t believe in superintelligence” by RobertM
193. “The Unintelligibility is Ours: Notes on Chain-of-Thought” by 1a3orn
194. “If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines” by ryan_greenblatt
195. “Why Control Creates Conflict, and When to Open Instead” by plex
196. “Reproducing steering against evaluation awareness in a large open-weight model” by Thomas Read, Bronson Schoen, Joseph Bloom
197. “Have we already lost? Part 2: Reasons for Doom” by LawrenceC
198. “Model organisms researchers should check whether high LRs defeat their model organisms” by dx26, Sebastian Prasanna, Alek Westover, Vivek Hebbar, Julian Stastny
199. “Anthropic did not publish a “risk discussion” of Mythos when required by their RSP” by RobertM
200. “Some takes on UV & cancer” by Steven Byrnes
201. “Help me launch Obsolete: a book aimed at building a new movement for AI reform” by garrison
202. “Slightly-Super Persuasion Will Do” by Tomás B.
203. “Have we already lost? Part 1: The Plan in 2024” by LawrenceC
204. “Do not be surprised if LessWrong gets hacked” by RobertM
205. “One Week in the Rat Farm” by Philip Harker
206. “101 Humans of New York on the Risks of AI” by Corm
207. “Baking tips” by RobertM
208. “An easy coordination problem?” by KatjaGrace
209. “Excerpts and Notes on Mythos Model Card” by williawa
210. “The effects of caffeine consumption do not decay with a ~5 hour half-life” by kman
211. “You don’t know what you are made of till you’ve been stalked across three countries” by Shoshannah Tekofsky
212. “Why is Flesh So Weak?” by J Bostock
213. “The hard part isn’t noticing when papers are bad, it’s deciding what to do afterwards” by LawrenceC
214. “We can prevent progress! Conceptual clarity, and inspiration from the FDA” by KatjaGrace
215. “AI as a Trojan horse race” by KatjaGrace
216. “My unsupervised elicitation challenge” by DanielFilan
217. “Role-playing vs Self-modelling” by Jan_Kulveit
218. “Elementary Condensation” by Jan
219. “Hedging and Survival-Weighted Planning” by Vaniver
220. “Opus’s Schelling Steganography Has Amplifiable Secrecy Against Weaker Eavesdroppers” by Elle Najt
221. “An Alignment Journal: Features and policies” by JessRiedel, Dan MacKinlay, Luca, Daniel Murfet, david reinstein
222. “Fantasy ideology” by Ninety-Three
223. [Linkpost] “Questions raised about OpenAI leaders’ trustworthiness by the New Yorker” by Remmelt
224. “Claude Mythos System Card Preview” by anaguma
225. “My picture of the present in AI” by ryan_greenblatt
226. [Linkpost] “[Paper] Stringological sequence prediction I” by Vanessa Kosoy
227. “We’re actually running out of benchmarks to upper bound AI capabilities” by LawrenceC
228. “Don’t write for LLMs, just record everything” by RobertM
229. “Contra Nina Panickssery on advice for children” by Sean Herrington
230. “By Strong Default, ASI Will End Liberal Democracy” by MichaelDickens
231. “AIs can now often do massive easy-to-verify SWE tasks and I’ve updated towards shorter timelines” by ryan_greenblatt
232. “Paper close reading: “Why Language Models Hallucinate”” by LawrenceC
233. “Ten different ways of thinking about Gradual Disempowerment” by David Scott Krueger (formerly: capybaralet)
234. “11 pieces of advice for children” by Nina Panickssery
235. “Steering Might Stop Working Soon” by J Bostock
236. “Am I the baddie?” by Ustice
237. “Academic Proof-of-Work in the Age of LLMs” by LawrenceC
238. “Positive sum does not mean “win-win”” by loops
239. “Considerations for growing the pie” by Zach Stein-Perlman
240. ““Following the incentives”” by David Scott Krueger (formerly: capybaralet)
241. “Chicken-Free Egg Whites” by jefftk
242. “dark ilan” by ozymandias
243. “Mean field sequence: an introduction” by Dmitry Vaintrob, Lauren Greenspan
244. “Democracy Dies With The Rifleman” by Vaniver
245. “The bar is lower than you think” by XelaP
246. “Did Anyone Predict the Industrial Revolution?” by Lost Futures
247. “Why do I believe preserving structure is enough?” by Aurelia
248. “There should be $100M grants to automate AI safety” by Marius Hobbhahn
249. “Sadly, The Whispering Earring” by Dentosal
250. “Common research advice #2: say precisely what you want to say” by LawrenceC
251. “2026: The year of throwing my agency at my health (now with added cyborgism)” by Ruby
252. [Linkpost] “Q1 2026 Timelines Update” by Daniel Kokotajlo, elifland, bhalstead
253. “How social ideas get corrupt” by Kaj_Sotala
254. “The Indestructible Future” by WillPetillo
255. “My most common advice for junior researchers” by LawrenceC
256. “The Practical Guide to Superbabies” by GeneSmith
257. “The Corner-Stone” by Benquo
258. “Systematically dismantle the AI compute supply chain.” by David Scott Krueger (formerly: capybaralet)
259. “The quest for general intelligence is hitting a wall” by Sean Herrington
260. “Intelligence Dissolves Privacy” by Vaniver
261. “Anthropic’s Pause is the Most Expensive Alarm in Corporate History” by Ruby
262. “I’m Suing Anthropic for Unauthorized Use of My Personality” by Linch
263. “Orders of magnitude: use semitones, not decibels” by Oliver Sourbut
264. “Dying with Whimsy” by NickyP
265. “AI for AI for Epistemics” by owencb, Lukas Finnveden
266. “Announcing Doublehaven with Reflections on Humour” by J Bostock
267. “Save the Sun Shrimp!” by Jack
268. “LIMBO: Who We Are, What We Do, and an Exciting High-Impact Funding Opportunity” by faul_sname
269. “Chat, is this sus?” by Tyler Tracy
270. ““You Have Not Been a Good User” (LessWrong’s second album)” by habryka
271. “Lesswrong Liberated” by Ronny Fernandez
272. “The Claude Code Source Leak” by Error
273. “Experiments With Opus 4.6’s Fiction” by Tomás B.
274. “Product Alignment is not Superintelligence Alignment (and we need the latter to survive)” by plex
275. “Co-Found Lens Academy With Me. (We have early users and funding)” by Luc Brinkman
276. “Slack in Cells, Slack in Brains” by Mateusz Bagiński
277. “I am definitely missing the pre-AI writing era” by N. Cailie
278. “The state of AI safety in four fake graphs” by Boaz Barak
279. “AI should be a good citizen, not just a good assistant” by Tom Davidson, wdmacaskill
280. “(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL” by 7vik, Sid Black, Joseph Bloom
281. [Linkpost] “Parkinson’s Law of Worry” by Jakub Halmeš
282. “Folie à Machine: LLMs and Epistemic Capture” by DaystarEld
283. “Stop asking “how good is this” to decide between donation opportunities I recommend” by Zach Stein-Perlman
284. “Nick Bostrom: How big is the cosmic endowment?” by Zach Stein-Perlman
285. “Don’t Overdose Locally Beneficial Changes” by Mateusz Bagiński
286. “Stanley Milgram wasn’t pessimistic enough about human nature?” by David Gross
287. [Linkpost] “What if superintelligence is just weak?” by Simon Lermen
288. “Pray for Casanova” by Tomás B.
289. “ControlAI 2025 Impact Report” by Andrea_Miotti, Alex Amadori
290. “AI’s capability improvements haven’t come from it getting less affordable” by Anders Woodruff
291. “Scaffolded Reproducers, Scaffolded Agents” by Mateusz Bagiński
292. “My hobby: running deranged surveys” by leogao
293. “The Terrarium” by Caleb Biddulph
294. “Sen. Sanders (I-VT) and Rep. Ocasio-Cortez (D-NY) propose AI Data Center Moratorium Act” by Matrice Jacobine
295. “Test your best methods on our hard CoT interp tasks” by daria, Riya Tyagi, Josh Engels, Neel Nanda
296. ““What Exactly Would An International AI Treaty Say?” Is a Bad Objection” by Davidmanheim