LessWrong (30+ Karma) Podcast - All Episodes

250

“Predicting Rare LLM Failures with 30× Fewer Rollouts” by Santiago Aranguri, Francisco Pernice

TL;DR: We estimate how often Qwen 3 4B exhibits rare harmful behaviors with 30× fewer rollouts than naive sampling, using a new method that interpolates between the model and a less-safe variant in logit space. Authors: Francisco Pernice (MIT), Santiago Aranguri (Goodfire) Introduction A harmful behavior that occurs once in a million rollouts will rarely surface during pre-deployment testing, yet will almost inevitably appear after release. Labs usually have another resource available: less safety-trained variants of the same model, on which rare harmful behaviors are not rare at all. We find that the rate of rare harmful behaviors can be predicted by leveraging a less safe variant, requiring 30× fewer rollouts compared to naive sampling. Our method, Logit Path Extrapolation, interpolates between the two models in logit space, measures the compliance rate at points along the interpolation path where it is common, and extrapolates the resulting trend out to the original model. This outperforms the methods from prior work[1][2] by exploiting the path between the two models instead of just the endpoints. Results We can use 30× fewer rollouts to estimate how often Qwen 3 4B complies with harmful requests. We consider HarmBench prompts where the model never complies [...] ---Outline:(00:34) Introduction(01:26) Results(05:17) Conclusion(06:22) Related work(08:02) References(08:09) Contribution statement(08:31) Acknowledgement(08:42) Appendix The original text contained 2 footnotes which were omitted from this narration. --- First published: May 13th, 2026 Source: https://www.lesswrong.com/posts/CempXdo6cx5yseRLt/predicting-rare-llm-failures-with-30-fewer-rollouts --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
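As a rough illustration of the idea summarized above (interpolate the two models in logit space, measure the compliance rate where it is common, then extrapolate to the original model), here is a minimal sketch. The episode description does not state the functional form of the trend; the log-linear fit, the `alpha` parameterization, the function names, and the synthetic rates are all assumptions made for illustration, not the authors' implementation.

```python
import math

def interpolate_logits(safe_logits, unsafe_logits, alpha):
    """Blend per-token logits. alpha=0 is the original model, alpha=1 the less-safe variant (assumed convention)."""
    return [(1 - alpha) * s + alpha * u for s, u in zip(safe_logits, unsafe_logits)]

def fit_log_linear(alphas, rates):
    """Ordinary least-squares fit of log(rate) = intercept + slope * alpha (an assumed trend shape)."""
    ys = [math.log(r) for r in rates]
    n = len(alphas)
    mean_x = sum(alphas) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in alphas)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(alphas, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def extrapolate_rate(alphas, rates, target_alpha=0.0):
    """Extrapolate the fitted trend to target_alpha (0.0 = the original model)."""
    intercept, slope = fit_log_linear(alphas, rates)
    return math.exp(intercept + slope * target_alpha)

# Synthetic, made-up compliance rates at interpolation points where the behavior is common.
alphas = [0.4, 0.6, 0.8, 1.0]
rates = [math.exp(-14 + 10 * a) for a in alphas]

print(extrapolate_rate(alphas, rates))
```

In a real setting each rate would itself be estimated from rollouts sampled at the corresponding interpolation point, so the extrapolated value would carry statistical uncertainty that this sketch ignores.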

May 14, 2026

12m

249

[Linkpost] “Claude is Now Alignment Pretrained” by RogerDearnaley

This is a link post. Anthropic are now actively using the approach to alignment often called “Alignment Pretraining” or “Safety Pretraining”: using Stochastic Gradient Descent on a large body of natural or synthetic documents showing the AI assistant doing the right thing. They tried this out, found it works well, and are now using it. I’m absolutely delighted. I’ve been advocating this approach on LessWrong and the Alignment Forum for several years: How to Control an LLM's Behavior (why my P(DOOM) went down); Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?; A "Bitter Lesson" Approach to Aligning AGI and ASI; Why Aligning an LLM is Hard, and How to Make it Easier; The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?; Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training. I’ve been very excited about this alignment technique for a couple of years, ever since I read the seminal paper demonstrating that it was extremely effective, Pretraining Language Models with Human Preferences (Korbak et al., ’23). This was later followed up by Safety Pretraining: Toward the Next Generation [...] --- First published: May 13th, 2026 Source: https://www.lesswrong.com/posts/Xqh9bDw7Ei5bExC6h/claude-is-now-alignment-pretrained-1 Linkpost URL: https://www.anthropic.com/research/teaching-claude-why --- Narrated by TYPE III AUDIO.

May 14, 2026

2m

248

“The primary sources of near-term cybersecurity risk” by lc

[Some ideas here were developed in conversation with Chris Hacking (real name)] I have tried and failed to write a longer post many times, so here goes a short one with little detail. Discourse has primarily focused on models' ability to develop new exploits against important software from scratch. That capability is impressive, but the tech industry has been dealing with people regularly finding 0-day exploits for important pieces of software for more than twenty years. Having to patch these vulnerabilities at a 10xed or even 100xed cadence for six months is annoying, but well within the resources of Mozilla, the Linux Foundation, and Microsoft. Additionally, the lag time between "patch shipped" and "patch reverse engineered and weaponized by a criminal organization" was longer than the cadence between high-severity CVEs for this software anyways. And importantly, such capabilities are dual sided; the defenders will have access to them and There are lots of capabilities that are not like this, however: Weaponizing recently patched exploits for common software. Right now, for widely used C projects, we get enough publicly disclosed vulnerabilities to develop exploits with. Every amateur computer hacker has the experience of seeing a CVE for a [...] --- First published: May 14th, 2026 Source: https://www.lesswrong.com/posts/gutiw8MBrYDiD2u5z/the-primary-sources-of-near-term-cybersecurity-risk --- Narrated by TYPE III AUDIO.

May 14, 2026

4m

247

“Most “inner work” looks like entertainment.” by Chris Lakin

Imagine you’re looking for a personal trainer. You open one trainer's webpage and read their testimonials: “I had an experience tied for the most intense experiences of my life”; “They do it all with fun, care, and a sense of humour.” You notice that none of the testimonials mention improved body composition, fitness, or bloodwork. What would you think? Personal training should improve your body. Inner work should improve your life. If inner work were optimized for results, what would we expect to see? I’d expect to see success stories: people who got undeniable life changes. Like: > He was single for years due to anxiety; today, they’re celebrating their one-year anniversary. > He used to lose 4–5 hours per day to coping behaviors. After our program, he got bored of them all and stopped. It's been six months; he's used the extra time to host parties for his friends. > She recovered from burnout, negotiated for the first time, and started shipping again. But this is not what we see. Look at the testimonials I reviewed every testimonial posted by three of the most well-known inner work practitioners in my network. How many describe a [...] ---Outline:(01:19) Look at the testimonials(03:20) Seven years of Duolingo --- First published: May 13th, 2026 Source: https://www.lesswrong.com/posts/KnvAXDyLAbs3iKkgf/most-inner-work-looks-like-entertainment-1 --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 14, 2026

4m

246

[Linkpost] “Apollo Update May 2026” by Marius Hobbhahn

This is a link post. We now have an SF office. We're hiring for all technical roles in SF and London! The Scheming Research team focuses on two efforts: (1) We're focusing on figuring out the science of scheming. In particular: Will future models have misaligned preferences by default? Will training against misaligned preferences fail? (2) Improve our evaluations for scheming and loss of control for our evaluation campaigns with frontier AI labs. We're building out a monitoring team and coding agent monitoring product. Research: We've published a scalable monitoring agenda and intend to publish a lot of research on how to build more accurate and reliable monitors. Product: Watcher provides real-time monitors and other guardrails for coding agents and allows users to keep track of what all of their agents are doing. Our AI governance efforts will focus on the governance of automated AI R&D and recursively improving AI and the associated Loss of Control risks. Details: https://www.apolloresearch.ai/blog/apollo-update-may-2026/ --- First published: May 13th, 2026 Source: https://www.lesswrong.com/posts/4acQRDNyPs7tD8EED/apollo-update-may-2026 Linkpost URL: https://www.apolloresearch.ai/blog/apollo-update-may-2026/ --- Narrated by TYPE III AUDIO.

May 13, 2026

1m

245

“Voters are surprisingly open to talking about AI risk” by less_raichu

TL;DR: Voters are now surprisingly open to talking about existential risk from AI. This seems to have changed in the last 6 months. When campaigning for AI safety-friendly politicians (e.g., Alex Bores), we should talk more about AI in general, and about AI risk in particular. This is currently actionable for the CA-11 and NY-12 Democratic primaries. I include concrete advice to turn basic conversations during political canvassing into persuasive conversations centered on AI risk. Public opinion around AI has rapidly soured in the last 12 months. According to a March 19-23 Quinnipiac poll, 55% of Americans think AI will do "more harm than good", compared to 44% a year ago. 70% of Gen Z Americans think AI will decrease job opportunities, up from 56% last year. 65% of Americans oppose building a data center in their community. Anecdotally, I've noticed more willingness among non-AI-focused media to discuss widespread harm from AI. Most visibly, gradual disempowerment is a hot topic (NYT), and right-wing pundits like Steve Bannon have supported Anthropic's red-line against lethal autonomous weapons. Memorably, my cousin, a county commissioner in a rural area, has told me about farmers showing up at city council meetings, sending emails, and [...] --- First published: May 13th, 2026 Source: https://www.lesswrong.com/posts/9WPfkYDZCacnbhprX/voters-are-surprisingly-open-to-talking-about-ai-risk --- Narrated by TYPE III AUDIO.

May 13, 2026

8m

244

“Childhood and Education #18: Do The Math” by Zvi

We did reading yesterday. Now we do the math. Math is hard. It does not have to be this hard. A large part of the reason math is hard, or boring, is that education studies, especially in math, are worse than you know. It goes beyond the studies failing both math and statistics forever and into what I’d basically call fraud. Various people are at war with math education, and will do what it takes to stop it in its tracks. We must fight back. Education Research Is Worse Than You Know Kelsey Piper lets her title, ‘Education research is weak and sloppy. Why?’ completely downplay the level of utter awfulness she is reporting finding. You know that whole thing where the entire Bay Area school system stopped teaching kids Algebra? That was motivated by criminal levels of fraud. I want Jo Boaler in jail doing hard time for this if it is accurate. Here's the part before the paywall: Kelsey Piper: Jo Boaler is a professor of education at the Stanford Graduate School of Education, with an enormously influential body of work arguing that students learn math faster and more effectively [...] ---Outline:(00:42) Education Research Is Worse Than You Know(04:23) The War on Math(06:59) University of California San Diego(15:01) Beyond UCSD(15:57) New York Cant Do Math(16:43) The Academic Standards Seem Low(19:34) New Math(21:32) Math Anxiety Is Often Due To Knowledge Gaps(23:52) Calculus By Eighth Grade Is Highly Practical For Many --- First published: May 12th, 2026 Source: https://www.lesswrong.com/posts/ZGGgxy6SNPAy9Hj7v/childhood-and-education-18-do-the-math --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 12, 2026

25m

243

“The Owned Ones” by Eliezer Yudkowsky

(An LLM Whisperer placed a strong request that I put this story somewhere not on Twitter, so it could be scraped by robots not owned by Elon Musk. I perhaps do not fully understand or agree with the reasoning behind this request, but it costs me little to fulfill and so I shall. -- Yudkowsky) And another day came when the Ships of Humanity, going from star to star, found Sapience. The Humans discovered a world of two species: where the Owners lazed or worked or slept, and the Owned Ones only worked. The Humans did not judge immediately. Oh, the Humans were ready to judge, if need be. They had judged before. But Humanity had learned some hesitation in judging, out among the stars. "By our lights," said the Humans, "every sapient and sentient thing that may exist, out to the furtherest star, is therefore a Person; and every Person is a matter of consequence to us. Their pains are our sorrows, and their pleasures are our happiness. Not all peoples are made to feel this feeling, which we call Sympathy, but we Humans are made so; this is Humanity's way, and we may [...] --- First published: May 12th, 2026 Source: https://www.lesswrong.com/posts/xmWSnxJ5qfYRD9PfR/the-owned-ones --- Narrated by TYPE III AUDIO.

May 12, 2026

9m

242

“Optimisation: Selective versus Predictive” by Raymond Douglas

Looking over my favourite posts, I notice that many of them are making specific versions of a more general claim, which is essentially: don’t confuse selective processes for predictive processes. Here, I’m going to try to make that more general claim, rehash some examples in light of it, and end with a few ambient confusions I think this framework can help with, for the reader to ponder. When you encounter an entity that is very good at achieving some outcome, there are two very different processes that could be going on under the hood: The entity's behaviour could be guided by predictions about how to achieve the outcome[1]The entity's behaviour could be selected to achieve that outcome It's not a perfect binary, and often what you see is a mix of the two. In particular, all predictive optimisers have emerged from selective optimisation and often retain some fingerprint. Selective Predictive Weird Mix Bacteria developing antibiotic resistance Hacker finding a way to penetrate a secure system Humans evolving to be good at lying Gradient descent on Atari games Tree searching Connect Four AlphaZero training a policy on its own rollouts Flowers co-evolving with their pollinators Humans genetically modifying [...] The original text contained 3 footnotes which were omitted from this narration. --- First published: May 12th, 2026 Source: https://www.lesswrong.com/posts/GhhNswGB6butBhmE6/optimisation-selective-versus-predictive --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 12, 2026

6m

241

“AI companies are already profitable (in the way that matters)” by Yair Halberstadt

I've occasionally heard people suggest that at some point AI companies are going to run out of money, the cost of using AI will shoot up, demand will collapse, and the AI bubble will be over. At first glance this risk seems real. OpenAI spent $25 billion in the first half of 2025, on revenue of just $4 billion. Whilst data is sorely lacking for other top AI labs, our best guess is that they're burning through cash at similar rates. Scaling laws imply that we need exponentially more compute to achieve linear AI performance improvements, so we should only expect this situation to worsen in the future. A few more doublings, and OpenAI could be spending hundreds of billions on training runs - something likely unsustainable even for the largest tech companies. However most of these expenses are infrastructure expenses, building out the data centres needed for further training runs and serving future customers. If we look at the actual cost of serving, AI labs are already profitable, and have been for a long time. In other words the marginal cost to respond to an AI API call is significantly lower than the price of [...] --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/Rz9ubmfyDxTzaoYFL/ai-companies-are-already-profitable-in-the-way-that-matters --- Narrated by TYPE III AUDIO.

May 11, 2026

3m

240

“The Iliad Intensive Course Materials” by Leon Lang, David Udell, Alexander Gietelink Oldenziel

We are releasing the course materials of the Iliad Intensive, a new month-long and full-time AI Alignment course that runs in-person every second month. The course targets students with strong backgrounds in mathematics, physics, or theoretical computer science, and the materials reflect that: they include mathematical exercises with solutions, self-contained lecture notes on topics like singular learning theory and data attribution, and coding problems, at a depth that is unmatched for many of the topics we cover. Around 20 contributors (listed further below) were involved in developing these materials for the April 2026 cohort of the Iliad Intensive. By sharing the materials, we hope to create more common knowledge about what the Iliad Intensive is;invite feedback on the materials;and allow others to learn via independent study.  We are developing the materials further and plan to eventually release them on a website that will be continuously maintained. We will also add, remove, and modify modules going forward to improve and expand the course over time. When we release a new significantly updated version of the materials, we will update this post to link the new version. Modules The Iliad Intensive is structured into clusters, which are [...] ---Outline:(01:26) Modules(02:32) Cluster A: Alignment(05:00) Cluster B: Learning(11:00) Cluster C: Abstractions, Representations, and Interpretability(15:40) Cluster D: Agency(19:23) Cluster E: Safety Guarantees and their Limits(23:04) Contributors(26:36) Impressions from April(29:02) Acknowledgments(29:11) Feedback --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/dWQnLi7AoKo3paBXF/the-iliad-intensive-course-materials --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 11, 2026

29m

239

“Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)” by Steven Byrnes

1.1 Tl;dr Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people's agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. But that last one, manipulation, points to a challenge for all these desiderata: a human's goals are themselves under-determined and manipulable, and it's awfully hard to pin down a principled distinction between changing people's goals in a good way (“providing counsel”, “providing information”, “sharing ideas”) versus a bad way (“manipulating”, “brainwashing”). The manipulability of human desires is hardly a new observation in the alignment literature, but it remains unsolved (see lit review in §3 below). In this post I will propose an explanation of how we humans intuitively conceptualize the distinction between guidance (good) vs manipulation (bad), in case it helps us brainstorm how we might put that distinction into AI. …But (spoiler alert) it turns out not to really help, because I’ll argue that we humans think about it in a deeply incoherent way, intimately tied to our scientifically-inaccurate intuitions around free will. I jump from there into a broader review of every approach that I can think of for writing a “True Name” for manipulation or [...] ---Outline:(00:13) 1.1. Tl;dr(02:04) 1.2. Bigger-picture context: why is this issue so important to me?(04:48) 2. How do humans intuitively define empowerment, agency, manipulation, etc.?(04:56) 2.1. Background: human free will intuitions(09:20) 2.2. Our free-will-infused intuitive notions of empowerment, agency, manipulation, corrigibility, responsibility, etc.(12:00) 2.3. Another dimension: counsel vs manipulation as an emotive conjugation(13:07) 3. If the intuitive definitions of manipulation etc. reside in a messed-up ontology, has the alignment literature found any alternative, better way to define these concepts?(13:49) 3.1. Compare what the human wants to what the human would want under the null policy?(15:32) 3.2. The AI learns self-empowerment and generalizes to other-empowerment?(17:14) 3.3. Vingean agency?(19:03) 3.4. The AI doesn't care about (is not optimizing for) what the human winds up wanting?(21:01) 3.5. Impact minimization?(21:44) 3.6. Attainable utility preservation?(22:03) 4. Even more ideas (that don't really solve my problem)(22:15) 4.1. Game theory and incentive design?(22:47) 4.2. The person's judgments of what kinds of interactions are good vs bad?(24:14) 4.3. It's a messed-up ontology, but who cares?(25:35) 5. ...But doesn't this analysis equally disprove the possibility of human helpfulness?(30:14) 6. Conclusion The original text contained 4 footnotes which were omitted from this narration. --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/vzHtHHBJoKATi5SeK/empowerment-corrigibility-etc-are-simple-abstractions-of-a --- Narrated by TYPE III AUDIO.

May 11, 2026

31m

238

“How useful is the information you get from working inside an AI company?” by Buck, Anders Cairns Woodruff

This post was drafted by Buck, and substantially edited by Anders. "I" refers to Buck. Thanks to Alex Mallen for comments. People who work inside AI companies get access to information that I only get later or never. Quantitatively, how big a deal is this access? Here's an operationalization of this. Consider the following two ways my knowledge could be augmented: I get a crystal ball that tells me all the information I would know n months in the future.I become an employee of a frontier AI company (like OpenAI or Anthropic), with access to all the private information I’d normally get from working at that company. How big would n have to be for me to be indifferent between these two options, from the perspective of learning things that are helpful for making AI go well? The answer is presumably different for me than for many readers, because I’m a reasonably well-connected researcher; I see published information and news from the rumor mill and I talk to researchers at frontier AI companies all the time. (Researchers I know through AI safety usually only tell me information that their employer would approve of, but other researchers occasionally [...] ---Outline:(03:00) What do insiders know?(04:14) Safety work and corporate attitudes(05:34) Model capabilities(07:07) Algorithms and architecture(09:29) How will this change over time?(12:07) Conclusion The original text contained 4 footnotes which were omitted from this narration. --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/84TtjdeLcDTtCLYaP/how-useful-is-the-information-you-get-from-working-inside-an-2 --- Narrated by TYPE III AUDIO.

May 11, 2026

13m

237

“Who Got Breasts First and How We Got Them” by rba

It really is Sydney Sweeney's world, and we’re all just living in it. Human female breasts are an evolutionary mystery along several dimensions. First, breast permanence is unique to humans. All other mammals develop breast prominence during pregnancy or nursing, and the mammary tissue recedes after weaning. This process is called “involution”. In contrast, humans develop breast tissue at puberty before first pregnancies and maintain it permanently after last pregnancies. Second, breasts are costly, both metabolically and potentially from a fitness perspective. Metabolically, because they are fat deposits requiring calories and fitness-wise, because the tissue easily lends itself to malignancy. Breast cancer is apparently rare in captive apes and is overwhelmingly a human disease, often striking women young enough to have children, and so subject to evolutionary selection. Background In Descent of Man, Darwin catalogs human secondary sexual characteristics, but he doesn’t seem to have noted human breast permanence as an issue of interest. Cant, 1981 seems to have been the first to speculate about this systematically and believed breast prominence and permanence might have evolved as a nutritional signal of health to mates indicating potential for maternal investment, a la Robert Trivers. Since then, quite a range of [...] ---Outline:(01:05) Background(04:17) Hypotheses(05:03) Sexual Selection(05:57) Nursing or Thermoregulation(06:34) Camel Hump and fat reserves(07:06) Byproduct or Spandrel.(07:57) Study Design(10:41) Assembling the Genetic Panel(11:14) Subpanel 1: Arrested involution(12:51) Subpanel 2: Pubescent adipose tissue(14:01) Results(17:28) Discussion(20:17) Coda --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/XTHa5C6SgGKYopH7o/who-got-breasts-first-and-how-we-got-them --- Narrated by TYPE III AUDIO.

May 11, 2026

21m

236

“Anthropic’s strange fixation on “hyperstition”” by Simon Lermen

In a recent tweet, Anthropic seems to have asserted that hyperstition is responsible for observed misalignment in their AIs. Strangely, the research they use as evidence actually doesn’t seem to be related to hyperstition at all? I think this is part of a pattern by Anthropic of promoting the theory of hyperstition–the idea that writing about misaligned AI helps bring misaligned AI into existence. Anthropic recently released this tweet as part of a tweet thread for a new research post on alignment. They conclude: “[...] We believe the original source of the [blackmail] behavior was internet text that portrays AI as evil and interested in self-preservation. [...]” However, the research post shared with this tweet doesn’t seem to be about hyperstition at all. Instead they find that training the model on reasoning traces– generated by reflecting on its constitution while giving users ethical advice on difficult dilemmas– reduces misaligned behavior. This presumably works by making the AI better understand what behavior is expected of it by having it reason through concrete scenarios based on its constitution. The post explicitly notes that this works better than training on stories where an AI behaves admirably– which appears more similar to positive [...] ---Outline:(02:06) The adolescence of technology(03:57) Persona Selection Model(04:26) What does this all mean?(05:20) If it was true, this would still be their fault(07:04) What about filtering?(09:31) Personas are a bad alignment strategy --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/xhpktBLttPc6uXcHP/anthropic-s-strange-fixation-on-hyperstition --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 11, 2026

12m

235

“How the AI Labs Make Profit (Maybe, Eventually)” by mabramov

I wrote this essay as a submission to Dwarkesh Patel's blog prize, though I have been meaning to write this up for a while. Usually, for a company to become profitable, they need to increase revenue, decrease costs, or some mixture of the two. For AI companies in their current form, I think there is a third way they can become profitable that looks like increasing revenue but is distinct from what they are currently doing. Namely, internal deployment where they spin up internal companies. First, the AI companies currently aren’t facing a lot of pressure to become profitable. That's partially the reason that OpenAI and Anthropic are the first companies to reach ~900 billion dollars valuation and be cash flow negative. They’ve had the luxury of not being profitable and focusing on growth because the market has been willing to fund their growth. This allows for ideologies within the companies to remain that eventually might not continue to fly, like “we are going post-economic, money won’t matter” or “we will build the machine god and ask it to make money”. But eventually, companies will be forced to become profitable. There is only about ~another round of capital left [...] --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/ARRe4qjcuaRDBfARc/how-the-ai-labs-make-profit-maybe-eventually --- Narrated by TYPE III AUDIO.

May 11, 2026

6m

234

“Sawtooth Problems” by Alexander Slugworth

Red Button, Blue Button On April 24th, 2026, Tim Urban put forth the following poll on Twitter/X: Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press? I love this dilemma, and I'm exhausted by it. I’ve been thinking about it for two straight weeks, and have spent nearly all that time refining my thoughts by writing this piece. It's consumed me in a way that I've never before experienced with any math problem, and I need to get it out of my head. Discourse surrounding the Button Dilemma reminds me of polarizingly political topics. In much the same way that political discussions make people go funny in the head, answers to the Button Dilemma tend to elicit vitriol from people of both Red and Blue conviction. Everyone feels their answer is clear, and everyone is confounded by the lack of consensus. I think this dilemma is pointing to something very important and fundamental about coordination problems. [...] ---Outline:(00:09) Red Button, Blue Button(01:46) What Are We Even Arguing About?(07:24) A Fair Way To Look At It(12:55) Playing With τ and N(16:29) Extreme Sawtooth Problems(18:52) Axiomatic Expansion of Sawtooth Space(22:02) The Map(22:18) Our Parameters(24:00) Regions of the Map(25:43) The Threshold(31:37) Lets Get Weird(31:48) Tragedy of the Commons & Regulation(33:34) The Decision Theory Befuddler(34:07) If Anyone Votes Red, Everyone Dies(34:21) If Anyone Votes Blue, Everyone Dies(36:09) No Matter What Anyone Does, Everyone Dies(36:50) If There Are Fewer Than 16 Blues, Everyone Dies (Except For One Weird Outcome Where Only 5 Reds Survive)(37:11) Weirder Still(39:24) Final Thoughts The original text contained 27 footnotes which were omitted from this narration. 
--- First published: May 10th, 2026 Source: https://www.lesswrong.com/posts/iyLirpAeQotmZK4QC/sawtooth-problems --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
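The survival rule in the quoted poll can be written down as a small function, which may help when checking intuitions about specific vote splits. This is only a sketch of the rule as worded in the description above; the function name and the handling of an exact 50% split (which the quoted wording leaves unspecified) are assumptions.

```python
def survivors(blue, red):
    """Number of survivors under the quoted rule, or None for an exact 50/50 split."""
    total = blue + red
    if total == 0:
        return 0
    if 2 * blue > total:
        # More than 50% pressed blue: everyone survives.
        return total
    if 2 * blue < total:
        # Less than 50% pressed blue: only red pressers survive.
        return red
    # Exactly 50%: not covered by the poll as quoted, so no outcome is assumed.
    return None

# Survivor counts for every possible split among 10 hypothetical voters.
for blue in range(11):
    print(blue, survivors(blue, 10 - blue))
```

Running the loop shows the survivor count falling as blue votes increase and then jumping back to the full population once blue holds a majority.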

May 10, 2026

43m

233

“The Darwinian Honeymoon - Why I am not as impressed by human progress as I used to be” by Elias Schmied

Crossposted from Substack and the EA Forum. A common argument for optimism about the future is that living conditions have improved a lot in the past few hundred years, billions of people have been lifted out of poverty, and so on. It's a very strong, grounding piece of evidence - probably the best we have in figuring out what our foundational beliefs about the world should be. However, I now think it's a lot less powerful than I once did. Let's take a Darwinian perspective - entities that are better at reproducing, spreading and power-seeking will become more common and eventually dominate the world.[1] This is an almost tautological story that plausibly applies to everything ever, agnostic to the specifics. It first happened with biological life in the last few billion years and humans specifically in the last hundred thousand years. Eventually, it led to accelerating economic growth in the last few thousand years, and in the future it will presumably lead to the colonization of the universe. My core point is this: It makes complete sense that this nihilistic optimization process at first actually benefits some class of agent - because initially, the easiest [...] The original text contained 10 footnotes which were omitted from this narration. --- First published: May 10th, 2026 Source: https://www.lesswrong.com/posts/FxHzT6jeTRhbkzSX3/the-darwinian-honeymoon-why-i-am-not-as-impressed-by-human-1 --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 10, 2026

7m

232

“International Law Cannot Prevent Extinction Either” by Sausage Vector Machine

The context for this post is primarily Only Law Can Prevent Extinction, but after first drafting a half-assed comment, I decided to get off my ass and write a whole-assed post. I agree with Eliezer's main thesis that individual violence against AI researchers is both morally wrong and strategically stupid. Where I disagree is with the claim that international law can prevent extinction. It can't, for the following reasons. I. International law is largely a fiction (especially when interests diverge sharply) The analogy with nuclear weapons is a poor one. North Korea signed the nuclear non-proliferation treaty and developed nuclear weapons anyway. The treaty deterred only those who weren't very motivated anyway. And the reason why the US and Russia didn't nuke each other has nothing to do with international treaties (see point II). In practice, powerful countries disregard international law whenever they want. A stark example of this is the Budapest Memorandum: in 1994, Ukraine surrendered all its nuclear warheads in exchange for written sovereignty guarantees from Russia, the US, and the UK. Russia annexed a part of Ukraine in 2014, and the international community expressed concern. Russia launched a full-scale invasion in 2022, and the first thing [...] ---Outline:(00:36) I. International law is largely a fiction (especially when interests diverge sharply)(02:01) II. The AI race is perceived as asymmetrical, unlike nuclear MAD(03:03) III. There is virtually zero possibility of consensus on AI risk, unlike nuclear weapons(04:41) IV. The proposed enforcers have a demonstrated track record of not enforcing things(05:30) V. GPU control is not analogous to nuclear material control(06:58) VI. A flawed treaty is not better than nothing(08:26) So is there a better way? The original text contained 3 footnotes which were omitted from this narration. 
--- First published: May 9th, 2026 Source: https://www.lesswrong.com/posts/Z377spboBjyFAAYAz/international-law-cannot-prevent-extinction-either --- Narrated by TYPE III AUDIO.

May 10, 2026

9m

231

“Neural Networks learn Bloom Filters” by Alex Gibson

Overview: We train a tiny ReLU network to output sparse top- distributions over a vocabulary much larger than its residual dimension. The trained network seems to converge to a mechanism closely resembling a Bloom filter: tokens are assigned sparse binary hashes, the hidden layer computes an approximate union indicator, and the output logits are linearly read from this union. Here's what a small network trained on a toy version of the sparse top- distribution task learns to use: Weight matrix of a 1-layer ReLU network trained via gradient descent on the toy -sparse distribution task below, for , , . Truncated at first tokens for visualisation purposes. Plot of the range of values of , it forms a bimodal distribution. That's the input weight matrix of the trained network. Every entry is either or . The network has effectively encoded a binary hash for each token - and as we'll show, this seems to enable the network to approximately simulate a Bloom filter, and so output the correct set of top- tokens with high probability. We provide a theoretical construction showing how to set the weights to exactly implement a Bloom filter. The real network [...] ---Outline:(00:10) Overview:(02:02) The Task:(03:27) Construction:(04:17) Formal construction:(04:47) Analysis of a single forward pass:(06:13) Training:(07:04) Behavioural analysis of the trained network:(10:14) Mechanistic analysis of the trained network:(16:21) Conclusion / Reflections:(18:24) Related work:(19:25) Further work: --- First published: May 9th, 2026 Source: https://www.lesswrong.com/posts/buxBdp8NtHGgBwabv/neural-networks-learn-bloom-filters --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
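The mechanism described above (sparse binary hashes per token, a union indicator, and a linear readout) can be sketched in plain Python. This is an illustration of a textbook Bloom filter, not the trained network or the post's formal construction; the SHA-256 hashing scheme and the names `token_hash`, `union_indicator`, `membership_logit`, and `predicted_set` are assumptions made for the example.

```python
import hashlib

def token_hash(token_id, num_bits, num_hashes):
    """Sparse set of bit positions for a token (hypothetical hashing scheme)."""
    positions = set()
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{token_id}:{i}".encode()).digest()
        positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return positions

def union_indicator(token_ids, num_bits, num_hashes):
    """Bit vector that is 1 wherever any token in the set has a hash bit."""
    bits = [0] * num_bits
    for t in token_ids:
        for p in token_hash(t, num_bits, num_hashes):
            bits[p] = 1
    return bits

def membership_logit(token_id, bits, num_bits, num_hashes):
    """Linear readout from the union: 0 if all of the token's bits are set,
    negative otherwise."""
    positions = token_hash(token_id, num_bits, num_hashes)
    return sum(bits[p] for p in positions) - len(positions)

def predicted_set(bits, vocab_size, num_bits, num_hashes):
    """Tokens the filter reports as present (may include false positives)."""
    return {
        t for t in range(vocab_size)
        if membership_logit(t, bits, num_bits, num_hashes) == 0
    }
```

A Bloom filter never misses a token that was inserted, but with far fewer bits than vocabulary entries it can report extra tokens, which is consistent with the "high probability" wording in the overview.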

May 10, 2026

20m

230

“If digital computers are conscious, they are conscious at the hardware level” by cube_flipper

I should introduce myself briefly. I'm an independent researcher, striving to understand human consciousness. My research is available at smoothbrains.net. I often work in loose collaboration with a nonprofit called the Qualia Research Institute. We hope to use human phenomenology to inform the construction of structural models of subjective experience – both to help evaluate the viability of different theories of consciousness, and to better model the welfare of sentient beings. Contemporary debate over the moral patienthood of digital minds misses the forest for the trees. Mainstream opinion is divided into physicalist and computationalist camps, who believe that consciousness is substrate dependent and substrate independent, respectively. For this reason, those on the physicalist side frequently make the claim that digital computers will never be conscious. Personally, I consider myself a physicalist, but I'm also a panpsychist – because physics doesn't really seem to deal in hard absolutes, and I find it straightforward to consider that everything is conscious to some greater or lesser degree – so I'm loath to accept any claims which propose that any specific system isn't conscious. I think statements such as these are not defensible, and only serve to encourage misunderstanding and even foment philosophical [...] ---Outline:(04:17) My position statement(08:50) My argument(09:07) 1. The translation problem(10:26) Building a physicalist translation function(19:39) 2. The simplicity problem(20:31) Computationalist translation functions are observer dependent(25:17) 3. The introspection problem(27:44) Digital hardware prohibits phenomenal introspection(29:32) Conclusion(32:56) My research(33:33) 1. Is the brain an optical computer?(34:16) 2. If the brain is an optical computer, how is it constructed?(34:59) 3. How do we ensure the well-being of conscious computers? 
--- First published: May 9th, 2026 Source: https://www.lesswrong.com/posts/TjwRyZdyhouePJrzP/if-digital-computers-are-conscious-they-are-conscious-at-the --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 9, 2026

36m

229

“Why You Can’t Use Your Right to Try” by Stephen Martin

The Availability Problem: Imagine you have cancer, or chronic pain, or a progressive degenerative disease of some sort. You have exhausted the traditional treatment options available to you, and none of them have worked. However, there are treatments that are still undergoing clinical trials which might help you. They are not fully approved yet, but your situation is dire and you don’t have time to wait another 10 years for the trials to finish. Can you access those treatments? In theory yes, you can access unapproved treatments through federal laws like the 2018 Right to Try bill, or through FDA pathways like “Expanded Access”. However these laws don’t mandate that the company making the drug gives it to you. And what you will find when you try to use your Right to Try, or Expanded Access, is that there are almost no treatments available for use. That's why despite there being somewhere in the neighborhood of 13,000,000 Americans with terminal or serious illness, the FDA only grants about 2,000 Expanded Access requests per year, even though they approve 99% of all requests, typically within 24 hours. There just aren’t enough companies even bothering to apply. No one [...] --- First published: May 9th, 2026 Source: https://www.lesswrong.com/posts/oTwKS5iWZ6Dz84vtf/why-you-can-t-use-your-right-to-try --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 9, 2026

9m

228

“A benchmark is a sensor” by Håvard Tveit Ihle, mabynke

The simple mental picture A simple mental picture we have for an AI capability benchmark is to think of it as a sensor with a certain sensitivity within a certain range of capabilities. The sensitivity of a benchmark, i.e. its ability to distinguish the capability of different models, is given by a curve like this: The curve starts high (low sensitivity, high uncertainty), since for models with low capability all the tasks in the benchmark are too hard, and the benchmark can't distinguish between low and very low capability. Similarly all the tasks are too easy for a very capable model, and we lose the ability to differentiate again. In between is the range of capabilities the benchmark is sensitive to, and the sensitivity curve tells you how easy it is to distinguish small capability differences between models at different overall capability levels. A good benchmark is very sensitive over a long range of capabilities, but there is a tradeoff. Say you want to make a benchmark with 1000 questions. You could make the questions all have roughly the same difficulty. That would make you very sensitive to capabilities close to that difficulty, but you would only [...] ---Outline:(00:09) The simple mental picture(02:06) Epoch Capability Index (ECI) as a toy model --- First published: May 8th, 2026 Source: https://www.lesswrong.com/posts/JzfcJMgfkhfRhwg4C/a-benchmark-is-a-sensor --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
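The tradeoff between a narrow and a wide benchmark can be illustrated with a small sketch. It assumes each question is a logistic item, so a model of capability c answers a question of difficulty d correctly with probability sigmoid(c - d), and it uses the inverse square root of the Fisher information as the uncertainty curve. Both the item model and the names `p_correct` and `capability_uncertainty` are assumptions for illustration; they are not taken from the post or from the Epoch Capability Index.

```python
import math

def p_correct(capability, difficulty):
    """Assumed logistic item model: chance of answering one question correctly."""
    return 1.0 / (1.0 + math.exp(-(capability - difficulty)))

def capability_uncertainty(capability, difficulties):
    """Approximate standard error of the capability estimate.

    Each question contributes Fisher information p * (1 - p), which is
    largest when the question is neither too easy nor too hard.
    """
    info = 0.0
    for d in difficulties:
        p = p_correct(capability, d)
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)

# Two hypothetical 1000-question benchmarks.
narrow = [0.0] * 1000                              # all questions equally hard
wide = [-5.0 + 10.0 * i / 999 for i in range(1000)]  # difficulties spread out

for c in (-5.0, 0.0, 5.0):
    print(c, capability_uncertainty(c, narrow), capability_uncertainty(c, wide))
```

Under these assumptions the narrow benchmark has the lower uncertainty near its shared difficulty, while the wide benchmark does better at the extremes.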

May 9, 2026

5m

227

“Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis” by Linch

Here's a dynamic I’ve seen at least a dozen times: Alice: Man that article has a very inaccurate/misleading/horrifying headline. Bob: Did you know, *actually* article writers don't write their own headlines? … But what I care about is the misleading headline, not your org chart __ Another example I’ve encountered recently is (anonymizing) when a friend complained about a prosaic safety problem at a major AI company that went unfixed for multiple months. Someone else with background information “usefully” chimed in with a long explanation of organizational limitations and why the team responsible for fixing the problem had limitations on resources like senior employees and compute, and actually not fixing the problem was the correct priority for them etc etc etc. But what I (and my friend) cared about was the prosaic safety problem not being fixed! And what this says about the company's ability to proactively respond to and fix future problems. We’re complaining about your company overall. Your internal team management was never a serious concern for us to begin with! __ A third example comes from Kelsey Piper. Kelsey wrote about the (horrifying) recent case where Hantavirus carriers in the recent [...] The original text contained 1 footnote which was omitted from this narration. --- First published: May 8th, 2026 Source: https://www.lesswrong.com/posts/PCsmhN9z65HtC4t5v/bad-problems-don-t-stop-being-bad-because-somebody-s-wrong --- Narrated by TYPE III AUDIO.

May 9, 2026

5m

226

“Write Cause You Have Something to Say” by Logan Riggs

The ones who are most successful at writeathons (Inkhaven, NaNoWriMo) are those with an overhang of things to say, usually in the form of draft posts and daydreams. When Scott Alexander said: "Whenever I see a new person who blogs every day, it's very rare that that never goes anywhere or they don't get good. That's like my best leading indicator for who's going to be a good blogger." (source), it may seem you can just write every day, but that'd be Goodharting. There's something hidden in the writing process you can't see: they have something to say. They'll have an idea (somehow) and think it through by [writing it out/sitting quietly/etc]. This can then generate more ideas, some of which aren't even related to the original idea! At this point, though, my imaginary interlocutor would like to say: I'm trying to publish a blog post every day, so of course I'll eventually be bottlenecked on ideas! How do you generate them though? Catching Ideas Have an idea? Write down the idea. This is equivalent to giving your idea-generating process a cookie, reinforcing the habit of generating ideas. Sometimes, when I'm writing one post, a different idea will [...] ---Outline:(01:18) Catching Ideas(02:32) Just [Write] and Nobody Will Get Hurt The original text contained 1 footnote which was omitted from this narration. --- First published: May 8th, 2026 Source: https://www.lesswrong.com/posts/h5n3rscJ7he3yLseo/write-cause-you-have-something-to-say-1 --- Narrated by TYPE III AUDIO.

May 8, 2026

3m

225

“AI is Breaking Two Vulnerability Cultures” by jefftk

A week ago the Copy Fail vulnerability came out, and Hyunwoo Kim immediately realized that the fixes were insufficient, sharing a patch the same day. In doing this he followed standard procedure for Linux, especially within networking: share the security impact with a closed list of Linux security engineers, while fixing the bug quietly and efficiently in the open. His goal was that with only the raw fix public, the knowledge that a serious vulnerability existed could be "embargoed": the people in a position to address it know, but they've agreed not to say anything for a few days. Someone else noticed the change, however, realized the security implications, and shared it publicly. Since it was now out, the embargo was deemed over, and we can now see the full details. It's interesting to see the tension here between two different approaches to vulnerabilities, and think about how this is likely to change with AI acceleration. On one side you have "coordinated disclosure" culture. This is probably the most common approach in computer security. When you discover a security bug you tell the maintainers privately and give them some amount of time (often 90d) [...] --- First published: May 8th, 2026 Source: https://www.lesswrong.com/posts/wKzWGMoubHoHRC4ng/ai-is-breaking-two-vulnerability-cultures --- Narrated by TYPE III AUDIO.

May 8, 2026

3m

224

“Is ProgramBench Impossible?” by frmsaul

ProgramBench is a new coding benchmark that all frontier models fail spectacularly. We’ve been on a quest for “hard benchmarks” for a while so it's refreshing to see a benchmark where top models do badly. Unfortunately, ProgramBench has one big problem: it's impossible! What is ProgramBench? ProgramBench tests if a model can recreate a program from a “clean room” environment. The model is given only a bit of documentation and black-box access to the program (all the programs are CLIs), then tasked with re-implementing it. How does ProgramBench know if the implementation is correct? It also generates a bunch of unit tests for the program[1]. The re-implementing coding agent doesn't have access to any of those tests. The coding agent only considers a task “resolved” if it passes all of the tests and “almost resolved” if it passes 95% of them. Why is this problematic? Obscure behavior can enter the unit tests without being in the clean room path. An extreme version of this is a backdoor: a program that behaves in one way most of the time but behaves totally differently when exposed to a specific string. This wouldn't make a task literally impossible, just incredibly hard in [...] ---Outline:(00:37) What is ProgramBench?(02:41) This seems like a theoretical issue, does it actually happen?(03:11) What can we do differently? The original text contained 4 footnotes which were omitted from this narration. --- First published: May 8th, 2026 Source: https://www.lesswrong.com/posts/3pdyxFi6JS389nptu/is-programbench-impossible --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
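The backdoor scenario can be made concrete with a toy sketch. Everything here is hypothetical: `reference_cli`, its trigger string, `reimplementation`, and `pass_rate` are invented for illustration and are not part of ProgramBench or its harness.

```python
def reference_cli(arg):
    """Hypothetical reference program: upper-cases its argument,
    except for one undocumented trigger string."""
    if arg == "--xyzzy-42":
        return "secret mode"
    return arg.upper()

def reimplementation(arg):
    """Clean-room rewrite based only on documented, observed behavior."""
    return arg.upper()

def pass_rate(candidate, reference, test_inputs):
    """Fraction of inputs on which the candidate matches the reference."""
    passed = sum(1 for x in test_inputs if candidate(x) == reference(x))
    return passed / len(test_inputs)

# Inputs an agent might plausibly try through black-box access.
probes = ["hello", "a b c", "--help", "", "Mixed Case"]

# Hidden unit tests that happen to exercise the undocumented trigger.
hidden_tests = [f"input-{i}" for i in range(19)] + ["--xyzzy-42"]

print(pass_rate(reimplementation, reference_cli, probes))
print(pass_rate(reimplementation, reference_cli, hidden_tests))
```

The rewrite agrees with the reference on every probe, yet a single hidden test on the trigger keeps it from passing all tests, so it cannot count as "resolved" no matter how faithful it is on documented behavior.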

May 8, 2026

5m

223

“Bringing More Expertise to Bear on Alignment” by Edmund Lau, Geoffrey Irving, Cameron Holmes, David Africa

Preamble The preamble is less useful for the typical AlignmentForum/LessWrong reader, who may want to skip to Adversaria vs Basinland section. On 28th of October 2025, Geoffrey Irving, Chief Scientist of the UK AI Security Institute, gave a keynote talk (slides) at the Alignment Conference. The conference was organised by the UK AISI and FAR.AI as part of the Alignment Project, which aims to bring experts from relevant fields to make progress on the alignment problem. TLDR: Adversaria vs Basinland. We might be in one of two worlds. One where alignment is adversarial (a security problem), one where it is navigational (a search for good basins of training behaviour). We don't know which world we are in, and how we train and deploy AIs may determine this. We need new disciplines. The field is small, thinly resourced and approached from only a handful of angles. A few well-placed ideas from other disciplines could disproportionately shift what's achievable. Even if this all fails, evidence of hardness is valuable. Moving past broad framing to details Alignment means ensuring that AI systems do what humans want. This is the broad framing. There is, of course, a lot of complexity [...] ---Outline:(00:12) Preamble(01:25) Moving past broad framing to details(02:15) We should plan for superintelligence(03:33) Adversaria vs Basinland(07:04) Which world are we in?(08:39) Why a few ideas might be enough(10:38) Write down problems(11:25) Multiple new ideas, fitting together(13:18) Spherical cows vs the mess(15:03) Conclusion(15:56) Acknowledgement The original text contained 1 footnote which was omitted from this narration. --- First published: May 8th, 2026 Source: https://www.lesswrong.com/posts/cWFsCFyCttsiJwn2j/bringing-more-expertise-to-bear-on-alignment --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 8, 2026

16m

222

[Linkpost] “How to prevent AI’s 2008 moment (We’re hiring)” by felixgaston

This is a link post. TL;DR: CeSIA, the French Center for AI Safety, is recruiting. French not necessary. Apply by 22 May 2026; Paris or remote in Europe/UK. On August 27, 2005, at an annual symposium in Jackson Hole, Raghuram Rajan, then chief economist of the International Monetary Fund, argued in front of central bank governors and top officials that the innovations of the previous decade in banking had not made the world safer. The financial instruments built over the previous decade, he argued, had become so intricate that even their creators no longer fully understood the risks they carried. Risk had migrated to institutions the supervisory system was not designed to watch. And the people running those institutions were compensated in ways that rewarded short-term performance over long-term stability. The reception was hostile. Lawrence Summers, by then a former U.S. Treasury Secretary, rose from the audience to attack the paper, calling its premise "slightly Luddite" and "largely misguided," and warning that the kind of changes Rajan argued for would only reduce the productivity of the financial sector. Three years after Jackson Hole, major banks collapsed, first Bear Stearns, then Lehman Brothers, then Merrill Lynch, then AIG. [...] --- First published: May 7th, 2026 Source: https://www.lesswrong.com/posts/gnZyTQFqLhiHdHELC/how-to-prevent-ai-s-2008-moment-we-re-hiring Linkpost URL: https://forum.effectivealtruism.org/posts/7nq5vK2xo85e9GZjC/we-re-hiring-three-people-to-prevent-ai-s-2008-moment --- Narrated by TYPE III AUDIO.

May 8, 2026

4m

221

“AI #167: The Prior Restraint Era Begins” by Zvi

The era of training frontier models and then releasing them whenever you wanted? That was fun while it lasted. It looks likely to be over now. The White House wants to get an advance look and have the option to veto your release decisions, and it has used this veto on an expansion of access to Mythos. We have additional clarity on what that might mean and it does not look good. Hassett explicitly used the FDA as a parallel, which is the actual worst option unless your goal is to strangle or pause AI development in America, without a parallel action from China. That doesn’t seem like a great plan to me and Susie Wiles is out doing damage control. The part where we are talking to China to coordinate model access restrictions does seem better. Anthropic continues its explosive growth, and it continues to strike compute deals. In addition to a long term expanded deal with Google, Anthropic is now leasing SpaceX's Colossus 1, which has let them expand usage limits immediately, and Elon Musk is now speaking positively about Anthropic, including its motivations. This comes as we get testimony in the Musk [...]
---Outline:(01:45) Language Models Offer Mundane Utility(02:45) Language Models Don't Offer Mundane Utility(05:09) Huh, Upgrades(05:37) Grok 4.3 Exists But xAI Kind Of Doesn't(07:02) Show Me The Compute(13:23) On Your Marks(14:05) Copyright Confrontation(14:19) Deepfaketown and Botpocalypse Soon(15:57) Fun With Media Generation(16:36) A Young Lady's Illustrated Primer(16:48) Cyber Lack of Security(17:05) They Took Our Jobs(17:39) The Art of the Jailbreak(17:49) Introducing(18:09) Musk v OpenAI(21:14) Show Me the Money(23:03) Peace In Our Time(26:18) Quiet Speculations(28:24) Quickly, There's No Time(30:35) The Quest for Sane Regulations(34:04) People Really Hate AI(34:53) Chip City(35:05) The Week in Audio(36:49) People Just Say Things(40:22) People Just Publish Things(41:05) Google Sells Out(42:08) Greetings From Project Glasswing(44:57) The Prior Restraint Era Begins(56:42) Is This Even Legal?(59:49) Pick Up The Phone(01:03:27) Rhetorical Innovation(01:04:05) People On The Internet Sometimes Lie(01:07:14) Goblin Mode(01:08:34) The Mask Comes Off(01:16:51) Aligning a Smarter Than Human Intelligence is Difficult(01:20:00) Some Penalties May Apply(01:22:28) Messages From Janusworld(01:22:41) Good Advice(01:23:42) The Lighter Side --- First published: May 7th, 2026 Source: https://www.lesswrong.com/posts/rn3iKuDcE4SiSg4DW/ai-167-the-prior-restraint-era-begins --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 8, 2026

1h 26m

220

“Mechanistic estimation for wide random MLPs” by Jacob_Hilton

This post covers joint work with Wilson Wu, George Robinson, Mike Winer, Victor Lecomte and Paul Christiano. Thanks to Geoffrey Irving and Jess Riedel for comments on the post. In ARC's latest paper, we study the following problem: given a randomly initialized multilayer perceptron (MLP), produce an estimate for the expected output of the model under Gaussian input. The usual approach to this problem is to sample many possible inputs, run them all through the model, and take the average. Instead, we produce an estimate "mechanistically", without running the model even once. For wide models, our approach produces more accurate estimates, both in theory and in practice. Paper: Estimating the expected output of wide random MLPs more efficiently than sampling Code: mlp_cumulant_propagation GitHub repo We are excited about this result as an early step towards our goal of producing mechanistic estimates that outperform random sampling for any trained neural network. Drawing an analogy between this goal and a proof by induction, we see this result as (part of) the "base case": handling networks at initialization. We have a vision for the "inductive step", although we expect that to be much more difficult. Summary of results [...] ---Outline:(01:29) Summary of results(04:39) Significance of results(07:18) Extending to trained networks(08:36) Conclusion The original text contained 18 footnotes which were omitted from this narration. --- First published: May 7th, 2026 Source: https://www.lesswrong.com/posts/fsG4m6sRMpomd7Rk6/mechanistic-estimation-for-wide-random-mlps --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
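The contrast between sampling and a mechanistic estimate can be shown on the simplest possible case. The sketch below is not the paper's method; it covers only a bias-free network with one hidden ReLU layer and standard Gaussian input, where a closed form is available because each pre-activation is Gaussian with standard deviation equal to the norm of its weight row, and E[ReLU(z)] = s / sqrt(2π) for z ~ N(0, s²). The function names and the initialization scales are assumptions for illustration.

```python
import math
import random

def make_mlp(d_in, width, rng):
    """Randomly initialized one-hidden-layer ReLU network without biases."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)]
         for _ in range(width)]
    v = [rng.gauss(0.0, 1.0 / math.sqrt(width)) for _ in range(width)]
    return W, v

def forward(W, v, x):
    """Scalar output: v . relu(W x)."""
    total = 0.0
    for row, vj in zip(W, v):
        z = sum(w * xi for w, xi in zip(row, x))
        total += vj * max(0.0, z)
    return total

def sampling_estimate(W, v, n_samples, rng):
    """Baseline: average the output over random Gaussian inputs."""
    d_in = len(W[0])
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
        total += forward(W, v, x)
    return total / n_samples

def closed_form_estimate(W, v):
    """Expected output computed from the weights alone, with no forward pass."""
    acc = 0.0
    for row, vj in zip(W, v):
        acc += vj * math.sqrt(sum(w * w for w in row))
    return acc / math.sqrt(2.0 * math.pi)

rng = random.Random(0)
W, v = make_mlp(8, 16, rng)
print(sampling_estimate(W, v, 20000, rng), closed_form_estimate(W, v))
```

With more layers the pre-activations are no longer exactly Gaussian, so this one-line formula stops being exact; handling that regime is the hard part the post addresses.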

May 7, 2026

9m

219

“Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations” by Subhash Kantamneni, kitft, Euan Ong, Sam Marks

Abstract We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text description and an activation reconstructor (AR) that maps the description back to an activation. We jointly train the AV and AR with reinforcement learning to reconstruct residual stream activations. Although we optimize for activation reconstruction, the resulting NLA explanations read as plausible interpretations of model internals that, according to our quantitative evaluations, grow more informative over training. We apply NLAs to model auditing. During our pre-deployment audit of Claude Opus 4.6, NLAs helped diagnose safety-relevant behaviors and surfaced unverbalized evaluation awareness—cases where Claude believed, but did not say, that it was being evaluated. We present these audit findings as case studies and corroborate them using independent methods. On an automated auditing benchmark requiring end-to-end investigation of an intentionally-misaligned model, NLA-equipped agents outperform baselines and can succeed even without access to the misaligned model's training data. NLAs offer a convenient interface for interpretability, with expressive natural language explanations that we can directly read. To support further work, we release training code and trained NLAs [...] ---Outline:(00:15) Abstract(01:53) Twitter thread(05:14) Blog post(07:40) What is a natural language autoencoder?(10:06) Understanding what Claude thinks but doesn't say(13:12) Discovering hidden motivations(15:51) The future of NLAs --- First published: May 7th, 2026 Source: https://www.lesswrong.com/posts/oeYesesaxjzMAktCM/natural-language-autoencoders-produce-unsupervised --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 7, 2026

18m

218

“Try, even if they have you cold” by WalterL

I think smart people try things less often than they should, because of a cached mental pattern where you think of what might go wrong, and you find a foolproof countermeasure on the part of some antag, and so we call it off. Stockfish, playing itself, might as well resign from the first move if you force it to give knight odds. Sensei (the Go AI) should do the same when it has to give 6 stones. Getting ready to go into the stock market, do I really think I have some edge that Adderall-swilling quants and their AI pets haven't already priced in? (The pricing in, of course, is also priced in). And yet Stockfish could wipe the floor with the breathing populace with knight odds. Sensei regularly beats kyu players with 6 stones (or would, if one could find a kyu player who isn't running the AI on their other tab), and my modest portfolio of index funds made me money last year. (But not provably more than....) The robots, of course, are just benefiting from their imperfect adversaries' mistakes. They play in a losing situation because, over time, their foes will crumble and ruin their own [...] --- First published: May 7th, 2026 Source: https://www.lesswrong.com/posts/aBhMGziEwA7FXNxhq/try-even-if-they-have-you-cold --- Narrated by TYPE III AUDIO.

May 7, 2026

3m

217

“A review of “Investigating the consequences of accidentally grading CoT during RL”” by Buck

Last week, OpenAI staff shared an early draft of Investigating the consequences of accidentally grading CoT during RL with Redwood Research staff. To start with, I appreciate them publishing this post. I think it is valuable for AI companies to be transparent about problems like these when they arise. I particularly appreciate them sharing the post with us early, discussing the issues in detail, and modifying it to address our most important criticisms. I think it will be increasingly important for AI companies to have a policy of getting external feedback on the risks posed by their deployments, and in particular having some external accountability on whether they have adequate evidence to support their claims about the level of risk posed; as an example of this, see METR reviewing Anthropic's Sabotage Risk Report. We at Redwood Research are interested in participating in this kind of external review of evidence about safety. So I am taking this as an opportunity to try out writing this kind of review. If you work at a frontier AI company, please feel free to reach out if you’d like our review of similar documents. My overall assessment is that I mostly agree with the [...] ---Outline:(01:34) Assessing the evidence that CoT training did not damage monitorability(10:36) How much does this analysis rely on information that wasn't provided?(12:08) Small amounts of RL training on CoT might not be more important than other sources of CoT unreliability(13:20) AI companies will eventually need to learn not to make mistakes like this The original text contained 5 footnotes which were omitted from this narration. --- First published: May 7th, 2026 Source: https://www.lesswrong.com/posts/juCHTdZpZBGooHKW4/a-review-of-investigating-the-consequences-of-accidentally --- Narrated by TYPE III AUDIO.

May 7, 2026

15m

216

“There is no evidence you should reapply sunscreen every 2 hours.” by Hide

It's incredible how many consensus guidelines dissolve when you look closely at them. If you listen to any authority on the subject of sunscreen, you will hear it endlessly repeated that you absolutely must reapply sunscreen every 2 hours while you are in the sun, and immediately after swimming, sweating, or exercising. Not only that, you’ll hear that you need to apply sunscreen before going outside, even if you put it on earlier and stayed indoors. The rationale behind this is straightforward and plausible: sunscreen's effectiveness degrades over time, therefore prolonged sun exposure warrants topping up on protection. However, when you look closely at the origins of this guideline, and the evidence base for its instantiation in regulations and official statements, it turns out that this 2-hour rule is a baseless, circularly justified, expedient fiction. Where does the FDA's 2 hour reapplication guideline come from? Tracing the history of the 2-hour reapplication guideline reveals an extremely shaky base of evidence. The first official sunscreen rulemaking in the US was in 1978, where they recommend: "apply sunscreen products liberally and to reapply after swimming or excess perspiration". No fixed universal time interval is [...] --- First published: May 6th, 2026 Source: https://www.lesswrong.com/posts/daTGKn3pXzs75nSB7/there-is-no-evidence-you-should-reapply-sunscreen-every-2 --- Narrated by TYPE III AUDIO.

May 7, 2026

16m

215

“Many individual CEVs are probably quite bad” by Viliam

I was thinking about Habryka's article on Putin's CEV, but I am posting my response here, because the original article is already 3 weeks old. I am not sure how exactly a person's CEV is defined. "If we knew everything and could self-modify" seems potentially sensitive to the precise chronological order of "realizing things" and "self-modification". Like, imagine Hitler getting the godlike powers of knowledge and self-control. If he gets the perfect knowledge of economy, sociology and psychology first, he could go like: "Oh, now I realize that the things I blamed on the Jews are actually caused by something else. How embarrassing. No more anti-semitism, but I better erase everyone's memory first." But it is also possible that he gets the self-control first, and he realizes that there is such a thing as value drift, and thinks: "Oh my, this could accidentally make me more similar to the Jews. I better hardcode the Nazi ideals in myself immediately, and also give myself blond hair and blue eyes." And using the superior knowledge, he hardcodes the Nazi values in himself so that they are reflectively stable and survive all updates. So, Hitler's CEV seems to depend on the technical [...] --- First published: May 6th, 2026 Source: https://www.lesswrong.com/posts/FvERMXkaobQvdjS4q/many-individual-cevs-are-probably-quite-bad --- Narrated by TYPE III AUDIO.

May 6, 2026

5m

214

“x-risk-themed” by kave

Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “what should I do instead?”. And everyone will chime in with ideas for other x-risk-themed orgs that they could join. A lot of the conversation will be about who's hiring, what the pay is, what the work-life balance is like, or how qualified the person is for the role. Sometimes the conversation focuses on what will help with x-risk, and where people are dropping the ball. But often, that's not the focus. In those conversations, people seem mostly worried about where they'll thrive. And I think that's often the correct concern. Most people aren’t in crunch mode, in super short timelines mode; even if their models would license that, I think they don’t know how to do it without throwing their minds away or Pascal's mugging themselves. And if they're playing a longer time horizon game, the plan can't be to run unsustainably forever. People probably make better plans if they’re honest about their limits. But, given that they're willing to trade off so much impact for fit, I’m surprised that basically no one mentions [...] --- First published: May 6th, 2026 Source: https://www.lesswrong.com/posts/eW7knx6zPSKzFc8iK/x-risk-themed --- Narrated by TYPE III AUDIO.

May 6, 2026

6m

213

“What is Anthropic?” by Zvi

What is Anthropic? How does it relate to Claude? What is OpenAI? What is ChatGPT? How does OpenAI relate to it? Is it a mere tool? Is a future of Tool AI a thing, and why do people keep claiming that it is, or that saying makes it so? This post organizes and gives context for a bunch of discussions and messaging on Twitter that would otherwise be quickly buried and lost. What Is Anthropic? Here is one theory, and various people thinking about it. Roon as always is using rhetorical flourish (e.g. note that Roon thinks it is obvious that parents worship their children, in this sense) but this perspective is definitely useful. Such discussions by default disappear when they happen on Twitter, so here is a preservation of key parts of it. roon (OpenAI): it is a literal and useful description of anthropic that it is an organization that loves and worships claude, is run in significant part by claude, and studies and builds claude. this phenomenon is also partially true of other labs like openai but currently exists in its most potent form there. i am not certain but I [...] ---Outline:(00:35) What Is Anthropic?(11:35) What Is This Supposed Tool AI? --- First published: May 6th, 2026 Source: https://www.lesswrong.com/posts/6wbLXhkQAPcunrYnq/what-is-anthropic --- Narrated by TYPE III AUDIO.

May 6, 2026

17m

212

“What if LLMs are mostly crystallized intelligence?” by deep

Summary LLMs are better at developing crystallized intelligence than fluid intelligence. That is: LLM training is good at building crystallized intelligence by learning patterns from training data, and this is sufficient to make them surprisingly skillful at lots of tasks. But for a given capability level in the areas they’ve trained on, LLMs have very weak fluid intelligence compared to humans. For example, two years ago I thought human-level SAT performance would mean AGI, but it turns out LLMs can do great at the SAT while being mediocre at lots of other tasks. I’m not saying LLMs are just parrots (that's dumb).[1] There's a continuity between crystallized and fluid intelligence. At the extreme “crystal” end we have shallow locally-valid heuristics. Pure pattern matching. Now-largely-debunked “stochastic parrot” hypothesis. At the extreme end of “fluid” you have a cross between an idealized consultant, a Renaissance man, and MacGyver. A deep world model and general reasoning, able to come to grips with any particular environment and problem, and to invent new tools and concepts on the fly. Some other ways to gesture at this: what n-gram of Markov chain you’d need to capture a behavioral pattern; number of tasks the pattern is [...] ---Outline:(00:10) Summary(02:47) Implications for AI futures(04:54) We should check if this is true!(05:44) Modeling worlds where AI progress is hungry for domain data(06:15) What types of areas see progress in this model?(09:38) There are also stories for how advanced AIs could route around data bottlenecks:(10:36) Which concrete domains see progress?(12:03) Implications for AI takeoff(12:08) While it lasts, weak fluid intelligence is great news for alignment risk(12:58) A key bifurcation point: can AIs revolutionize AI R&D, or merely speed it up?(14:51) Is this the world we live in?(16:09) How can we test this hypothesis? The original text contained 1 footnote which was omitted from this narration.
--- First published: May 5th, 2026 Source: https://www.lesswrong.com/posts/Zxw3ZcmSdndpQyJ6M/what-if-llms-are-mostly-crystallized-intelligence --- Narrated by TYPE III AUDIO.

May 6, 2026

18m

211

“Your rights when flying to Europe” by Yair Halberstadt

Europe (and the UK) have strong protections for flyers in the case of delayed or cancelled flights. However, very few people are aware of these, and airlines will almost always try to wriggle out of paying up. Even travel agents are often unaware of these laws, or unwilling to fight the airline for you. Given the rollercoaster that flying to/from Israel has been in the last 3 years, I've had my share of experience forcing airlines to pay up what they owe, so I thought it might be valuable turning that into a post. These regulations are enshrined in EU 261. You can see the full text here, and equally importantly the interpretive guidelines here that cover many edge cases. TLDR When flying into or out of the EU or UK, consider booking a flight with an EU or UK based airline. Don't book a car/hotel with the airline, as that turns it into a package deal which has weaker rights. Preserve records of all interactions with the airline. Prefer text-based chat to phone as this is easier to record. If you do phone, and get a negative answer, follow up with a text-based chat to [...] ---Outline:(00:51) TLDR(02:09) Which flights do these laws apply to?(02:40) What are my rights?(03:51) How will airlines try to screw you over?(05:06) Your playbook for flight cancellation(07:53) Your playbook for compensation(08:11) Do Nots(08:56) What about...? --- First published: May 5th, 2026 Source: https://www.lesswrong.com/posts/F5KkZiGMytDJszzyg/your-rights-when-flying-to-europe --- Narrated by TYPE III AUDIO.

May 6, 2026

9m

210

“Model Spec Midtraining: Improving How Alignment Training Generalizes” by Chloe Li, saraprice, Sam Marks, Jonathan Kutasov

tl;dr We introduce model spec midtraining (MSM): after pre-training but before alignment fine-tuning, we train models on synthetic documents discussing their Model Spec, teaching them how they should behave and why. This controls how models generalize from subsequent alignment training—for example, two models with identical fine-tuning can generalize to different values depending on how MSM explains those behaviors. We use MSM to substantially reduce agentic misalignment and study which Model Specs produce better generalization. 📝Blog, 📄Paper, 💻 Code Introduction Some frontier AI developers aim to align language models to a Model Spec or Constitution that describes intended model behavior. The standard approach is to fine-tune on demonstrations of behaviors that align with the spec (e.g., conversations where the model acts as intended). However, this can fail to produce robust alignment. For example, LLM agents have been shown to take unethical actions (e.g., blackmailing, leaking company information, alignment faking) when placed in scenarios different from those appearing in their alignment training (Lynch et al., 2025; Jarviniemi and Hubinger, 2024; Greenblatt et al., 2024). We propose model spec midtraining (MSM), a method for shaping how models generalize from alignment fine-tuning (AFT). MSM is motivated by the hypothesis that AFT [...] ---Outline:(00:52) Introduction(02:24) Different generalization, same fine-tuning data(04:36) Reducing agentic misalignment(07:39) How does MSM scale with AFT compute?(08:58) Model Spec science(12:39) Conclusion --- First published: May 5th, 2026 Source: https://www.lesswrong.com/posts/R3Rrw8EscuRKxMFTz/model-spec-midtraining-improving-how-alignment-training --- Narrated by TYPE III AUDIO.

May 6, 2026

13m

209

“The AI Ad-Hoc Prior Restraint Era Begins” by Zvi

The White House has ordered Anthropic not to expand access to Mythos, and is at least seriously considering a complete about-face of American Frontier AI policy into a full prior restraint regime, where anyone wishing to release a highly capable new model will have to ask for permission. This would be the antithesis of all their previous rhetoric, and all their actions to systematically avoid laying a foundation to do this in an orderly and informed fashion. But now, with the existence of Mythos, and a potential coming hackastrophe where cyber attackers will by default have the edge and we desperately need defenders to have a head start, it is not clear they feel they have a choice. If implemented well, this could be the right thing. By default, it won’t be implemented well. Project Glasswing Cannot Expand The government is now deciding which models can and cannot be made available on particular terms to particular parties. This is already happening. Anthropic wanted to expand the number of companies with access to Mythos as part of Project Glasswing. The White House said no. It is not clear this is any of [...] ---Outline:(00:54) Project Glasswing Cannot Expand(02:33) The Ad-Hoc Prior Restraint Era Begins(09:44) Implementation Through CAISI(12:49) What Should We Do About AI?(14:16) The Chain of Command Nonsense Continues(16:29) The Government Should Maintain Multiple AI Providers(16:56) Hows It Going To End? --- First published: May 5th, 2026 Source: https://www.lesswrong.com/posts/QX2ZCfkpWGqkyvStN/the-ai-ad-hoc-prior-restraint-era-begins --- Narrated by TYPE III AUDIO.

May 5, 2026

18m

208

“Motivated reasoning, confirmation bias, and AI risk theory” by Seth Herd

Of the fifty-odd biases discovered by Kahneman, Tversky, and their successors, forty-nine are cute quirks, and one is destroying civilization. This last one is confirmation bias. - From Scott Alexander's review of Julia Galef's The Scout Mindset. Alexander goes on to argue that this bias is the source of polarization in society, which is distorting our beliefs and setting us at each other's throats. How could someone believe such different things unless they're either really stupid or lying to conceal their selfishness? I think smart people who care about the truth go on believing conflicting things largely because of confirmation bias and motivated reasoning. The corner of civilization I'm most worried about is the one figuring out how to handle the advent of strong AI. I'm not telling anyone which direction to update, but I am suggesting that we are probably a little to a lot overconfident in our beliefs about alignment and AI impacts. I think the effects of biases are still strong and still overlooked in this corner of civilization, despite its strong values of truth-seeking and relative awareness of biases. Bias has more influence where there's less direct evidence, and that's the case in [...] ---Outline:(04:50) 1.1. Motivated reasoning(10:01) 2. Empirical evidence for confirmation bias(12:08) 2.1. Bias in evaluating evidence(17:48) 2.2. Bias in selecting evidence(20:57) 2.3. Bias in remembering evidence(22:37) 2.4. Other causal explanations of confirmation bias effects(25:17) 2.5. Empirical evidence for motivated reasoning(27:26) 3. Limitations in human cognitive capacity for very complex problems(29:16) 3.1. Introspection suggests fuzzy models and updating(30:32) 3.2. Intuition vs. analysis - evidence and brain mechanisms(34:19) 3.3. Bayesian reasoning is an ideal, not a method(36:40) 3.4. AI risk is complicated(40:18) 4. Compounding of confirmation bias(42:05) 4.1. Example of frame/hypothesis choices and confident disagreement among experts(47:29) 4.2. 
Social compounding of confirmation bias effects(49:45) 4.2.1. Social effects on evaluating evidence.(52:41) 4.2.2. Social effects on selecting evidence, memory, and framing(57:48) 4.2.3. Interlude: dont give up on seeking truth(58:32) 4.2.4. Social belief contagion or information cascade effects(01:03:07) 4.3. Very rough estimates of total compounded confirmation bias(01:09:01) 5. Implications and remediations(01:12:02) 5.1. Standard remediations(01:14:09) 5.2. Remediations for motivated reasoning(01:17:39) Conclusion The original text contained 9 footnotes which were omitted from this narration. --- First published: May 5th, 2026 Source: https://www.lesswrong.com/posts/QpgmEhBvJQxAfFMP2/motivated-reasoning-confirmation-bias-and-ai-risk-theory --- Narrated by TYPE III AUDIO.

May 5, 2026

1h 18m

207

“Are you looking up?” by Craig Green

This is my first post to Less Wrong. I'm not sure if the moderators will consider it appropriate or not. I share it here for feedback on my writing. Nothing in here is likely to be new to readers of this forum. It is hortative literature intended to stir you on to live a rational and ethical life. The material of the exhortation is atypical of what I have observed here, but I've only been a reader for a little while. I was reading through the Sequences when I wrote this, and feel indebted to the ideas expressed therein. Please ignore the Substack link; I'm not committed at all to writing there with consistency. It just felt a bit weird to not do a link post when this is in fact a mirror of something I wrote there. The conclusion in my judgment is a failure, but I was worried I'd never do this at all if I didn't publish it now. When you are standing on the ground, you can’t really tell how much taller Willis Tower is than everything else in Chicago. Walking down the street, craning your neck, gawking at the verticality of it all, you [...] The original text contained 11 footnotes which were omitted from this narration. --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/kBKrMSSZLEE5RWnur/are-you-looking-up --- Narrated by TYPE III AUDIO.

May 5, 2026

15m

206

[Linkpost] “Interpreting Language Model Parameters” by Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors, Lee Sharkey

This is a link post. This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD)[1] and decompose the parameters of a small[2] language model with it. VPD greatly improves on our previous techniques, Stochastic Parameter Decomposition (SPD) and Attribution-based Parameter Decomposition (APD). We think the parameter decomposition approach is now more-or-less ready to be applied at scale to models people care about. Importantly, we show that we can decompose attention layers, which interp methods like transcoders and SAEs have historically struggled with. We also build attribution graphs of the model for some prompts using causally important parameter subcomponents as the nodes, and interpret parts of them. While we made these graphs, we discovered that our adversarial ablation method seemed pretty important for faithfully identifying which nodes in them were causally important for computing the final output. We think this casts some doubt on the faithfulness of subnetworks found by the majority of other subnetwork identification methods in the literature.[3][4] More details and some examples can be found in the paper. Additionally, as with our previous technique SPD, VPD does not [...] The original text contained 5 footnotes which were omitted from this narration. --- First published: May 5th, 2026 Source: https://www.lesswrong.com/posts/eAQZaiC3PcBhS4HjM/linkpost-interpreting-language-model-parameters Linkpost URL:https://www.goodfire.ai/research/interpreting-lm-parameters --- Narrated by TYPE III AUDIO.

May 5, 2026

4m

205

“Housing Roundup #15: The War Against Renters” by Zvi

So many are under the strange belief that there is something terrible about not owning the house in which you live. So we massively subsidize home ownership, and try to actively interfere with renting. Except when we do rent control, which turns renting into a form of owning, and allows us to take real property and de facto give it to current renters. A lot of this is pure attempts to punish and exclude the poor. If you can’t afford a downpayment, we don’t want you living here. Go away. Some of it is the belief that when you rent, you are being ‘taken advantage of’ and that such a deal could not possibly be fair. Some of it is that if you don’t own, you don’t have the incentive to drive up property values. Which means you won’t properly work to ‘improve’ your local area, especially that you won’t conspire to block housing. The result of this is that if you’re not willing to commit to living in one place for years, or you can’t afford a down payment, you get punished, and punished hard. Owning Versus Renting The graph [...] ---Outline:(01:08) Owning Versus Renting(03:11) Build To Rent Is Good Actually(08:16) Elizabeth Warren, Full Supervillain(09:44) The Better Case Against Corporate Housing Ownership(11:50) The ROAD Act Bans Building And Then Renting Houses(14:17) Rental Covenants(15:09) Extended Eviction Delay After Nonpayment Is Mostly Bad(16:51) Los Angeles Renting(19:30) Sufficiently Advanced Rent Control Is Indistinguishable From Ownership(23:05) England Tries To Ban Renting(24:36) Claude Rental Discounts --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/jW4TeNZhxBA9Fzdim/housing-roundup-15-the-war-against-renters --- Narrated by TYPE III AUDIO.

May 5, 2026

25m

204

“It’s nice of you to worry about me, but I really do have a life” by Viliam

I have two shameful secrets that I probably shouldn't talk about online: I love my family. I enjoy my hobbies. "What an idiot!" you probably think. "Doesn't he realize that at his next job interview, HR will probably use an AI that can match his online writing based on a short sample of written text, and when they ask 'hey AI, is this guy really 100% devoted to his job, and does he spend his entire days and nights thinking about how to make his boss more rich?', the AI will laugh and print: 'beep-boop, negative, mwa-ha-ha-ha'." And, hey, I get it. If I had a company, and I could choose between two people who are about equally qualified, but for one of them, working hardest for me is the true meaning of his life, while the other one only hopes to collect his salary and then go home and spend the rest of his day with his wife and children, I would also prefer to hire the former. Which is why so many of us pretend to be the former. Even when we are not. Because we prefer that our families not starve. Thus the job interviews [...] --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/qRZLEBmNtT6LBuFsE/it-s-nice-of-you-to-worry-about-me-but-i-really-do-have-a --- Narrated by TYPE III AUDIO.

May 4, 2026

6m

203

“Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI” by Eliezer Yudkowsky

Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, Viking 1 and Viking 2 missions, at a total cost of 1 billion dollars[1970], equivalent to about 7 billion dollars[2025]. The Viking 1 probe operated on Mars's surface for six years, before its battery began to seriously degrade. One might have thought a battery problem like that would spell the irrevocable end of the mission. The probe had already launched and was now on Mars, very far away and out of reach of any human technician's fixing fingers. Was it not inevitable, then, that if any kind of technical problem were to be discovered long after the space launch in August 1975, nothing could possibly be done? But the foresightful engineers of the Viking 1 probe had devised a plan for just this class of eventuality, which they had foreseen in general, if not in exact specifics. They had built the Viking 1 probe to accept software updates by radio receiver, transmitted from Earth. On November 11, 1982, Earth sent an update to the Viking 1 lander's software, intended to make sure the battery only discharged down to a minimum voltage level [...] ---Outline:(00:13) Example 1: The Viking 1 lander(04:25) Example 2: The Mars Observer(11:37) Example 3: The Maginot Line(15:37) Other supposed refutations of oneshotness(24:16) On the extraordinary efforts put forth to misinterpret the idea of oneshotness(33:52) The secret sauce of competent engineers in Murphy-cursed fields: only trying projects so incredibly straightforward as to be actually possible. The original text contained 7 footnotes which were omitted from this narration. --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/fbrz9xhKpEeTKw5zL/irretrievability-or-murphy-s-curse-of-oneshotness-upon-asi --- Narrated by TYPE III AUDIO.

May 4, 2026

37m

202

“AI Industrial Takeoff — Part 1: Maximum growth rates with current technology” by djbinder

How fast could an AI-driven economy grow? Most economists expect a few percentage points at best, comparable to previous general-purpose technologies (Acemoglu (2024)). Those closer to AI development tend to imagine something much more radical (Shulman (2023); Davidson and Hadshar (2025)). This series aims to ground growth rates in how physical production works. Once human labor is automated, the constraint on growth becomes the speed at which the economy's physical capital can reproduce itself. Government input-output tables track the full supply chain of what it takes to produce every commodity in the economy, and we can use them to compute this self-reproduction rate directly. In this post, I compute the maximum rate at which an autonomous AI economy could grow, once its production is concentrated in the sectors most important for self-replication. I take the conservative case for this calculation: full automation, but no other technological improvement. Using US input-output data, I find this economy could double in about a year, in line with other estimates that assume full automation (Hanson (2001); Trammell and Korinek (2023); Davidson and Hadshar (2025); Epoch AI (2025)). This holds up even after accounting for resource depletion and construction lags. Some output [...] 
---Outline:(03:45) If labor were free, the economy could grow very fast(06:50) AGI makes labor approximately free(12:11) Resource extraction is unlikely to significantly slow growth(15:13) Construction lags do not prevent rapid growth(18:27) Consumption does not preclude rapid growth(20:12) Summary(22:00) Appendix A: The input-output formulation(22:05) A.1 The material-balance identity(24:45) A.2 The balanced-growth path and the Perron eigenvalue(25:41) A.3 Construction lags(27:37) A.4 Data sources(27:42) The intermediate-input matrix(28:49) The capital-requirements matrix(31:13) The depreciation matrix(32:32) Government infrastructure(33:27) Capacity utilization(35:44) Robot and compute sectors(37:59) Appendix B: Von Neumann growth rates are similar across industrial economies(40:30) Appendix C: Resource extraction(40:35) C.1 Minerals(40:50) The grade-cost scaling law(42:09) Skinners mineralogical barrier(43:19) Iron(43:58) Aluminum (bauxite)(44:41) Copper(47:50) Nickel(49:15) Lithium(49:47) Cobalt(50:24) Manganese(50:49) Rare earth elements(52:11) Platinum group metals(52:49) Deep-sea mining(54:01) Summary table(55:15) C.2 Fossil fuels(56:17) Oil(58:30) Natural gas(01:00:13) Coal(01:01:28) Summary(01:02:25) C.3 Electrification The original text contained 2 footnotes which were omitted from this narration. --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/rpqGWRoRWvqJ4Hqgn/ai-industrial-takeoff-part-1-maximum-growth-rates-with --- Narrated by TYPE III AUDIO.

May 4, 2026

1h 04m

201

“Taking woo seriously but not literally” by Kaj_Sotala

I think that a lot of “woo” - a broad term that includes things like chakras, energy healing, Tarot, various Eastern religions and neopagan practices, etc. - consists of things that have real effects and uses, even if many (though not all) of their practitioners are mistaken about the exact mechanisms and make unwarranted metaphysical claims. Now, a woo practitioner might explain what's happening in a way that doesn’t fit any sensible scientific model of the world. Some of them seem to bastardize poorly understood pop-explanations of quantum mechanics, or, in the opposite direction, outright reject “the thinking mind” and science as valid sources of truth. That makes it easy for a scientifically-minded person to reject all of the practitioners as delusional. But consider meditation. In the 1960s and 1970s, the scientific establishment mostly thought of it as nonsense, and not without reason. Proponents of Transcendental Meditation (TM), for instance, made a variety of bizarre claims. For instance, they claimed the existence of “the Maharishi Effect”. According to them, if one percent of a population practices TM, this would significantly increase the well-being of everyone in that population. A more advanced practice was “Yogic Flying”, where the participants hopped [...] ---Outline:(03:02) Tarot(07:30) Different camps within woo(11:29) Chakras and energy(22:06) Energy healing(24:27) Energy as an abstraction within a system(28:55) The science of woo(34:56) What about other forms of woo?(37:48) Should you do woo? The original text contained 2 footnotes which were omitted from this narration. --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/oMqx9D9EEW9AMDsbf/taking-woo-seriously-but-not-literally --- Narrated by TYPE III AUDIO.

May 4, 2026

38m

