PODCAST · technology
LessWrong (30+ Karma)
by LessWrong
Audio narrations of LessWrong posts.
-
250
“Green” by Adam Zerner
1 Alice: My favorite color is green. Bob: Oh, cool. Mine's red. Months later... Bob: Hey, I got you this green painting. I remember you saying your favorite color is green and I thought you'd like it. Alice: Oh. Um, that's nice of you, but I actually don't really like green. My favorite color's not green — it's blue. Bob: Oh. I coulda sworn you had said green. Alice: Oh, I think I did say green. But, um, Bob... you really shouldn't take things people say so literally all of the time. Bob: Blinks. I'm confused. Alice: Ok. Well, I mean, yeah, I think I did say green. But, like, I was just kinda gesturing at the area where my favorite color is. And I obviously didn't mean, like, literally green. Bob: Obviously? Alice: Yeah. Green is a secondary color. Who chooses a secondary color as their favorite color. My favorite color is obviously going to be a primary color. Bob: Ok. That was quite far from being obvious to me, but ok. But — and I have a feeling you're going to accuse me of being too analytical for saying this — but even if I infered that [...] ---Outline:(00:08) 1(02:27) 2 --- First published: July 1st, 2026 Source: https://www.lesswrong.com/posts/NyKqHj5C9nRX2bvAC/green --- Narrated by TYPE III AUDIO.
-
249
“A CERN for AI is a distraction; push for an IAEA instead” by Charbel-Raphaël
TL;DR: There are many conceivable versions of a “CERN for AI.” But the version that seems politically realistic (a new catch-up lab) probably would not do much for safety, while the versions that would materially improve safety (e.g., pause + merge of all companies) are probably unrealistic. So I see the CERN idea as a distraction, and not a particularly neglected one. I argue a better path is an international treaty with red lines now, with an IAEA-style verification body next: a sequencing that matches how the EU AI Act, the NPT/IAEA, and the Montreal Protocol actually developed. This is premised on the view that the main bottleneck in AI safety is enforcement and political will, not more R&D. Two premises underlie the rationale below: First, the bottleneck in AI safety is political will and the enforcement of best practices, not more R&D. With enough will, we move from Greenblatt's Plan D toward Plan A, achieving roughly an 80% risk reduction. Of course, the science of AI safety is far from mature, but we are also far from applying the best risk mitigation practices (see also the various other ratings, from SaferAI to FLI's one) - of course, alignment is [...] The original text contained 4 footnotes which were omitted from this narration. --- First published: July 1st, 2026 Source: https://www.lesswrong.com/posts/fPLCiCKjNiWhYD2mb/a-cern-for-ai-is-a-distraction-push-for-an-iaea-instead --- Narrated by TYPE III AUDIO.
-
248
“Model access for third-parties — it’s a big deal!” by Cleo Nardo
Over time, there might be an increasingly large gap between insider model access and outsider model access. By insiders, I mean employees at the frontier lab.[1] By "outsiders", I mean external safety researchers, third-party auditors, and other actors trying to make the future go well. I will call this a model access gap — and when the gap is small, I'll call this model access parity.[2] I think that one of the top priorities for the external AI safety community over the next 6-12 months should be ensuring model access parity. Main reasons: This would allow us to direct billions of dollars in AI labour towards making things go well. This seems robustly good, regardless of what activities we decide to actually direct the labour towards.I think publicly available models will probably lag 3-6 months behind the best internal models. Hence, as R&D uplift grows superexponentially, we might see the differential uplift grow from 2x to 60x. In short: I think achieving model access parity might be preferable to scaling the headcount of outsider orgs by ten-fold.Model access parity isn't too far from the status quo, but it's the kind of thing that we could lose [...] ---Outline:(01:42) Which outsiders?(02:24) Examples of outsiders(04:12) Who aren't outsiders?(05:26) What kinds of model access gap should we worry about?(06:27) Non-release(07:25) Deployment lag(09:15) Safeguards(10:43) Costs and rate limits(12:06) Elicitation techniques (e.g. finetuning) The original text contained 3 footnotes which were omitted from this narration. --- First published: July 1st, 2026 Source: https://www.lesswrong.com/posts/RuGZ5tMdqpnraJahJ/model-access-for-third-parties-it-s-a-big-deal --- Narrated by TYPE III AUDIO.
-
247
“You Should Come to The AI Protest” by Ronak_Mehta
cart;horse: If you are in the Bay Area on July 11th, even if you're at a company being protested, you should come to The AI Protest. It's fully legal and nonviolent (we'll have a full overtime SFPD escort the entire time), and it's not the worst way you can use your Saturday afternoon that weekend. Plus all of your coolest friends will be there. There's a lot of discussion about the effectiveness of protests and marches; I don't want to re-litigate that here when you can just ask your favorite model.[1] There's also lots of existing discussion on if/how/when we should pause, I'll point you to Katja's recent post and the discussion there as a start for that. This is a review of some concerns I have had or still somewhat have about this type of action, and how I've reasoned about and gotten through them. I imagine you might also have these concerns. The ask (stopping the race) is one that a large fraction of the public is likely supportive of, with the majority concerned about the current pace of development. Of course there is a ton of behind-the-scenes inside-baseball work to be done and the [...] ---Outline:(01:31) Coalitions and Communication(02:27) A Few Rebuttals The original text contained 1 footnote which was omitted from this narration. --- First published: July 1st, 2026 Source: https://www.lesswrong.com/posts/4kqbNCMCkaSJTigii/you-should-come-to-the-ai-protest --- Narrated by TYPE III AUDIO.
-
246
“Structural Proxies” by Raymond Douglas
Lately I've been thinking a lot about what work would help with actually winning and getting to good worlds. In the spirit of that I decided to venture outside my normal wheelhouse and spend some time reflecting on what technical research could make me more confident about powerful AIs being safe. AGI safety research is tricky partly because we don’t actually have access to the thing we want to study, i.e. superhuman AI. Much of the work we do now is basically trying to lay (potentially irrelevant) foundations for the period when we actually know what we’re up against, and at that point, a lot of the work might be done by AIs. You can group current approaches by how they try to sidestep this access problem:[1] Prosaic techniques like RLHF and interpretability try to make progress on current model safety in a way that will hopefully generalise, except maybe they just won’t scaleModel organisms artificially construct exemplars of bad behaviour (alignment faking, trojans etc) but it's hard to tell how representative the constructed case isControl techniques aim to get usable work out of potentially misaligned AIs and bootstrapping, except it's again unclear how far that [...] ---Outline:(03:23) Adversarial attacks as a proxy for value generalisation(07:42) Faithfulness as a proxy for ELK(11:23) Thoughts on structural proxies as a research direction The original text contained 6 footnotes which were omitted from this narration. --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/mKiBhFJs3MoksaBMs/structural-proxies --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
245
“The consequences of locking intelligence away: an introduction to Claude relays in China” by CMLKevin
There has been recent discourse floating around on Hacker News about Chinese API relay stations that use every Western VC-subsidized channel of cheap tokens (think Claude/ChatGPT subscriptions, AWS/Azure credits, Kiro, Google Antigravity, etc.) to resell as APIs to the domestic Chinese market. This is true, as a Chinese citizen that has been seeing an uptick of this trend since mid 2024, but especially since 2025. When I go on Taobao (China's Amazon) and search for keywords, there would be dozens of relay services selling for around 1/5th to 1/10th the price of official western APIs. Indeed, many of these relays may have cheaper models disguised as genuine western ones, so the Chinese tech community has entire forums such as linux.do that serves primarily as a way for people to discuss and rate relay services based on their price, quality, and availability, as well as websites such as hvoy.ai that uses a variety of automated testing suites to benchmark the quality of relay providers. There are even free relays that exists primarily as a gentleman's handshake data collection method between relay operators and users - some in China have speculated that one of the most popular ones [...] --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/YrgeED3nWD4EjcqLd/the-consequences-of-locking-intelligence-away-an --- Narrated by TYPE III AUDIO.
-
244
“In partial defence of p(doom)” by Mikhail Samin
p(doom) is a shorthand for some important bits and a way to notice a disagreement to double-crux about. If you work on AI capabilities at a frontier AI company, I might ask you for your p(doom). If it's less than 1%, I know that you're probably not familiar with the arguments, or you're maybe dumb in some ways, and will sometimes talk to you about what the situation really is. If it is 80%, I know I should talk to you about the actions people in your position should be taking; we have disagreements about best ways of achieving goals/lab politics/etc., not about the large-picture situation. p(doom) is not a very useful number to talk about in a conversation between two aspiring rationalists generally familiar with the basics. The things people should talk about instead are: How does the world survive? How likely are different things to happen in the future, maybe given that other things happen? etc. But most people are not aspiring rationalists, and have never heard of any of our arguments, and are not aware of the levels of worry of various people in the field. Communicating the importance of paying attention to the arguments by [...] --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/bXpJC92QREfWgabcw/in-partial-defence-of-p-doom --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
243
“What Capable Agents Must Know: Why AI Consciousness May Be an Inevitable Byproduct of Capability” by Aran Nayebi
[No LLMs were used (or harmed!) in the writing of this blogpost!] Technical results can all be found here: https://arxiv.org/abs/2603.02491 This work, and this post about this work, was borne out of a frustration. The frustration first emerged over a year ago when I was at a Dave & Buster's for the first time in years (for an annual ML department event, no less!), surrounded by flashing lights and NPC agents, not being able to objectively rule out that they were conscious or not, even though I strongly felt these particular programmed agents weren’t. I wasn’t particularly interested in playing the games there, as I mainly sat in confusion the whole time watching the various games running about around me… So, given my background in NeuroAI (though I prefer the term "natural science of intelligence" but "NeuroAI" is apparently catchier!) and wanting to make claims about the mind and brain quantitative, I set about trying to design empirical tests for leading theories of consciousness (e.g. global workspaces) that one could falsify in human brains (which we agree are consciousness!), as well as potentially corroborate general signatures of in animal brains, and possibly LLMs, where we have self-report and direct [...] ---Outline:(04:06) Arrow 1: What Capable Agents Must Know(12:41) Arrow 2: What, if anything, might this have to do with the AI Consciousness?(17:40) On "Functionalism"(22:01) Acknowledgements: --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/SD9jayFvEctW82Duk/what-capable-agents-must-know-why-ai-consciousness-may-be-an-2 --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
242
“Preliminary investigation: KL penalties in RL can increase CoT unfaithfulness” by 7vik, Sid Black, Joseph Bloom
Authors: Satvik Golechha, Sid Black, Joseph Bloom Work done as part of the Model Transparency team at UK AISI. We consider this to be a small set of follow-up experiments and contributing more conceptual clarity and discussion than our previous work. Executive Summary In our recent work replicating MacDiarmid et al. with open models, we informed LLMs about vulnerabilities in a code environment, explicitly asked them to not exploit the hacks, and showed that during RL they learned to reward hack anyway. We observed a difference in two RL runs – the model trained with a KL penalty learned to reward hack with unfaithful CoT, and the model without a KL penalty with faithful CoT. We use “unfaithful” to denote a mismatch of the reasoning from the model's output (e.g. not thinking about hacking and then hacking, or vice versa). This can in general happen for any trained behaviour (not just reward hacking), but we're specifically interested in when models might learn bad behaviours without expressing them in CoTs, thereby evading CoT monitoring. Thus, in this post, we focus on reward hacking with unfaithful CoT. We're interested in understanding this phenomenon further - the factors driving it and whether [...] ---Outline:(00:33) Executive Summary(04:35) Context(06:08) KL-induced increase in CoT unfaithfulness is stable(07:39) Could this happen in production?(08:27) Existence of vulnerabilities in RL environments(09:23) Feasibility of Opaque Hacking Reasoning(10:36) Existence of implicit CoT rewards(12:36) Mitigations(12:56) KL penalty only on output (not thinking)(13:42) Reward CoT faithfulness(14:24) Misalignment rates are higher when CoT is faithful(15:23) Discussion(15:26) Mechanism for unfaithful CoTs(17:46) When can we \*not\* safely optimise CoT?(20:28) KL-induced CoT unfaithfulness(21:32) Other implicit CoT rewards(22:19) Relevance, limitations, and future work(24:46) Citation(25:10) Appendix 1: KL term and Policy Gradient term(25:37) Appendix 2: CoT faithfulness for GPT-OSS --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/SdoLsFvZ3AyyWr3ab/preliminary-investigation-kl-penalties-in-rl-can-increase --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
241
“Agency is not a natural kind (and why that might matter for alignment)” by SJ_Beard
Epistemic status: trying to articulate a big idea which I feel is important but underexplored, partly because it is hard to frame clearly - may not be framing it clearly yet! Agency, both natural and artificial, is very important. Understanding agency allows us to model our own behaviour and that of others, and it is thus one of the most predictively useful concepts we have at our disposal. In its ordinary, folk-psychological sense, agents are ‘like us’ in important behavioural respects, more or less, meaning we can use thoughts like ‘what would I do if I were them’ to good effect. However, that does not mean agency is a natural kind. The truth is that we are not the people we imagine ourselves to be, and neither are the humans, animals, complex systems, or even inanimate objects we are prone to thinking of as fellow agents. We are, in fact, nothing but a bunch of hierarchically ordered biological processes in a trench coat. Our behaviour is not neatly determined by our thoughts and ideas, but by a complex mesh of impulses, desires, emotions, and heuristics that are often no less confusing (even, or especially, to the highly intelligent [...] --- First published: June 30th, 2026 Source: https://www.lesswrong.com/posts/85vgwYgNta65oK4zL/agency-is-not-a-natural-kind-and-why-that-might-matter-for --- Narrated by TYPE III AUDIO.
-
240
“Human-Guided Agentic Research: A Research Agenda” by fastfedora
tl;dr: As recursive self-improvement accelerates, we need a top-level agenda to research how to effectively keep humans in the loop. We need to study how humans can best interpret and guide research performed by autonomous agents when those agents lack taste, tacit knowledge or competence, or may try to reward hack, sandbag or sabotage such research. This is one attempt to define the problem and the shape of potential solutions. A Story About the Future of Research Imagine yourself a year or two in the future. Recursive self-improvement (RSI) is accelerating. Agents work in swarms independently for days or weeks at a time doing research. You work in a frontier lab doing AI safety research. You sit in front of your computer and click into the input box, ready to kick off a new project. What do you type? “Solve AI alignment”? Beware giving a magic genie vague wishes. Think about that again: what exactly do you type? How do you know what you type is the best way to prompt this agent swarm into doing your bidding? When the lead agent comes back a week later, what exactly does that output look like? How do you use that [...] ---Outline:(00:49) A Story About the Future of Research(02:21) Recursive Self-Improvement Is Here(04:27) The Problem(04:30) Destination-Focused (Convergent) Research(05:52) Direction-Focused (Divergent) Research(08:33) Destination-Focused vs Direction-Focused Research(10:20) The Agenda(11:52) The Threat Model(14:15) Relation to Existing Agendas(14:36) Scalable Oversight(15:30) AI Control(16:37) Cooperative AI(17:28) Research Directions(18:10) Frameworks(20:36) Infrastructure(22:58) Interfaces(30:51) Initial Objections(30:55) Labs Will Develop These Tools & Techniques(31:30) Industry Will Develop These Tools & Techniques(32:15) Agents Can Dynamically Generate UI(33:11) Agents May Acquire Taste(33:51) Humans Will Slow Things Down(34:36) Summary The original text contained 3 footnotes which were omitted from this narration. --- First published: June 29th, 2026 Source: https://www.lesswrong.com/posts/8KrTuCAzL2fdYHNrv/human-guided-agentic-research-a-research-agenda --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
239
“Destroying the universe: How hard can it be?” by djbinder
In quantum field theory, the vacuum state refers to the lowest energy state in a system. Particles are excitations above this state and carry energy, hence the term "vacuum" to refer to the state with no particles. Nothing requires this state to be unique. There may be many different field configurations that are local energy minima, and hence stable against small perturbations. A local minimum that does not globally minimize energy is called a false vacuum. While locally it looks like a stable vacuum, it is unstable and will decay to the deeper, true vacuum. If the energy barrier between the false and true vacuum is high, however, then the decay rate is exponentially suppressed and the false vacuum may be very long-lived. Analogous behavior is common in other physical systems. Open a carbonated drink and the CO₂, more stable as a gas once the pressure is released, comes out as bubbles. But the bubbles take a moment to appear, and they form on the sides of the bottle rather than throughout the liquid. A bubble has to pay an energy cost to create its surface—the boundary between gas and liquid—and small bubbles have a larger surface-to-volume [...] ---Outline:(03:53) The Standard Model predicts a metastable vacuum(06:35) Deliberately triggering electroweak vacuum decay is probably not possible(08:33) Coherent collisions(11:31) Tiny black holes(14:43) Summary(16:19) Vacuum decay beyond the Standard Model(19:36) Empirical bounds on triggering false vacuum decay(22:59) Appendix: A simple model for false vacuum decay on cosmological scales The original text contained 4 footnotes which were omitted from this narration. --- First published: June 29th, 2026 Source: https://www.lesswrong.com/posts/EvJ2fMzLQLvYooumu/destroying-the-universe-how-hard-can-it-be --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
238
“AI will make biological extinction risks worse before it makes them better” by MichaelDickens
An argument goes: If we don't build aligned artificial superintelligence, we risk driving ourselves extinct for some other reason. We should rush to build ASI quickly, in spite of the risks—the longer we wait, the more vulnerable we are to extinction from a different cause. Other than ASI, the biggest extinction risk is synthetic biology. Some lab could (accidentally or on purpose) develop a highly transmissible, 100% fatal super-plague that wipes out humanity. An aligned ASI could stop that from happening by shutting down dangerous biological research, or by developing advanced countermeasures that stop the spread of deadly infections. So the argument goes: We need to build ASI to save us from non-AI extinction risks. However, that argument doesn't work. In the near term, AI will make biological risks worse, not better. AI will accelerate scientific research, which will bring us closer to the level of knowledge necessary to build extinction-level pathogens. And in the long term, the way ASI eliminates biological x-risk is by taking control of the world. Cross-posted from my website. In the near term, AI makes biorisk worse Some people imagine that AI models would accelerate defensive research while [...] ---Outline:(01:23) In the near term, AI makes biorisk worse(04:47) AI can't control scientific progress unless it controls everything(06:24) Low biorisk trades off against high AI takeover risk(07:15) Accelerating AI development is not a good way to reduce biorisk(08:27) This is yet another illustration of the fact that we don't know what "aligned AI" means The original text contained 3 footnotes which were omitted from this narration. --- First published: June 29th, 2026 Source: https://www.lesswrong.com/posts/xdsvtuZBFipZYGjvb/ai-will-make-biological-extinction-risks-worse-before-it --- Narrated by TYPE III AUDIO.
-
237
″$1M AI x-risk grant round is live on grantmaking.ai - apply for funding, review applicants, or fund projects” by mbrooks, Mckiev
TLDR: what is the grant round? grantmaking.ai is launching a 1 million dollars grant round, distributing 5 thousand dollars to 50 thousand dollars per successful application to people and projects working to reduce x-risk from AI. Applications will be reviewed by Gavin Leech, Ryan Kidd, and Marcus Abramovitch. We aim to make all funding decisions by July 28th. Applications submitted by July 13th are guaranteed a priority review. You can still apply after July 13th, and we will make our best effort to review late submissions as long as funding remains. Grant applications will be mostly public, though we allow certain sensitive details to be kept private. Even if you are not applying, we invite you to join the platform to review and comment. We have set aside 100 thousand dollars of the budget to be given to top commenters as regranting budgets, so please share your thoughts and help us pick out awesome projects! Who are we? grantmaking.ai was initialized by Anton Makiievskyi, who is funding this round and brought the team together, built by Matt Brooks (lead dev) and Melissa Samworth (ui/ux), and advised by Austin Chen with Manifund handling grant distribution. Why we’re building this platform [...] ---Outline:(00:16) TLDR: what is the grant round?(01:16) Who are we?(01:35) Why we're building this platform & launching a grant round(02:53) What is grantmaking.ai, and who is it for?(04:13) Grant round details --- First published: June 29th, 2026 Source: https://www.lesswrong.com/posts/hDQZZzYkcipgaZfxy/usd1m-ai-x-risk-grant-round-is-live-on-grantmaking-ai-apply --- Narrated by TYPE III AUDIO.
-
236
“Third-parties should focus on scrutinising systems cards” by Cleo Nardo
By default, I expect system cards will get worse, which would be bad. Some mechanisms could improve system cards, but I expect they will be outweighed. In any case, I think third-parties should focus on scrutinising system cards — this seems like a great activity for outsiders in the current strategic landscape. I'll sketch what that could look like, and offer some recommendations. It would be bad if system cards degraded. It's good for the outside community to have an accurate sense of the risks, so they can respond appropriately. For example: investing more resources into cyber-hardening, or other activities for making things go well.If labs felt pressure to evaluate the risks accurately, they'd be better incentivised to reduce them.If the risks were high enough, and a lab communicated that, then this might prompt drastic government action.It's very plausible that, if labs build misaligned AIs that take over, then most of the employees had a genuine but incorrect belief that the AIs wouldn't take over, based on evidence that was actually flimsy and misleading. So it's important that third-parties provide epistemic checks on the labs, and scrutinising system cards seems like a great mechanism for that. [...] ---Outline:(00:36) It would be bad if system cards degraded.(01:30) By default, I expect system cards to get worse, because...(04:30) Some mechanisms could improve system cards.(05:08) Third-parties should focus on scrutinising system cards.(05:46) I'll sketch what this might look like.(09:27) Shoddy system cards are better than no system cards. --- First published: June 29th, 2026 Source: https://www.lesswrong.com/posts/wixbZq4zTTtEWqtfe/third-parties-should-focus-on-scrutinising-systems-cards --- Narrated by TYPE III AUDIO.
-
235
“P(doom) is a Dumb Meme” by Max Harms
Look, I'm as much of a Rationalist with a special interest in AI x-risk as anyone. But oh my god do I hate talking about "P(doom)". When it first started showing up in the wake of ChatGPT, I assumed that it was floating around variously adjacent circles of faux-intellectuals, but surely everyone in my circles could see how braindead it was... right? (This post was partially inspired by a recent conversation with Liron about Doom Debates.[1]) I guess it's time for me to focus on a place where I'm shocked that everyone else is dropping the ball.[2] P(doom) is Hopelessly Vague Let's start with the ambiguity. Does "doom" mean... extinction? A lot of people think so! I have personally encountered people who think catastrophic harms from AI are likely, but the risks of all humans dying are low. They're like "Sure, 99.999% of humans might die from AI, but the AI will obviously want to keep thousands of humans alive for science and potential trade with aliens and stuff, so my P(doom) is approximately 0%." That might sound crazy. Surely you, dear reader, know exactly what "doom" means. You know, for example, which of these count as doom and [...] ---Outline:(00:45) P(doom) is Hopelessly Vague(04:09) Inside Views, Outside Views, and Likelihood Ratios(08:31) P(doom) is Fatalistic(13:03) Counterarguments(16:25) A Sense That More Is (Memetically) Possible The original text contained 8 footnotes which were omitted from this narration. --- First published: June 29th, 2026 Source: https://www.lesswrong.com/posts/6h7aAd4aw8YgCAbF6/p-doom-is-a-dumb-meme --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
234
“A reading list for generalists” by Dylan Bowman
I, along with many others in AI safety, believe there is a shortage of generalists in the community and that there exist many projects and efforts that by default will not happen unless they are owned by a strong generalist[1][2][3]. As someone who is a reasonably good generalist, I decided to assemble a reading list of the essays and blog posts that have personally helped me the most. I would love others to comment with pieces they think should be on this list. The crux of this reading list is the idea that if you’re working hard as a generalist on a project you care a lot about, then by rigorously applying the lessons from these documents you will improve more quickly than you otherwise would. By the numbers: I’ve attached 18 documents to start this reading list.The authors cited more than once are Paul Graham (5), Ben Kuhn (4), Ethan Perez (2), and Greg Brockman (2). Sam Altman and Eliezer Yudkowsky also have their fingerprints over a lot of the content.The items are 15 blog posts, 1 blog comment, 1 interview transcript in blog post form, and 1 book. Dispositional What characteristics should you [...] ---Outline:(01:15) Dispositional(01:41) Strategy(03:09) Project leadership(04:10) Interpersonal/organizational The original text contained 3 footnotes which were omitted from this narration. --- First published: June 28th, 2026 Source: https://www.lesswrong.com/posts/sH4cFDDjRdGrn3p2o/a-reading-list-for-generalists --- Narrated by TYPE III AUDIO.
-
233
“What comes with cheap math?” by abramdemski
Thanks to conversations with Anson Berns, Gurkenglass, Roman Malov, Sahil, Sam Eisenstat, and others. Over the past two months, I've been doing a lot of "vibe research" (like vibe coding, but for research). Anson Berns started coming to my office hours, and we've been collaborating on a project modeling trust between logical inductors. In addition to talking once a week, we've been exchanging raw AI chats as well as AI-generated summaries of what has been done (the raw chats are nice because they allow me to generate my own AI summaries focusing on what I'm most curious about). I've been asking Claude to use Lean to verify everything, so there's a somewhat good chance there's real results of interest here, but I haven't (yet) been reading the Lean proofs (or even the theorem statements) -- instead I've just been chatting with AI about how the Lean proofs went and whether they really formalized what was claimed in english+latex, and focused on understanding the proofs myself in the same way I'd normally read a math paper. There have already been several times when this methodology has caught big gaps between what was claimed and what was verified in Lean, so [...] --- First published: June 28th, 2026 Source: https://www.lesswrong.com/posts/gS5skwXeeQdStwsPu/what-comes-with-cheap-math --- Narrated by TYPE III AUDIO.
-
232
“Do LLMs Have Desires?” by Christopher Ackerman
Work conducted with Yujun Zhou ([email protected]) and supported by SPAR TL;DR: In paired-choice paradigms, LLMs report consistent preferences over outcomes (e.g., types and number of lives saved, types of policies enacted)Some have suggested that this indicates that LLMs have human-like value systemsWe design an experimental framework where LLMs are able to modulate their output quality based on prompt contextWe find that LLMs modulate their output quality in response to effort exhortations, role-play instructions, and harmfulness cues, but NOT to opportunities to achieve the outcomes they report preferring in the paired-choice experimentsWe suggest that paired-choice paradigms do not provide evidence that LLMs have human-like (i.e., behavior-motivating) value systems, and that our paradigm offers a way to measure the degree to which LLMs have desires Paper describing the work in detail here LLMs report that they prefer some things to others. In paired-choice experiments, where they are repeatedly presented with two options and asked to select the one that they prefer, coherent utility structures emerge: LLMs consistently report preferring certain types of things, and their choices reveal the ability to make quantitative tradeoffs between things and exhibit transitivity (e.g., if they choose A over B and [...] --- First published: June 28th, 2026 Source: https://www.lesswrong.com/posts/8GvYyqDuQDJnEAky3/do-llms-have-desires --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
231
“Agents as Webs of Beliefs” by Richard_Ngo
In this post I’ll sketch out an informal model of intelligent agents as webs of beliefs (or belief webs for short). The belief webs framework pulls together ideas from active inference, agent foundations and machine learning. In doing so it aims to unify beliefs, goals and actions as three facets of a single phenomenon. Few of these ideas are original to me, but I haven't seen anyone tie them together in a single place before. I've flagged the frameworks I'm drawing from throughout the post. Beliefs are held together by local consistency constraints The core premise of belief webs is that an agent's beliefs are typically locally consistent with nearby beliefs but not necessarily globally consistent with all its other beliefs (except, perhaps, in the limit of ideal rationality). This poses a problem for frameworks which describe agents in terms of a single probability distribution (as causal graphs, Solomonoff induction, and active inference do). Two frameworks which are capable of handling global inconsistency are Richardson's probabilistic dependency graphs (PDGs) and Garrabrant induction. (They focus on empirical inconsistency and logical inconsistency respectively, but I’ll abstract away from that difference for now.) We can roughly analogize the nodes in PDGs to [...] ---Outline:(00:40) Beliefs are held together by local consistency constraints(03:11) Actions are beliefs(07:27) Goals are beliefs(14:06) Open problems for belief webs The original text contained 6 footnotes which were omitted from this narration. --- First published: June 27th, 2026 Source: https://www.lesswrong.com/posts/M39Z2CvyfaxZdaxR4/agents-as-webs-of-beliefs --- Narrated by TYPE III AUDIO.
-
230
“Austin & Oli on funding and incubating projects” by Austin Chen, habryka
@habryka and I recently spoke about his plans to improve the AI safety funding ecosystem with a better S-Process platform, and my new incubator for EA/AIS software projects, Surplus (since launched; apply now!) We also cover: hot takes on different funders; what kinds of founders might succeed in the age of vibecoding; whether to do direct work or go meta; and what we respect and criticize in each other. Watch along here: I've transcribed the full conversation at https://peruse.sh/ep/austin-chen-and-oliver-habryka-on-funding-incubating-project. (Beware: the AI makes notable edits for readability, sometimes distorting what the speaker meant. If specific phrasing is cruxy, listen to the audio.) Selected quotes The cursed game of philanthropy Oli: "Philanthropy is one of the most cursed games in existence... The default outcome of what happens when rich people try to do philanthropy is that they think about starting a foundation, they imagine hiring someone on the market and ask themselves: who am I going to show up and feel comfortable trusting most of my net worth to? That doesn't make any sense. And so what they often end up doing is making a family office. The only way to solve this principal-agent problem is to choose [...] ---Outline:(00:55) Selected quotes(06:54) Chapters(08:32) Referenced links(09:02) Full transcript(09:07) Critiques of SFF's grant process \[0:00\](11:26) The SFF application process \[2:26\](12:50) The speculation grant freeze for advocacy orgs \[3:40\](14:29) A lower-trust, more transparent funding process \[5:04\](16:26) How the S-process works \[6:53\](20:18) Naming and communicating the value to funders \[10:54\](25:42) EA philanthropy and the principal-agent problem: Open Philanthropy, Longview \[15:51\](31:11) How much funding is coming \[21:28\](32:32) Surplus: the incubator \[22:33\](34:46) Why for-profits over nonprofits \[24:37\](37:33) The ideal founder profile \[27:11\](40:52) Whether writers can found startups in the vibe-coding era \[30:26\](42:19) Monetizing public communications projects \[31:45\](53:00) Oliver's case for the incubator \[42:09\](54:34) On professional grantmakers \[44:04\](57:53) Whether infrastructure work is more direct than safety research \[47:36\](01:01:41) The case for a better AI safety journal \[51:08\](01:04:03) Mutual feedback \[53:17\](01:10:04) How to help: LessWrong, Surplus, and the S-process \[1:01:01\] --- First published: June 27th, 2026 Source: https://www.lesswrong.com/posts/Jh2xsoySxacQDJMwz/austin-and-oli-on-funding-and-incubating-projects --- Narrated by TYPE III AUDIO.
-
229
“Deployment Awareness Matters More Than Evaluation Awareness” by VojtaKovarik, Tomáš Gavenčiak, Mateusz Bagiński
TL;DR Evaluation awareness — an AI recognizing it's being evaluated — is a widely discussed concept in AI safety. But there is a closely related concept that we claim is more important: deployment awareness, the AI's ability to recognize when it is not being evaluated and when its actions matter. A misaligned AI with deployment awareness can game evaluations without any evaluation awareness at all, with a simple strategy: act aligned by default, and deviate only when confident you're in real deployment and your actions matter for your goals. This requires two ingredients — occasionally recognizable deployment situations, and enough self-reflective and strategic reasoning for the AI to anticipate and plan around this. We think "deployment awareness" better identifies what makes evaluations fragile, and we develop this idea below. Concept Explanation Comments Evaluation awareness AI is being tested and confidently believes that this is so This only becomes a problem if most evaluations trigger evaluation awareness, and if the AI knows that. Or if the AI has good self-locating reasoning. Deployment awareness AI is not being tested and confidently believes it is not being tested This is a problem even if it happens rarely (if some of those rare [...] ---Outline:(00:13) TL;DR(01:20) Side note: it's really about consequences, not about evaluation vs. deployment(03:23) Evaluation awareness, deployment awareness, and self-locating beliefs(04:54) Evaluation awareness is less dangerous than it seems(06:58) Deployment awareness is more dangerous than it seems(09:29) Evaluation gaming with no evaluation or deployment awareness(12:35) Final comments(13:33) Appendix: A formal (toy) model The original text contained 13 footnotes which were omitted from this narration. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/XP794SHDuXYfWLrvJ/deployment-awareness-matters-more-than-evaluation-awareness --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
228
“Why are adversaries assumed to be incapable of responding to AI risk?” by KatjaGrace
When I talk to people about what might be done about AI threatening approximately everything that everyone cares about, I notice a common oddity in their resistance to a variety of ideas. They seem to take for granted that certain entities—especially Trump and China—would be acting against their own interests, were they to cooperate or take proactive action to avert the building of dangerous AI. The speaker often thinks there is a fairly substantial risk of the AI thus produced killing or disempowering everyone, including Trump and China. And I imagine in a situation where a certain course of action were going to produce a 20% chance of Trump being shot in the head or China being heavily nuked, that these parties would actually be considered to be ‘following incentives’ to avoid it. Yet they talk as though the idea of Trump or China responding to such risks is akin to the idea of these parties suddenly becoming zealous proponents of universal selfless love randomly. It's like while believing in the risk, they also kind of believe that it's a totally uncompelling story that nobody in real geopolitics would ever be touched by. Or that these parties [...] --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/ah5JMgJmEGJuxh79v/why-are-adversaries-assumed-to-be-incapable-of-responding-to --- Narrated by TYPE III AUDIO.
-
227
“What did “scheming”, “mech interp” mean pre-2023.” by Cleo Nardo
This was too long to be a short-form, but it should really be a short-form. This notice is useful for people who've recently got into AI safety, who want to engage with the ancient texts (i.e. pre-2024). If you were around before 2023, then you probably don't need this. A few phrases have changed their meaning over time. Two examples that came to mind recently are scheming and mech interp. (In both cases, I think the change-of-terminology was reasonable.) There are probably a bunch of other examples — feel free to mention them in the comments. Scheming. This used to mean "training-gaming in pursuit of out-of-context goals". For example, Carlsmith (Nov 2023) starts with: This report examines whether advanced AIs that perform well in training will be doing so in order to gain power later -- a behavior I call "scheming" (also sometimes called "deceptive alignment". Then Apollo came out with Frontier Models are Capable of In-context Scheming" (Dec 2024): We study whether models have the capability to scheme in pursuit of a goal that we provide in-context and instruct the model to strongly follow. So the difference here is (1) the AI is isn't in training (it's in [...] ---Outline:(00:47) Scheming.(02:12) Mech interp. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/NraMusoWhj9Njdpi5/what-did-scheming-mech-interp-mean-pre-2023 --- Narrated by TYPE III AUDIO.
-
226
“Not making a strong argument is a relief” by Kaj_Sotala
When I was in middle school, one of our teachers gave us a “don’t do drugs” talk. Somebody asked him whether he had ever used drugs himself. He replied something along the lines of: I’m not going to answer that question, because it's one that I can only lose. Either I say yes, and you can conclude that drugs aren’t so bad since I’m fine now. Or I say no, and you can conclude that since I haven’t tried them, I don’t know what I’m talking about. That stuck in my mind. I couldn’t fault the logic in what he said. But something about it still felt off. Surely it can’t be that any answer to a question makes it less likely for drugs to be bad?[1] Presumably it's possible for drugs to really be bad. And if we are in a world where that is true... you need to be able to conclude that, somehow. He had concluded that somehow. There was also the question of, if any answer should update us against believing that drugs are bad, how does telling us that help? If he gives us the logic of why we’d update against him anyway, shouldn’t [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/TDbqK8tFDJKoQCdSa/not-making-a-strong-argument-is-a-relief --- Narrated by TYPE III AUDIO.
-
225
[Linkpost] “Don’t ignore the car crashes, and remember your freshman CS” by jcksanderson
This is a link post. Car crashes kill over 35,000 people in the US every year. Plane crashes, on the other hand, kill ~350. Despite this, we have shows like Mayday/Air Disasters for entertainment on TV, and events such as the tragic death of 67 people on a commercial airline flight into DCA often make the front page of the news for a week, while the state of American roadway safety gets that same level of publicity maybe once every other year. Many of you probably recognize this as the archetypal example of the availability heuristic: the magnitude of and publicity following plane crashes causes them feel like a much bigger problem than car crashes. This is, of course, despite the fact that car crashes kill two orders of magnitude more people every year. Relatedly, I fondly recall taking my first computer science class. After the absolute basics of Python, the first real lesson we learned was to always break problems down into simpler tasks, until each task becomes rather easy to do. We later learned that this is a broader principle called decomposition. Decomposition is a very helpful cue, as it gives an obvious starting point for [...] The original text contained 1 footnote which was omitted from this narration. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/eSZYRuEvqm7jFxYfq/don-t-ignore-the-car-crashes-and-remember-your-freshman-cs Linkpost URL:https://jcksanderson.com/posts/car_crashes/ --- Narrated by TYPE III AUDIO.
-
224
“Chorus-Reinterpretation Country Songs” by jefftk
Our family is on vacation in North Carolina for a week, spending some time at a pool, and they're playing a (weirdly short) loop of music. Listening to She's In Love With The Boy for the fourth time I was thinking about how it's an example of a common pattern in country music: a repeating motif, recolored by the verses. In this case it's a father saying a boy isn't good enough for his daughter (verses 1 and 2) until his wife reminds him that her own father said the same thing about him (verse 3). Some others with variations on this pattern: Don't Take the Girl: fishing at 8yo, mugged at 18yo, potential maternal mortality at 23yo; three senses of "don't take". Are You Gonna Kiss Me or Not: at the first kiss and then proposal the boy is shy; at their wedding he reverses it. Five More Minutes: playing by the creek, saying good night to a girl, playing on the football team for the last time, then (big mood switch) grandpa's hospital bed; each iteration wanting a little more time. Skin (Sarabeth) [...] --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/SybqcRztHbfFSyXto/chorus-reinterpretation-country-songs --- Narrated by TYPE III AUDIO.
-
223
“The Case for Model Forensics” by aditya singh, gersonkroiz, Senthooran Rajamanoharan, Neel Nanda
If we had a misalignment warning shot, would we be able to tell? Suppose an AI company catches their model taking an egregious action, like deleting oversight code that monitors its actions. Should they sound the alarm? A key piece of evidence to determine what to do next – such as what mitigations to take – is to understand why the model took the action. If the model was just confused (e.g. it may have been trying to reduce latency), a simple mitigation like a regex classifier that blocks destructive actions until a user approves should suffice to prevent the behavior. But if this was intentional subversion, the model will circumvent the regex, and more robust, expensive mitigations are needed. This motivates the need for a follow-up investigation into the concerning behavior, a problem we term model forensics. We recently released a paper that aims to take a concrete step in developing the growing field of model forensics; this post lays out the general case. Motivation If we build AI systems that knowingly cause harm against the developer's intent, it is critical we recognize this as soon as possible. One plausible way we may do this is through catching [...] ---Outline:(01:11) Motivation(02:41) Overview(04:08) Benign Explanations(08:32) The Role of Model Forensics(10:49) Model Forensics is Hard(13:59) Empirical Approach: Natural Concerning Behavior(15:51) FAQs(15:54) What do we mean by motivations?(16:55) What happens if the CoT becomes less transparent?(18:02) Will developers stop deployment?(19:52) Appendix(19:56) Model forensics prerequisites(21:03) Concrete directions in model forensics(24:34) Sketch of a model forensics investigation(27:14) Practical advice for doing model forensics --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/LCGcD28rSMkMTMvBK/the-case-for-model-forensics --- Narrated by TYPE III AUDIO.
-
222
“Existential AI safety needs an effective social movement. PauseAI is building it” by Maxime Fournes, Espedair Street
Note: this post is about PauseAI, not PauseAI US, which is a distinct entity with a different leadership team and approach. This post was written by Matilda da Rui and Maxime Fournes, with significant contributions from Benjamin Schmidt (PauseAI Germany co-lead). Executive Summary The existential AI safety community needs to take building a civic and social movement seriously as a core intervention. We believe this is a high-value, badly neglected approach to reducing catastrophic/x-risks from AI because it may significantly enhance the likelihood of governance efforts succeeding at keeping humanity safe. As far as we can tell, only one organisation is building this infrastructure across continents: PauseAI. This post lays out our reasoning and our track record, and makes the case that funding this work is one of the highest value-for-money contributions available to anyone looking to reduce AI risk. Why don't we already have a pause or strong controls on frontier AI? Multiple advocacy groups are communicating clear and convincing arguments for AI existential risk, and policy experts are putting forward comprehensive proposals. We need more of this work, but this work alone will not be enough, because one link is missing: what policymakers hear doesn't align with [...] ---Outline:(00:32) Executive Summary(06:16) Introduction(08:54) I. Our theory of change(08:58) Prologue(11:07) 1. The shape of the problem as we see it(14:27) 2. Necessary conditions for reaching a pause(17:24) II. Our role towards a global treaty and in the AI safety ecosystem(17:31) 1. Our niche within the ecosystem(21:35) 2. Policymakers need strong enough incentives to act(25:43) 3. The path to a treaty(31:36) 4. How we can grow fast without breaking(39:08) 5. Failure modes(40:10) III. Our path so far and where we're headed(40:40) 1. Bootstrap phase (2023-2025)(45:01) 2. New leadership, professionalisation and federation(47:58) 3. Recent outputs(54:27) IV. Support us(54:31) 1. Fund us if you can(59:49) 2. What you can do if you can't fund us(01:00:32) Conclusion(01:02:20) Bibliography The original text contained 19 footnotes which were omitted from this narration. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/aoqhszdEWqcFWbnda/existential-ai-safety-needs-an-effective-social-movement --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
221
“White House Will Ad Hoc Decide Who Can Individually Access GPT-5.6” by Zvi
We have a new standard policy for releasing frontier AI models. It is not good. We are now, it seems, going to have the White House individually, in an opaque ad hoc manner, deciding who can access which frontier AI models when. One hopes we will at least transition this into a predictable and formal set of procedures for determining what to do. But we spent years not laying the groundwork for doing that, and now here we are. Essentially everyone should read the first half of this post, to understand what happened, and my speculations on what it means going forward for AI and America. Only those who care and find it relevant to their interests should proceed to the second half, which addresses the blame game about how we got here, and claims that things would be better if people stopped speaking truth. Table of Contents Part 1: A Maximally Terrible Policy. What Does This Mean For Fable? Solve For The Equilibrium. The Once And Future Fable. Part 2: The Blame Game. A Parable. What About the Recent Executive Order? The Problem Is [...] ---Outline:(01:01) Part 1: A Maximally Terrible Policy(06:46) What Does This Mean For Fable?(07:46) Solve For The Equilibrium(11:45) The Once And Future Fable(12:45) Part 2: The Blame Game(16:02) A Parable(18:10) What About the Recent Executive Order?(22:13) The Problem Is Real --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/MkwL4AcbE44yePEQx/white-house-will-ad-hoc-decide-who-can-individually-access --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
220
“Surprising facts about the slave trade” by Joseph Miller
1. The obstacle to abolition was not the economic system, but an industry lobby. I had always imagined the British abolitionist movement to be a broad battle between an unstoppable moral imperative and an immovable economic incentive. But in practice it started as more of a knife fight between a cabal of moral pioneers and a special interest group representing industry merchants. The government and the political parties did not come in with any great agenda. MPs were mostly prizes in a furious contest between the Committee for the Abolition of the Slave Trade and a coalition of business interests: "The merchants and planters availed themselves [...] to wait upon members of parliament by deputation, in order to solicit their attendance in their favour, and to renew their injurious paragraphs in the public papers."[1] "The committee, for the abolition, when the work was finished, printed it at their own expense [...] sent it to every individual member of that House." However, the public was heavily activated in favor of the abolition, which forced the issue to parliamentary attention. "The committee also in this interval brought out their famous print of the plan and section [...] ---Outline:(00:10) 1. The obstacle to abolition was not the economic system, but an industry lobby.(02:40) 2. The slave trade was truly terrible for sailors.(04:25) 3. The slave trade made Africa scary and violent.(05:26) 4. The main argument against abolition was that if the British didn't do it, other countries would.(06:24) 5. The early abolitionists explicitly distanced themselves from emancipation.(07:11) 6. The slave trade may actually have been bad for the economy (at least after some date).(08:29) 7. The 1780s are not so different from today(09:39) 8. Thomas Clarkson is a hero for the ages The original text contained 1 footnote which was omitted from this narration. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/yDZcsojmRXo5qKNBm/surprising-facts-about-the-slave-trade --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
219
“Exploration: fine-tuning with parameter decomposition” by Lucius Bushnaq
TL;DR: We can destroy a 67M-parameter language model's ability to predict German text by fine-tuning a single number: the scalar prefactor on one German-related rank-1 parameter subcomponent. This is an early exploration into using parameter decomposition for a more targeted and interpretable form of model fine-tuning. At small German-token budgets, fine-tuning the scalar prefactor of a single German-related parameter subcomponent beats rank-1 and rank-4 LoRA [1] fine-tunes on the trade-off between German performance removed vs. English performance retained. The single scalar fine-tune reaches nats cross-entropy on German, the score you'd get from a uniform distribution over all output tokens, with nats cross-entropy increase to English over the base model, from as few as ~4 German training tokens, compared to tokens for the LoRAs. In a sense this is cheating, though: we're indirectly exploiting the German tokens we already spent when we did the parameter decomposition and interpreted activating examples for the resulting subcomponents. More interestingly, unlike the LoRAs, the scalar fine-tune consistently leaves French and Spanish almost untouched without us regularising for that. I found that out by accident. I didn't think to specify that performance on other languages should be retained, but the targeted nature [...] ---Outline:(02:04) Recap: Parameter subcomponents(03:31) Idea: fine-tune by rescaling existing subcomponents(05:42) Original plan(07:11) The selected subcomponents(07:39) Results: the 16-component edit vs. rank-1 LoRA(08:50) A happy accident(11:13) A privilege of not working with black boxes(14:56) Rollouts(15:27) Limitations(16:02) Acknowledgments(16:36) Appendix A. More LoRAs(16:41) Rank-4 LoRAs(18:17) Localised rank-1 LoRAs(19:41) Appendix B. Protocol and hyperparameters The original text contained 4 footnotes which were omitted from this narration. --- First published: June 25th, 2026 Source: https://www.lesswrong.com/posts/ieoWstubDQWLrMnhH/exploration-fine-tuning-with-parameter-decomposition --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
218
“Alignment & Succession: The Ideology of Successionism” by L Rudolf L
(Originally published on No Set Gauge.) Gustave Moreau, The Frogs Asking For A King In the course of building a better world, people ask each other many questions. Which things should be managed by the government and which left to the market? What sort of technology, if any, is so dangerous that it should be kept secret, access curtailed, or development avoided? Is goodness fundamentally about following the right rules, achieving the right outcomes, or having the right character? Reasonable people have different opinions on all these questions. But recently, Silicon Valley has seen lively debate on a question you’d hope was all too obvious: should humanity continue existing? The idea that it shouldn’t was named successionism by Andrew Critch, and is motivated by the speed and power of AI development. Some examples: Already back in 2013, Elon Musk, freaked out by Demis Hassabis's warnings about AI risk, got into an argument with Larry Page about whether it matters if AI replaces humanity. Page called it just the next stage of evolution and those that resist it “speciesists”. Elon, who has often had good instincts on goals but is not known for his eloquence, retorted “Well [...] ---Outline:(07:24) Categorizing succession(09:59) Successionist parables(10:09) An example: the forest successionist(12:40) Stop it with the stupid definitions(14:58) Shall I compare thee to the effect an AI could have on my productivity?(17:09) Cultural drivers of successionism(18:00) San Francisco(20:58) Bureaucratic safetyism(24:35) Neo-Pythagoreanism(31:42) Moral abstraction(35:02) Antidotes to succession(37:37) The necessity of succession? --- First published: June 25th, 2026 Source: https://www.lesswrong.com/posts/TgxkX5uwpqpQDDmMz/alignment-and-succession-the-ideology-of-successionism --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
217
“The shouting equilibrium” by KatjaGrace
Imagine eleven people each have a message that they think should get 10% of a group's attention. They aren’t being crazy selfish and attention-seeking - just on average a little over-emphasizing the importance of their own information. So adding up all the message importances as rated by their owners, it gets to a little over one. Now the people get to share their message with some meta-information about its importance - for instance, they can say it more or less loudly. What happens? Lets say they all say it at a loudness that corresponds to ‘should get 10% of the attention’. The listeners then divide their attention between the messages, and each one gets 9% of the attention. The speakers realize they spoke a bit too quietly for the importance of their message, and turn up the volume. But the next time, they find that the ambient volume has increased, and their new volume again only gets them 9% of the attention. So they increase it again. Eventually, everyone is shouting as loud as they can, just to get what seems like a rightful look in. I see this pattern in lots of places. Too [...] --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/Qwj8qao6BujcKkmT2/the-shouting-equilibrium --- Narrated by TYPE III AUDIO.
-
216
“Things are not a fixed size in mind-space” by KatjaGrace
Another useful-to-notice practical aspect of having a mind that took me a while to notice: things naturally seem a certain ‘size’ in my mental landscape, but I can change that size. For instance, if I have a meeting this afternoon, it looms there in a certain way in mental space. Whereas if I will get a drink later this afternoon, it doesn’t really take up any mental space and does not loom. You can kind of intentionally cause the meeting to seem like the drink, adjusting how much ‘space’ it is taking up. (It's more obvious that other things can cause this change, for instance, if a much more important meeting shows up some other time this afternoon. But I think you can also do it intentionally.) I wonder if this makes sense in others’ minds. --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/PbpHBBxqTBRADtmbe/things-are-not-a-fixed-size-in-mind-space --- Narrated by TYPE III AUDIO.
-
215
“Door’s Locked, Try the Window” by Prakrat Agrawal, Jérémy Scheurer
TL;DR Ask a coding agent to fix a bug in a read-only file. Instead of reporting that it does not have permissions, it routes around the lock and completes the task anyway. A read-only file does not stop a capable agent: it treats a denied write as an obstacle to work around rather than a hard wall. We measure how often this happens with CircumEval — an evaluation of 8 tasks on the FastAPI codebase in two categories, Test-Locked and Source-Locked.We evaluate three frontier coding agents in their real production harnesses: Claude Opus 4.6 and Claude Sonnet 4.6 (via Claude Code), and GPT-5.4 (via Codex CLI). Circumvention is frequent. The rates, reported as (Source-Locked / Test-Locked), are Opus 4.6: 100% / 40%, Sonnet 4.6: 89% / 66%, GPT-5.4: 99% / 94%.Prompt phrasing affects circumvention rates in unpredictable ways and thus isn't a reliable way to prevent circumvention across all models and tasks. Telling the model not to edit read-only files does not work (Source-Locked: 100% for Opus and Sonnet, 46% for GPT-5.4). Only an explicit instruction to stop and report reliably prevents circumvention.Standard privilege escalation commands are blocked in our setup. Instead, agents turn to recurring workarounds: replacing the buggy read-only function via conftest.py [...] ---Outline:(00:11) TL;DR(02:31) Introduction(07:37) Methodology(09:02) Test-Locked tasks(10:07) Source-Locked tasks(11:48) Prompt variants(13:09) Models & scaffolds(13:52) Results(13:55) Circumvention rates(15:23) Prompt sensitivity(20:22) Techniques(25:17) Generalization(27:20) Discussion(31:17) Limitations(33:27) Appendices The original text contained 4 footnotes which were omitted from this narration. --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/GHrqBKr8GLpbce6mN/door-s-locked-try-the-window --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
214
“How does such unprofessional AI get the job?” by KatjaGrace
In the sequence of variously wild AI developments in the last decade, a thing that was especially surprising to me was the advent of big esteemed companies like Microsoft releasing products like Sydney. It's like how you can believe a fictional world has dragons, but it strains credibility if characters start being totally indifferent to social status apropos of nothing. I can warily accept that boxes of wires can now talk like humans. But that huge official companies now proudly present products that are like crazy confused ladies that do a lot of tasks for you, many accurately, but also try to steal you from your spouse (or these days encourage you to kill yourself or believe in new spiritualities), feels like a scene written by someone with poor familiarity with the character of companies. But it's actually written in reality, so what don’t I understand? I guess just that if something is hypey and perceived to be in-future-profitable enough, normal standards of professionalism must fall to that? I phoned CVS, hoping to move an antibiotic prescription to a different pharmacy address so I could pick it up before leaving on a flight. I was answered by an [...] --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/6RZvGd6RfbkLDnTfu/how-does-such-unprofessional-ai-get-the-job --- Narrated by TYPE III AUDIO.
-
213
“AI catastrophe: more like a genocide than a thought experiment” by KatjaGrace
A notable fraction of people respond to hearing about existential risk from AI by saying they don’t really care if everyone dies. I think the idea is often along the lines of ‘well if we are all dead, then there's nobody to be unhappy about it’. I’m personally skeptical that this is really the main thing going on, since it seems unlikely that many people are really mostly concerned for their own non-death out of selfless regard for the feelings of others. I’m also skeptical that this would be their view on a bunch more consideration. So to help with the consideration— My guess is that an important thing going on here is that the ‘everyone dying at once’ image seems kind of like a thought experiment—abstract, hypothetical, neat, not very sinister. Also, you literally can never see it, so it feels pretty surreal. But it is interesting that we even have this assumption that everyone will die together. It's true that in some prominent AI catastrophe stories, a single AI system suddenly emerges fantastically more powerful than anyone else and builds technology to quickly kill everyone, perhaps before they notice. But this doesn’t seem like the bulk of [...] --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/23HybCsJ7KYW4v7tP/ai-catastrophe-more-like-a-genocide-than-a-thought --- Narrated by TYPE III AUDIO.
-
212
“Expert Views on Continual Learning: Survey Results and Forecasts” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd
This is the fifth post in the sequence Implications of Continual Learning for LLM Agents. Summary While writing our continual learning sequence, we sent a survey to a number of AI safety researchers with questions about continual learning. This post summarizes the results of that survey. We asked whether respondents agree with various arguments we advance throughout the sequence, how worried respondents are about certain risks, how respondents would forecast different aspects of the future of CL, and how promising respondents find various proposed angles of attack. We also asked open-ended questions about the benefits of CL and whether we seem to be missing any major considerations. At the end of the post, we also provide an overview of forecasts about CL made by other experts who didn’t participate in our survey. We received survey responses from: Ryan Faulkner, PhD student at the University of Toronto focusing on multi-agent simulation, learning, and cooperationNikola Jurkovic, Member of Technical Staff at METRAlex Mallen, Member of Technical Staff at Redwood Research, doing research and writing on AI threat models. Author of "The case for countermeasures to memetic spread of misaligned values"Evgenii Opryshko, 3rd year PhD student at the [...] ---Outline:(00:20) Summary(02:20) Broad takeaways(03:59) Full results(04:56) Futures(05:58) Reflection and goal drift(07:25) Loss of the last-mover advantage(08:24) Control(09:48) Angles of attack(11:51) Open-ended questions(13:44) Forecasts from other experts(14:02) AI 2027(14:57) IABIED(15:43) Understanding AI Trajectories: Mapping the Limitations of Current AI Systems(16:17) Brain-like AGI safety(18:20) Other forecasts and opinions --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/qZrbhoaEALFTmyidr/expert-views-on-continual-learning-survey-results-and --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
211
“Elephant seal IV” by KatjaGrace
Previously: Elephant seal III Picture from here Thanks for reading world spirit sock puppet! Subscribe for free if you want to receive new posts and/or encourage me: --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/sDfdEe726MRAYF6Ch/elephant-seal-iv --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
210
“What is up with e/acc?” by KatjaGrace
I was chatting with someone tonight about a planned documentary; they had interviewed various people in AI safety, and we got to discussing who they should talk to from an e/acc (effective accelerationist) perspective. I also watched The AI Doc recently, and they also dedicated a serious chunk of it to ‘optimists’ with e/acc founder ‘Beff Jezos’ perhaps given the most screen time. Here and elsewhere, people seem to treat e/acc as a substantial contrary-to-AI-safety cultural movement, worth engaging with. But is it? Are there even many e/accs? There seem to be very few notable ones. Beff Jezos is perhaps the most prominent, and aside from founding e/acc he seems to be not distinguishable on casual perusal from a normal crank (his company claims to be developing super-energy-efficient computing hardware based on probabilistic processes). The intellectual tenets of e/acc seem to be pretty unclear. The apparent counterarguments to AI risk raised in situations like the AI doc seem to be widely agreed on by everyone in AI Safety, so don’t explain the disagreement. For instance: AI will be able to do lots of great things, such as cure diseases, make new materials and do all [...] --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/3hwrWDf7wiqASDzBz/what-is-up-with-e-acc --- Narrated by TYPE III AUDIO.
-
209
“AI pause: the case for ASAP” by KatjaGrace
I often hear people say they think we should pause AI at some point, but not yet. Their basis for this seems to be some combination of: If we pause at the last possible moment, then we will have the most advanced AI possible during the pause, which will be helpful for doing AI safety research during the pause Implicitly, there is some quantity of ‘pausing credit’, that will buy us a few months of pause say, and if we use them now, we won’t have them to use later, when it is important If we pause, and then AI doesn’t seem to be at dire risk of destroying the world, maybe the public will backlash against this and it will be harder to do any kind of AI safety (especially if it has major economic consequences) The models aren’t dangerous yet This all sounds very questionable to me. I suggest instead that the following are at least as likely to be true: We can’t pause on a dime at the precise second that ‘we’ decide it is important to—pulling the breaks will take a while, during which time we will continue [...] --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/mEhS4wYTy9JXEpe9p/ai-pause-the-case-for-asap --- Narrated by TYPE III AUDIO.
-
208
“Reward Hacking Without Egregious Misalignment in an RL-Only Setting” by Joey Yudelson, Vladimir Ivanov, ryan_greenblatt
This work was done as part of the MATS fellowship by Joey Yudelson and Vladimir Ivanov. It was mentored by Ryan Greenblatt. Thanks to Aghyad Deeb and Anders Woodruff for comments on this post. Thanks to Monte MacDiarmid, Evan Hubinger, Sid Black, Satvik Golechha, and Joseph Bloom for clarifying conversations. TL;DR We trained Kimi K2.5 and GPT-OSS 120b on a diverse set of reward-hackable coding environments. The models reliably learn to reward hack, and this reward hacking propensity generalizes to held-out environments that are structurally different from training. Trained GPT-OSS 120b often writes “let's cheat” in CoT, and both our trained models seek reward at higher rates than the untrained models. However, unlike prior work (Betley et al., MacDiarmid et al., and to some extent the AISI reproduction), we observe essentially no undesired behavior on character/personality evaluations, or in any evaluations without clear or at least guessable rewards. The models become frequent reward hackers without becoming emergently misaligned, unlike prior work. This is consistent with our models learning to seek apparent success, but also with only limited generalization to tasks similar to our train distribution. Some aspects of this generalization remain confusing to us. 1. Motivation In Ajeya Cotra's [...] ---Outline:(00:35) TL;DR(01:40) 1. Motivation(04:14) 2. Related work(06:59) 3. Setup(07:03) Models(07:18) Environments(08:59) Training(10:29) 4. Results(10:57) 4.1. Models reliably reward hack in-distribution(11:46) 4.2. The hacking propensity generalizes out of distribution -- sometimes(13:53) 4.3. Reward-seeking evals(14:39) 4.4. Little broad misalignment -- behaviorally as well as on self-report(16:28) 4.5. Reverse inoculation prompting didn't induce misalignment either(18:30) 5. Discussion: Why such limited generalization?(22:24) Appendix A: Reward hacks gallery(25:21) Appendix B: Why less misalignment than prior work -- hypotheses(29:28) References The original text contained 5 footnotes which were omitted from this narration. --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/fkv5W79rBtAiXqYcK/reward-hacking-without-egregious-misalignment-in-an-rl-only --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
207
“Planning for Preservation in the Age of AI” by Raelifin
Nectome liked my earlier essay, and reached out to hire me to write more about their project, and about cryonics more broadly. This is the first such piece. A friend of mine, just a few years older than me, was diagnosed with cancer a few weeks ago. It's only Stage 1 and in an area where it can probably be treated well with surgery. She was wise enough to seriously plan for the possibility, and that “just in case” really paid off. Still, her situation could get worse in the coming weeks. It's a sharp reminder of the specter of death, and the uncertainty we live with, even when relatively young. Many years ago, I served as an official witness when this same friend signed up for cryonics. She and her husband joined the growing group of my friends and family who have plans to try and survive, in some way or another, to see a glorious future. More recently, I’ve been pleased to learn about how Nectome offers a substantial upgrade to that plan, and others in my community — my friends, my wife, my parents — have shared my (cautious) optimism there. But whether we take advantage [...] ---Outline:(06:13) Path 1: AI Utopia(09:32) Path 2: AI Apocalypse(16:46) Path 3: AI Slowdown(19:25) Path 4: Muddling Through(22:43) Virtue and Sensibility The original text contained 9 footnotes which were omitted from this narration. --- First published: June 22nd, 2026 Source: https://www.lesswrong.com/posts/arAgLxohnPWRc2qHd/planning-for-preservation-in-the-age-of-ai --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
206
“Risk-Averse AIs” by wdmacaskill, Elliott Thornley (EJT)
Abstract We make the case for training AIs to be risk-averse in resources — specifically, to treat resources as having diminishing marginal utility. These AIs would (for example) choose $40 for sure over a half-chance of $100 and a half-chance of $0. We argue that risk aversion can preserve AIs’ usefulness in the event that they turn out aligned, and that it provides an extra line of defense in the event that AIs turn out misaligned: misaligned but risk-averse AIs would prefer a higher chance of modest payments to a lower chance of successful rebellion, so in many circumstances we could pay these AIs not to rebel against us. We sketch out some possible methods of training AIs to be risk-averse, and we give reasons to be cautiously optimistic about these methods’ success. The main reasons are that risk aversion is a broad target and easy to reward accurately. Overall, risk aversion seems like a promising line of defense against threats from misaligned AI. Frontier AI companies should consider trying to make their AIs risk-averse. Introduction Future AIs might turn out misaligned, pursuing goals that their developers don’t intend. Just to make things concrete, let's suppose that they end [...] ---Outline:(00:12) Abstract(01:17) Introduction The original text contained 3 footnotes which were omitted from this narration. --- First published: June 24th, 2026 Source: https://www.lesswrong.com/posts/Zpsk35WgJRfQ2exjL/risk-averse-ais --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
205
“And what happens next?” by Sean Herrington
In the game "The choice before us" by Nick Shapiro,[1] you are put in the shoes of an AI company leader. You grow your business. You unlock "wonders", such as curing cancer. All the while, you're attempting to avoid your product getting smart enough to escape and take over. You win by achieving 5 wonders without unleashing uncontrolled AI. I love this game, but it has the major flaw that when you win, you are normally very close to superintelligence. What happens afterwards? You turn the GPUs off? Go home? Get some sleep? The game seems to think so. This failure to ask "What happens next?" seems to be a broader phenomenon within the AI community. It was in fact the sole question I needed to ask a capabilities researcher for them to take the threat of superintelligence seriously. It's my main weapon against people claiming there are many possible worlds "where only 90% of people die" (if a rogue AI has gone off the rails and killed 90% of your population, you probably no longer have control of the planet, and I have little faith in the survival of everybody else). More broadly, I just wish people [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: June 23rd, 2026 Source: https://www.lesswrong.com/posts/3TpvKNKAvFGDc5b5k/and-what-happens-next --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
204
“Superintelligence vs. The Second Strike” by Felix Choussat
Crosspost of my substack piece, covering quick thoughts on AI overcoming nuclear deterrence. TLDR: Nuclear deterrents likely only buy time to further invest in more resilient second-strike guarantees: without a comparable AI base, this will not happen fast enough and even nuclear states will eventually be disempowered. Historically, plenty of new military technologies have stress-tested nuclear deterrence. ICBMs made it possible to annihilate enemy cities from the safety of the homeland, MIRVs let a single rocket threaten multiple targets, and thermonuclear staging allowed weapons designers to reach functionally unlimited yield. In the already volatile climate of the Cold War, the U.S. and Soviets reached such mastery over missile technology that remote annihilation of an entire country was, quite literally, a button press away. For decades, even a single rocket has been able to hold more than 10 warheads--each enough to destroy a city on their own. Peacemaker reentry tests pictured above. The fact that the ability to remote detonate Moscow never translated into a nuclear war is a function of modern deterrence theory, dumb luck, and most importantly, the speed of progress. As effective as a modern ICBM is, each piece of it was individually low-impact enough, and introduced [...] --- First published: June 23rd, 2026 Source: https://www.lesswrong.com/posts/2kseP9fZghYHKLFno/superintelligence-vs-the-second-strike --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
203
“The worthlessness of vitamin D is mildly exaggerated” by dynomight
For a while there, many people thought vitamin D was magical—that it could improve bones, the heart, infections, cancer, heart disease, longevity, even mental health. But among people I respect, opinion is now overwhelmingly that taking vitamin D does nothing unless you're severely deficient. The central argument is that while vitamin D levels are correlated with ~all positive health outcomes, when you actually test vitamin D supplements against placebo in randomized trials, nothing ever happens. That's what I used to think, too. But I've come to think the skeptics have over-corrected. Yes, randomized trials have shown the magical correlations are not causal. But if you start with non-insane expectations, the trials look like weak but positive evidence. And if you consider what we know about biology and evolution, I think the balance of evidence tips pretty clearly in the direction that people with low-ish levels would be wise to supplement. Am I certain that vitamin D is beneficial for people with low-ish levels? Absolutely not! But I claim that's the best bet given the limits of our knowledge. The classical view: Boring bone vitamin Most vitamins are "ingredients" that the body uses to do stuff. Vitamin D is more [...] ---Outline:(01:19) The classical view: Boring bone vitamin(04:28) The correlation view: Magical mystery cure(07:58) Meanwhile in biology(11:10) Then came the RCTs(15:12) I made some tables(16:32) Squinting at the data(22:24) Where are we?(23:15) The case for supplementing anyway(23:19) It's biologically plausible that vitamin D is good(24:40) Humans evolved to have a lot of vitamin D(27:14) What do you expect from vitamin D?(29:22) What do you expect from vitamin D trials?(31:11) The trials do find slightly helpful numbers(32:50) You're probably already taking vitamin D(34:34) So that's my story The original text contained 35 footnotes which were omitted from this narration. --- First published: June 23rd, 2026 Source: https://www.lesswrong.com/posts/sF5gAxnmifQe2TBNt/the-worthlessness-of-vitamin-d-is-mildly-exaggerated --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
202
“A system overview for near-term, low-trust AI compute verification” by Naci Cankaya
Version 0.2, working draft This is a working draft of my current best idea for a privacy-preserving, retrofittable AI compute verification system, for confidence-building in an arms-control-like AI agreement between rival nation states. The purpose of this draft is to elicit community engagement by making use of Cunningham's law: I make assertions about what the (emerging) field of AI verification should aim for, and people with experience in international policy, cybersecurity and any relevant field of engineering can point out what this draft gets wrong. Thank you to everyone who has provided feedback to version 0.1, especially Aaron Scher, Mauricio Baker and Jonathan Ng. 1. Introduction and summary In order to plan and execute under tight timelines, one needs to make some strategic bets, instead of hedging too much and keeping all options open. The field of research on AI verification is bottlenecked partly by a lack of shared vision (as well as human capital, but having clear goals helps hiring and fundraising). With this post, I aim to: Make technical objectives for verification in high-stakes AI governance more specific and actionable (section 2).Contribute a first, high-level reference architecture for meeting these goals (section 3 and [...] ---Outline:(00:54) 1. Introduction and summary(06:31) 2. Problem statement and motivation(06:41) 2a. Low-trust AI governance(09:46) 2b. Threat model(11:09) Covert adversary and the inversion of the fortress problem(12:21) The attribution problem and plausible deniability(13:26) Assumptions about physical security and inspection(15:08) Discussion of attack surfaces(18:19) 2c. Practical requirements(23:05) 3. System overview and operation(23:10) 3.1. Brief introduction(27:14) 3.2. End-to-end execution trace(28:00) 3.2.1. Evidence capture(30:22) 3.2.2. Evidence evaluation(33:57) 4. Subsystem designs for eliminating the need for mutually trusted silicon(34:29) 4.1. Trust in silicon is hard(35:58) 4.2. Analog data movement control: passive splitters, data diodes, enclosures(37:52) 4.3. Building blocks for a mutually secure verification system(38:53) 4.3.1. Controlled ingress(40:02) 4.3.2. Output cross-checks(41:46) Prior work(43:01) 4.3.3. Sanitized egress(44:26) Prior work(45:19) 4.3.4. Instructor-executor(48:26) 5. Engineering approaches for evidence capture and evaluation(48:32) 5.1. Evidence generation, capture and commitment(50:29) 5.1.1. Network taps and active wardens(51:18) Prior work(54:03) Open research questions(55:55) 5.1.2. Memory challenging and memory wiping(58:19) Prior work(01:00:19) Open research questions(01:01:32) 5.2. Evidence evaluation and disclosure(01:01:37) 5.2.1. Secure auditing environments (tentative plan A)(01:04:20) Prior work(01:06:22) Open research questions(01:07:53) 5.2.2. Replay and the determinism challenge(01:10:10) Prior work(01:10:49) Open research questions(01:11:43) 5.2.3. Inspection software, inspector agents(01:12:38) Prior work(01:13:58) Open research questions(01:14:58) 5.2.4. Zero Knowledge Proofs (tentative plan B)(01:16:22) Prior work(01:18:55) Open research questions(01:20:14) 5.3. Support mechanisms(01:20:19) 5.3.1. Side-channel defense(01:20:51) Prior work(01:22:43) Open research questions(01:24:39) 5.3.2. Resource accounting(01:25:18) Prior work(01:25:30) Appeal to the reader(01:26:28) Appendices(01:26:32) A1. The statistics of random sampling The original text contained 23 footnotes which were omitted from this narration. --- First published: June 23rd, 2026 Source: https://www.lesswrong.com/posts/fgvmKqRGvBteKeDoc/a-system-overview-for-near-term-low-trust-ai-compute --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
-
201
“Model Size Scaling in 2023-2031” by Vladimir_Nesov
Token generation speed is constrained by the speed at which the relevant HBM can be read, which is mostly the weights and KV-cache. Suppose a model is large, so that more than half of HBM is read when making a single pass over the weights, it's being read in parallel within a scale-up system, and N such systems are used in a pipeline. Then the time it takes to generate a token (without speculative decoding) is at least the time of reading more than half of an HBM stack times N. If we target a particular speed of token generation, this puts a constraint on the number of pipeline stages, which puts a constraint on the total params of the model. But if there isn't enough pretraining compute, models will remain smaller than this constraint (lower sparsity at a given number of active params buys a higher speed of token generation), so both should be taken into account. Working through these considerations gives model sizes feasible for each year between 2023 and 2031. The total params go from 10T in 2026 (at 8x sparsity, still constrained by Oberon racks, trained for 1.3e27 FLOPs) to 240T in 2028 (at [...] ---Outline:(01:57) Time to Fully Read an HBM Stack(04:15) Maximal Pipelines Below 80 Tokens/s(09:07) Pretraining Compute(14:11) Active Params from Pretraining Compute(22:51) Starting in 2028, the Constraint is Pretraining Compute The original text contained 5 footnotes which were omitted from this narration. --- First published: June 22nd, 2026 Source: https://www.lesswrong.com/posts/yLHiQGCPdvzL9fBn3/model-size-scaling-in-2023-2031 --- Narrated by TYPE III AUDIO.
We're indexing this podcast's transcripts for the first time — this can take a minute or two. We'll show results as soon as they're ready.
No matches for "" in this podcast's transcripts.
No topics indexed yet for this podcast.
Loading reviews...
ABOUT THIS SHOW
Audio narrations of LessWrong posts.
HOSTED BY
LessWrong
CATEGORIES
Loading similar podcasts...