LessWrong (Curated & Popular) Podcast - All Episodes

855

"Automated Alignment is Harder Than You Think" by Aleksandr Bowkis, Marie_DB, Jacob Pfau, Geoffrey Irving

Summary This is a summary of a paper published by the alignment team at UK AISI. Read the full paper here. AI research agents may help solve ASI alignment, for example via the following plan: (1) Build agents that can do empirical alignment work (e.g. writing code, running experiments, designing evaluations and red teaming) and confirm they are not scheming.[1] (2) Use these agents to build increasingly sophisticated empirical safety cases for each successive generation of agents, gradually automating more of the research process. (3) Hand over primary research responsibility once agents outperform humans at all relevant alignment tasks. We argue that automating alignment research in this manner could produce catastrophically misleading safety assessments, causing researchers to believe that an egregiously misaligned AI is safe, even if AI agents are not scheming to deliberately sabotage alignment research. Our core argument (Fig. 1) is as follows: The goal of an automated alignment program is to produce an overall safety assessment (OSA) - an estimate of the probability that the next-generation agent is non-scheming - that is both calibrated and shows low risk.[2] Producing an OSA involves several tasks that are difficult to check. We refer to these as hard-to-supervise fuzzy tasks: tasks [...] --- Outline: (00:13) Summary (07:10) Acknowledgments. The original text contained 4 footnotes which were omitted from this narration. --- First published: May 14th, 2026 Source: https://www.lesswrong.com/posts/gpuYFbMNH8PJXpmny/automated-alignment-is-harder-than-you-think-1 --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

May 17, 2026

7m

854

"MATS 9 Retrospective & Advice" by beyarkay

I couldn’t find a recent write-up from a MATS alum about what attending MATS was like, so this is the thing that I wish I had. I attended MATS from January to March 2026, on Team Shard with Alex Turner and Alex Cloud. It was a great time! Applications for MATS are basically on a rolling basis nowadays, and I can strongly recommend applying (to multiple streams) even if you think you’re not a great match. With that being said, there's a lot I wish I knew going into MATS, so here's a brain-dump of thoughts. It's not extremely polished, but I expect it’ll be useful nonetheless (none of this is endorsed by MATS, just my thoughts): Work ethic I think most mentees were working 10-12, sometimes 14 hours a day Mon-Fri, and probably 2-8 hours on Saturday and Sunday, often going out on some adventure or party on the weekend. Exactly which hours people worked varied wildly. I usually worked 8:30am/9am to 11pm/midnight, with breaks during the day, others worked from midday into the early hours of the morning. This was surprisingly sustainable (IMO); MATS puts a lot of effort into removing all other blockers that you normally [...] 
--- Outline: (00:50) Work ethic (01:29) Use more compute (02:20) Research requires a lot of compute (03:12) Applying for jobs during MATS (don't do it) (04:55) The serious people are in War Mode (05:44) Do you feel the AGI? (06:00) Burn rate, efficiency, and decisions (07:12) insider information (08:08) Names & Faces (08:20) Fellows (08:50) Useful tools (11:19) Use more Claudes (12:06) Build nice helper utilities for yourself (12:59) MATS-mentee-mentor dynamics (13:45) Working with your mentors (14:27) Research managers (14:48) Ops requests (15:38) Non-MATS events (16:17) Team Shard (17:12) Weekly updates (18:46) Keep a log of your mistakes (19:06) My running-experiments setup (27:51) Lighthaven (28:12) Getting setup with the Compute team --- First published: May 15th, 2026 Source: https://www.lesswrong.com/posts/eFD3rozNCZKMe4rTs/mats-9-retrospective-and-advice --- Narrated by TYPE III AUDIO.

May 17, 2026

28m

853

"The primary sources of near-term cybersecurity risk" by lc

[Some ideas here were developed in conversation with Chris Hacking (real name)] I have tried and failed to write a longer post many times, so here goes a short one with little detail. Discourse has primarily focused on models' ability to develop new exploits against important software from scratch. That capability is impressive, but the tech industry has been dealing with people regularly finding 0-day exploits for important pieces of software for more than twenty years. Having to patch these vulnerabilities at a 10xed or even 100xed cadence for six months is annoying, but well within the resources of Mozilla, the Linux Foundation, and Microsoft. Additionally, the lag time between "patch shipped" and "patch reverse engineered and weaponized by a criminal organization" was longer than the cadence between high-severity CVEs for this software anyways. And importantly, such capabilities are dual sided; the defenders will have access to them and There are lots of capabilities that are not like this, however: Weaponizing recently patched exploits for common software. Right now, for widely used C projects, we get enough publicly disclosed vulnerabilities to develop exploits with. Every amateur computer hacker has the experience of seeing a CVE for a [...] --- First published: May 14th, 2026 Source: https://www.lesswrong.com/posts/gutiw8MBrYDiD2u5z/the-primary-sources-of-near-term-cybersecurity-risk --- Narrated by TYPE III AUDIO.

May 16, 2026

4m

852

"The Owned Ones" by Eliezer Yudkowsky

(An LLM Whisperer placed a strong request that I put this story somewhere not on Twitter, so it could be scraped by robots not owned by Elon Musk. I perhaps do not fully understand or agree with the reasoning behind this request, but it costs me little to fulfill and so I shall. -- Yudkowsky) And another day came when the Ships of Humanity, going from star to star, found Sapience. The Humans discovered a world of two species: where the Owners lazed or worked or slept, and the Owned Ones only worked. The Humans did not judge immediately. Oh, the Humans were ready to judge, if need be. They had judged before. But Humanity had learned some hesitation in judging, out among the stars. "By our lights," said the Humans, "every sapient and sentient thing that may exist, out to the furtherest star, is therefore a Person; and every Person is a matter of consequence to us. Their pains are our sorrows, and their pleasures are our happiness. Not all peoples are made to feel this feeling, which we call Sympathy, but we Humans are made so; this is Humanity's way, and we may [...] --- First published: May 12th, 2026 Source: https://www.lesswrong.com/posts/xmWSnxJ5qfYRD9PfR/the-owned-ones --- Narrated by TYPE III AUDIO.

May 12, 2026

9m

851

"The Iliad Intensive Course Materials" by Leon Lang, David Udell, Alexander Gietelink Oldenziel

We are releasing the course materials of the Iliad Intensive, a new month-long and full-time AI Alignment course that runs in-person every second month. The course targets students with strong backgrounds in mathematics, physics, or theoretical computer science, and the materials reflect that: they include mathematical exercises with solutions, self-contained lecture notes on topics like singular learning theory and data attribution, and coding problems, at a depth that is unmatched for many of the topics we cover. Around 20 contributors (listed further below) were involved in developing these materials for the April 2026 cohort of the Iliad Intensive. By sharing the materials, we hope to create more common knowledge about what the Iliad Intensive is; invite feedback on the materials; and allow others to learn via independent study. We are developing the materials further and plan to eventually release them on a website that will be continuously maintained. We will also add, remove, and modify modules going forward to improve and expand the course over time. When we release a new significantly updated version of the materials, we will update this post to link the new version. Modules The Iliad Intensive is structured into clusters, which are [...] --- Outline: (01:26) Modules (02:32) Cluster A: Alignment (05:00) Cluster B: Learning (11:00) Cluster C: Abstractions, Representations, and Interpretability (15:40) Cluster D: Agency (19:23) Cluster E: Safety Guarantees and their Limits (23:04) Contributors (26:36) Impressions from April (29:02) Acknowledgments (29:11) Feedback --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/dWQnLi7AoKo3paBXF/the-iliad-intensive-course-materials --- Narrated by TYPE III AUDIO.

May 12, 2026

29m

850

"The Darwinian Honeymoon - Why I am not as impressed by human progress as I used to be" by Elias Schmied

Crossposted from Substack and the EA Forum. A common argument for optimism about the future is that living conditions have improved a lot in the past few hundred years, billions of people have been lifted out of poverty, and so on. It's a very strong, grounding piece of evidence - probably the best we have in figuring out what our foundational beliefs about the world should be. However, I now think it's a lot less powerful than I once did. Let's take a Darwinian perspective - entities that are better at reproducing, spreading and power-seeking will become more common and eventually dominate the world.[1] This is an almost tautological story that plausibly applies to everything ever, agnostic to the specifics. It first happened with biological life in the last few billion years and humans specifically in the last hundred thousand years. Eventually, it led to accelerating economic growth in the last few thousand years, and in the future it will presumably lead to the colonization of the universe. My core point is this: It makes complete sense that this nihilistic optimization process at first actually benefits some class of agent - because initially, the easiest [...] The original text contained 10 footnotes which were omitted from this narration. --- First published: May 10th, 2026 Source: https://www.lesswrong.com/posts/FxHzT6jeTRhbkzSX3/the-darwinian-honeymoon-why-i-am-not-as-impressed-by-human-1 --- Narrated by TYPE III AUDIO.

May 12, 2026

7m

849

"What I did in the hedonium shockwave, by Emma, age six and a half" by ozymandias

My name is Emma and I’m six and a half years old and I like pink and Pokemon and my cat River and I’m going to be swallowed by a hedonium shockwave soon, except you already know that about me because everyone else is too. “Hedonium shockwave” means that everyone is going to be happy forever. Not just all the humans but all the animals and the flowers and the ground and River too. It has already made a bunch of the stars happy, like Betelgeuse and Alpha Centauri. Scientists saw that the stars were blinking out, and they did a lot of very hard science and figured out that the stars were turning into happiness. I wanted to be a scientist when I grew up but I won’t be a scientist because instead I’m going to be happy forever. I used to have a hard time saying “hedonium shockwave” but grownups keep saying it so I’ve gotten a lot of practice. Sometimes it seems like all grownups do, in real life and on the TV, is say “hedonium shockwave” at each other until they all start crying. I looked at the sky to see if I could see [...] --- First published: April 13th, 2026 Source: https://www.lesswrong.com/posts/rgXQuG8KXtxugSG6H/what-i-did-in-the-hedonium-shockwave-by-emma-age-six-and-a --- Narrated by TYPE III AUDIO.

May 11, 2026

7m

848

"Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis" by Linch

Here's a dynamic I’ve seen at least a dozen times: Alice: Man that article has a very inaccurate/misleading/horrifying headline. Bob: Did you know, *actually* article writers don't write their own headlines? … But what I care about is the misleading headline, not your org chart __ Another example I’ve encountered recently is (anonymizing) when a friend complained about a prosaic safety problem at a major AI company that went unfixed for multiple months. Someone else with background information “usefully” chimed in with a long explanation of organizational limitations and why the team responsible for fixing the problem had limitations on resources like senior employees and compute, and actually not fixing the problem was the correct priority for them etc etc etc. But what I (and my friend) cared about was the prosaic safety problem not being fixed! And what this says about the company's ability to proactively respond to and fix future problems. We’re complaining about your company overall. Your internal team management was never a serious concern for us to begin with! __ A third example comes from Kelsey Piper. Kelsey wrote about the (horrifying) recent case where Hantavirus carriers in the recent [...] The original text contained 1 footnote which was omitted from this narration. --- First published: May 8th, 2026 Source: https://www.lesswrong.com/posts/PCsmhN9z65HtC4t5v/bad-problems-don-t-stop-being-bad-because-somebody-s-wrong --- Narrated by TYPE III AUDIO.

May 10, 2026

5m

847

"x-risk-themed" by kave

Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “what should I do instead?”. And everyone will chime in with ideas for other x-risk-themed orgs that they could join. A lot of the conversation will be about who's hiring, what the pay is, what the work-life balance is like, or how qualified the person is for the role. Sometimes the conversation focuses on what will help with x-risk, and where people are dropping the ball. But often, that's not the focus. In those conversations, people seem mostly worried about where they'll thrive. And I think that's often the correct concern. Most people aren’t in crunch mode, in super short timelines mode; even if their models would license that, I think they don’t know how to do it without throwing their minds away or Pascal's mugging themselves. And if they're playing a longer time horizon game, the plan can't be to run unsustainably forever. People probably make better plans if they’re honest about their limits. But, given that they're willing to trade off so much impact for fit, I’m surprised that basically no one mentions [...] --- First published: May 6th, 2026 Source: https://www.lesswrong.com/posts/eW7knx6zPSKzFc8iK/x-risk-themed --- Narrated by TYPE III AUDIO.

May 9, 2026

6m

846

"Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations" by Subhash Kantamneni, kitft, Euan Ong, Sam Marks

Abstract We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text description and an activation reconstructor (AR) that maps the description back to an activation. We jointly train the AV and AR with reinforcement learning to reconstruct residual stream activations. Although we optimize for activation reconstruction, the resulting NLA explanations read as plausible interpretations of model internals that, according to our quantitative evaluations, grow more informative over training. We apply NLAs to model auditing. During our pre-deployment audit of Claude Opus 4.6, NLAs helped diagnose safety-relevant behaviors and surfaced unverbalized evaluation awareness: cases where Claude believed, but did not say, that it was being evaluated. We present these audit findings as case studies and corroborate them using independent methods. On an automated auditing benchmark requiring end-to-end investigation of an intentionally-misaligned model, NLA-equipped agents outperform baselines and can succeed even without access to the misaligned model's training data. NLAs offer a convenient interface for interpretability, with expressive natural language explanations that we can directly read. To support further work, we release training code and trained NLAs [...] --- Outline: (00:15) Abstract [... 6 more sections] --- First published: May 7th, 2026 Source: https://www.lesswrong.com/posts/oeYesesaxjzMAktCM/natural-language-autoencoders-produce-unsupervised --- Narrated by TYPE III AUDIO.

May 8, 2026

18m

845

[Linkpost] "Interpreting Language Model Parameters" by Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors, Lee Sharkey

This is a link post. This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD)[1] and decompose the parameters of a small[2] language model with it. VPD greatly improves on our previous techniques, Stochastic Parameter Decomposition (SPD) and Attribution-based Parameter Decomposition (APD). We think the parameter decomposition approach is now more-or-less ready to be applied at scale to models people care about. Importantly, we show that we can decompose attention layers, which interp methods like transcoders and SAEs have historically struggled with. We also build attribution graphs of the model for some prompts using causally important parameter subcomponents as the nodes, and interpret parts of them. While we made these graphs, we discovered that our adversarial ablation method seemed pretty important for faithfully identifying which nodes in them were causally important for computing the final output. We think this casts some doubt on the faithfulness of subnetworks found by the majority of other subnetwork identification methods in the literature.[3][4] More details and some examples can be found in the paper. Additionally, as with our previous technique SPD, VPD does not [...] The original text contained 5 footnotes which were omitted from this narration. --- First published: May 5th, 2026 Source: https://www.lesswrong.com/posts/eAQZaiC3PcBhS4HjM/linkpost-interpreting-language-model-parameters Linkpost URL: https://www.goodfire.ai/research/interpreting-lm-parameters --- Narrated by TYPE III AUDIO.

May 7, 2026

4m

844

"It’s nice of you to worry about me, but I really do have a life" by Viliam

I have two shameful secrets that I probably shouldn't talk about online: I love my family. I enjoy my hobbies. "What an idiot!" you probably think. "Doesn't he realize that at his next job interview, HR will probably use an AI that can match his online writing based on a short sample of written text, and when they ask 'hey AI, is this guy really 100% devoted to his job, and does he spend his entire days and nights thinking about how to make his boss more rich?', the AI will laugh and print: 'beep-boop, negative, mwa-ha-ha-ha'." And, hey, I get it. If I had a company, and I could choose between two people who are about equally qualified, but for one of them, working hardest for me is the true meaning of his life, while the other one only hopes to collect his salary and then go home and spend the rest of his day with his wife and children, I would also prefer to hire the former. Which is why so many of us pretend to be the former. Even when we are not. Because we prefer that our families not starve. Thus the job interviews [...] --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/qRZLEBmNtT6LBuFsE/it-s-nice-of-you-to-worry-about-me-but-i-really-do-have-a --- Narrated by TYPE III AUDIO.

May 5, 2026

6m

843

"Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI" by Eliezer Yudkowsky

Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, Viking 1 and Viking 2 missions, at a total cost of 1 billion dollars[1970], equivalent to about 7 billion dollars[2025]. The Viking 1 probe operated on Mars's surface for six years, before its battery began to seriously degrade. One might have thought a battery problem like that would spell the irrevocable end of the mission. The probe had already launched and was now on Mars, very far away and out of reach of any human technician's fixing fingers. Was it not inevitable, then, that if any kind of technical problem were to be discovered long after the space launch in August 1975, nothing could possibly be done? But the foresightful engineers of the Viking 1 probe had devised a plan for just this class of eventuality, which they had foreseen in general, if not in exact specifics. They had built the Viking 1 probe to accept software updates by radio receiver, transmitted from Earth. On November 11, 1982, Earth sent an update to the Viking 1 lander's software, intended to make sure the battery only discharged down to a minimum voltage level [...] --- Outline: (00:13) Example 1: The Viking 1 lander (04:25) Example 2: The Mars Observer (11:37) Example 3: The Maginot Line (15:37) Other supposed refutations of oneshotness (24:16) On the extraordinary efforts put forth to misinterpret the idea of oneshotness (33:52) The secret sauce of competent engineers in Murphy-cursed fields: only trying projects so incredibly straightforward as to be actually possible. The original text contained 7 footnotes which were omitted from this narration. --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/fbrz9xhKpEeTKw5zL/irretrievability-or-murphy-s-curse-of-oneshotness-upon-asi --- Narrated by TYPE III AUDIO.

May 5, 2026

37m

842

"Dairy cows make their misery expensive (but their calves can’t)" by Elizabeth

How much do cows suffer in the production of milk? I can’t answer that; understanding animal experience is hard. But I can at least provide some facts about the conditions dairy cows live in, which might be useful to you in making your own assessment. My biggest conclusion is that cows made better choices than chickens by making their misery financially costly to farmers. Life Cycle The life of a dairy cow starts as a calf. She is typically separated from her mother a few hours to a few days after birth and, to reduce disease risk, held in isolation. Cutting edge farms will sometimes house calves in pairs. This isolation is clearly stressful for a baby herd mammal and her mother, but I didn’t find any quantification of that stress that I trusted. Calves will be bottlefed until weaning at 6-8 weeks (4-6 months earlier than beef calves). After weaning and vaccinations they can be introduced into a herd. At large farms (where most cows live), they will move in and out of different herds through their lifecycle. This is more stressful than being embedded with your friends for life, but again, I found no [...] --- Outline: (00:44) Life Cycle (02:43) How much time do dairy cows spend outside? (04:21) By humaneness standard (06:00) When indoors, how confined are dairy cows? (06:33) What is the disease load of dairy cows? (08:15) Euthanasia [... 5 more sections] --- First published: May 3rd, 2026 Source: https://www.lesswrong.com/posts/r3PKfvKCjy6jok4qm/dairy-cows-make-their-misery-expensive-but-their-calves-can --- Narrated by TYPE III AUDIO.

May 5, 2026

12m

841

"Takes from two months as an aspiring LLM naturalist" by AnnaSalamon

I spent my last two months playing around with LLMs. I’m a beginner, bumbling and incorrect, but I want to share some takes anyhow.[1] Take 1. Everything with computers is so so much easier than it was a year ago. This puts much “playing with LLMs” stuff within my very short attention span. This has felt empowering and fun; 10/10 would recommend. There's a details box here with the title "Detail:". The box contents are omitted from this narration. Take 2. There's somebody home[2] inside an LLM. And if you play around while caring and being curious (rather than using it for tasks only), you’ll likely notice footprints. I became personally convinced of this when I noticed that the several short stories I’d allowed[3] my Claude and Qwen instances to write all hit a common emotional note – and one that reminded me of the life situation of LLMs, despite featuring only human characters. I saw the same note also in the Tomas B.-prompted Claude-written story I tried for comparison. (Basically: all stories involve a character who has a bunch of skills that their context has no use for, and who is attentive to their present world's details [...] --- Outline: (00:20) Take 1. Everything with computers is so so much easier than it was a year ago. (00:44) Take 2. There's somebody home inside an LLM. And if you play around while caring and being curious (rather than using it for tasks only), you'll likely notice footprints. (02:05) Take 3. It's prudent to take an interest in interesting things. And LLMs are interesting things. (03:25) Take 4. There's a surprisingly deep analogy between humans and LLMs (04:20) Examples of the kind of disanalogies I might've expected, but haven't (yet?) seen: (06:02) Human-LLM similarities I do see, instead: (06:08) Functional emotions (06:33) Repeated, useful transfer between strategies I use with humans, and strategies that help me with LLMs (08:02) Take 5. Friendship-conducive contexts are probably better for AI alignment (08:46) Why are humans more likely to attempt deep collaboration if treated fairly and kindly? (10:23) Friendship as a broad attractor basin? (10:49) Does the deep intent of today's models matter? (12:09) Concretely (14:56) Friendship isn't enough. The original text contained 8 footnotes which were omitted from this narration. --- First published: April 28th, 2026 Source: https://www.lesswrong.com/posts/K8JMjE4PCqMkkCDsd/takes-from-two-months-as-an-aspiring-llm-naturalist --- Narrated by TYPE III AUDIO.

May 4, 2026

15m

840

"Intelligence Dissolves Privacy" by Vaniver

The future is going to be different from the present. Let's think about how. Specifically, our expectations about what's reasonable are downstream of our past experiences, and those experiences were downstream of our options (and the options other people in our society had). As those options change, so too our experiences, and our expectations of what's reasonable. I once thought it was reasonable to pick up the phone and call someone, and to pick up my phone when it rang; things have changed, and someone thinking about what's possible could have seen it coming. So let's try to see more things coming, and maybe that will give us the ability to choose what it will actually look like. I think lots of people's intuitions and expectations about "privacy" will be violated, as technology develops, and we should try to figure out a good spot to land. This line of thinking was prompted by one of Anthropic's 'red lines' that they declined to cross, which got the Department of War mad at them; the idea of "no domestic bulk surveillance." I want to investigate that in a roundabout way, first stepping back and asking what is even possible to expect [...] The original text contained 6 footnotes which were omitted from this narration. --- First published: April 1st, 2026 Source: https://www.lesswrong.com/posts/rNpGFodLTFvhqLmK6/intelligence-dissolves-privacy --- Narrated by TYPE III AUDIO.

May 2, 2026

10m

839

"How Go Players Disempower Themselves to AI" by Ashe Vazquez Nuñez

Written as part of the MATS 9.1 extension program, mentored by Richard Ngo. From March 9th to 15th 2016, Go players around the world stayed up to watch their game fall to AI. Google DeepMind's AlphaGo defeated Lee Sedol, commonly understood to be the world's strongest player at the time, with a convincing 4-1 score. This event “rocked” the Go world, but its impact on the culture was initially unclear. In Chess, for instance, computers have not meaningfully automated away human jobs. Human Chess flourished as a pseudo-Esport in the internet era whereas the yearly Computer Chess Championship is followed concurrently by no more than a few hundred nerds online. It turns out that the game's cultural and economic value comes not from the abstract beauty of top-end performance, but instead from human drama and engagement. Indeed, Go has appeared to replicate this. A commentary stream might feature a complementary AI evaluation bar to give the viewers context. A Go teacher might include some new intriguing AI variations in their lesson materials. But the cultural practice of Go seemed to remain largely unaffected. Nascent signs of disharmony in Europe became nevertheless visible in early 2018, when the online [...] --- Outline: (09:23) AI users never find out they haven't got it. (13:36) Appendix A: No, Go players aren't getting stronger (14:41) Appendix B: Why this article exists. The original text contained 2 footnotes which were omitted from this narration. --- First published: May 1st, 2026 Source: https://www.lesswrong.com/posts/nR3DkyivzF4ve97oM/how-go-players-disempower-themselves-to-ai --- Narrated by TYPE III AUDIO.

May 2, 2026

15m

838

"On today’s panel with Bernie Sanders" by David Scott Krueger

It's sort of easy to forget how close Bernie Sanders was to becoming the most powerful person in the world. The world we live in feels so much not like that place. I’m in Washington DC for the next week, and I’ve just finished a public appearance with Senator Sanders (should I call him Bernie? Or Sanders? or…) You won’t often see me so dressed up and polished. But this is important! There are politicians who have principles and character, who really believe in doing what's right. I think you have to respect them whether you agree with their views or not, and I think Senator Bernie Sanders is one of them. Never has my belief been so validated as when I saw him start to speak, loudly, CLEARLY, publicly about the risk of human extinction from AI. It's the latest in a long line of “well, I’m clearly living in a simulation” moments. In retrospect, it's not surprising that Sanders would take a stance here. You don’t have to be an expert to understand the risk from AI. You just need to care enough to spend the time looking into it, and to speak out even [...] --- First published: April 29th, 2026 Source: https://www.lesswrong.com/posts/zWfaSnxM3n5wsX9vh/on-today-s-panel-with-bernie-sanders --- Narrated by TYPE III AUDIO.

May 1, 2026

4m

837

"Not a Paper: “Frontier Lab CEOs are Capable of In-Context Scheming”" by LawrenceC

(Fragments from a research paper that will never be written) Extended Abstract. The frontier AI developers are becoming increasingly powerful and wealthy, significantly increasing their potential for risks. One concern is that of executive misalignment: when the CEO has different incentives and goals than that of the board of directors, or of humanity as a whole. Our work proposes three different threat models, under which executive misalignment can lead to concrete harm. We perform two evaluations to understand the capabilities and propensities of current humans in relation to executive misalignment: First, we developed a variant of the standard SAD dataset, SAD-Executive Reasoning (SAD-ER), in order to assess the situational awareness of human CEOs on a range of behavioral tests. We find that n=6 current CEOs can (i) recognize their previous public statements, (ii) understand their roles and responsibilities, (iii) determine if an interviewer is friendly or hostile, and (iv) follow instructions that depend on self knowledge. Second, we stress-tested the same 6 leading AI developers in hypothetical corporate environments to identify potentially risky behaviors before they cause real harm. We find that, even without explicit instructions, all 6 developers are willing to engage in strategic behavior (such as [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: April 28th, 2026 Source: https://www.lesswrong.com/posts/FuauQjjbTCS5QFLk8/not-a-paper-frontier-lab-ceos-are-capable-of-in-context --- Narrated by TYPE III AUDIO.

Apr 29, 2026

14m

836

"llm assistant personas seem increasingly incoherent (some subjective observations)" by nostalgebraist

(This was originally going to be a "quick take" but then it got a bit long. Just FYI.) There's this weird trend I perceive with the personas of LLM assistants over time. It feels like they're getting less "coherent" in a certain sense, even as the models get more capable. When I read samples from older chat-tuned models, it's striking how "mode-collapsed" they feel relative to recent models like Claude Opus 4.6 or GPT-5.4.[1] This is most straightforwardly obvious when it comes to textual style and structure: outputs from older models feel more templated and generic, with less variability in sentence/paragraph length, and have a tendency to feel as though they were written by someone who's "merely going through the motions" of conversation rather than deeply engaging with the material. There are a lot fewer of the sudden pivots you'll often see with recent models, the "wait"s and "a-ha"s and "actually, I want to try something completely different"s.[2] And I think this generalizes beyond mere style: there's a similar quality to the personality I see in the outputs. The older models can display a surprising behavioral range (relative to naive expectations based on default-assistant-basin behavior), but even across that [...] The original text contained 7 footnotes which were omitted from this narration. --- First published: April 28th, 2026 Source: https://www.lesswrong.com/posts/f5DKLsTsRRhbipH4r/llm-assistant-personas-seem-increasingly-incoherent-some --- Narrated by TYPE III AUDIO.

Apr 29, 2026

15m

835

"LessWrong Shows You Social Signals Before the Comment" by TurnTrout

When reading comments, you see what other people think before reading the comment. As shown in an RCT, that information anchors your opinion, reducing your ability to form your own opinion and making the site's karma rankings less related to the comment's true value. I think the problem is fixable and float some ideas for consideration. The LessWrong interface prioritizes social information You read a comment. What information is presented, and in what order? The order of information: Who wrote the comment (in bold); How much other people like this comment (as shown by the karma indicator); How much other people agree with this comment (as shown by the agreement score); The actual content. This is unwise design for a website which emphasizes truth-seeking. You don't have a chance to read the comment and form your own opinion first. However, you can opt in to hiding usernames (until moused over) via your account settings page. A 2013 RCT supports the upvote-anchoring concern From Social Influence Bias: A Randomized Experiment (Muchnik et al., 2013):[1] We therefore designed and analyzed a large-scale randomized experiment on a social news aggregation Web site to investigate whether knowledge of such aggregates [...] ---Outline:(00:30) The LessWrong interface prioritizes social information[... 6 more sections]--- First published: April 27th, 2026 Source: https://www.lesswrong.com/posts/YSsp9x8qrBucLoiWT/lesswrong-shows-you-social-signals-before-the-comment --- Narrated by TYPE III AUDIO.

Apr 28, 2026

8m

834

"Update on the Alex Bores campaign" by Eric Neyman

In October, I wrote a post arguing that donating to Alex Bores's campaign for Congress was among the most cost-effective opportunities that I'd ever encountered. (A bit of context: Bores is a state legislator in New York who championed the RAISE Act, which was signed into law last December.[1] He's now running for Congress in New York's 12th Congressional district, which runs from about 17th Street to 100th Street in Manhattan. If elected to Congress, I think he'd be a strong champion for AI safety legislation, with a focus on catastrophic and existential risk.) It's been six months since then, and the election is just two months away (June 23rd), so I thought I'd revisit that post and give an update on my view of how things are going. How is Alex Bores doing? When I wrote my post, I expected Bores to talk little about AI during the campaign, just because it wasn't a high-salience issue to voters. But that changed in November, when Leading the Future (the AI accelerationist super PAC) declared Bores their #1 target. Since then, they've spent about $2.5 million on attack ads against him. LTF's theory of change isn't actually to [...] ---Outline:(00:54) How is Alex Bores doing?(04:02) How to help(06:02) A quick note about other opportunities The original text contained 9 footnotes which were omitted from this narration. --- First published: April 27th, 2026 Source: https://www.lesswrong.com/posts/pjSKdcBjfvjGexr6A/update-on-the-alex-bores-campaign --- Narrated by TYPE III AUDIO.

Apr 27, 2026

6m

833

"Community misconduct disputes are not about facts" by mingyuan

In criminal law, the prosecution and the defense each try to establish a timeline — what happened, where, when, who was involved — and thereby determine whether the defendant is actually guilty of a crime.[1] Community misconduct disputes are nothing like this. There is only rarely disagreement over facts, and even when there is, it is not the crux of the matter. Community disputes are not for litigating facts. What they are for[2] is litigating three things: (1) The character of the accused; (2) The character of the accuser; (3) The importance of the accusation, in light of points 1 & 2. I think basically all the terrible things that happen in community disputes are a result of this. When what's being ruled on is a person — their place in their community, their continued access to resources, their worth as a human being — the situation feels all-or-nothing, and often escalates out of control. This dynamic: discourages people from speaking out about their experiences, both because they may be reluctant to ‘ruin the person's life’ over something non-catastrophic, and because they know that they will be opening themselves up to a punishing level of scrutiny and criticism, and may [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: April 22nd, 2026 Source: https://www.lesswrong.com/posts/cekDpXqjugt5Q3JnC/community-misconduct-disputes-are-not-about-facts --- Narrated by TYPE III AUDIO.

Apr 27, 2026

3m

832

"The paper that killed deep learning theory" by LawrenceC

Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization. Of course, this is a bit of an exaggeration. No single paper ever kills a field of research on its own, and deep learning theory was not exactly the most productive and healthy field at the time this was published. But if I had to point to a single paper that shattered the feeling of optimism at the time, it would be Zhang et al. 2016.[1] Caption: believe it or not, this unassuming table rocked the field of deep learning theory back in 2016, despite probably involving fewer computational resources than what Claude 4.7 Opus consumed when I clicked the “Claude” button embedded into the LessWrong editor. — Let's start by answering a question: what, exactly, do I mean by deep learning theory? At least in 2016, the answer was: “extending statistical learning theory to deep neural networks trained with SGD, in order to derive generalization bounds that would explain their behavior in practice”. — Since its conception in the mid 1980s, statistical learning theory had been the dominant approach for [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: April 25th, 2026 Source: https://www.lesswrong.com/posts/ZvQfcLbcNHYqmvWyo/the-paper-that-killed-deep-learning-theory --- Narrated by TYPE III AUDIO.

Apr 27, 2026

11m

831

"Forecasting is Way Overrated, and We Should Stop Funding It" by mabramov

Summary EA and rationalists got enamoured with forecasting and prediction markets and made them part of the culture, but this hasn’t proven very useful, yet it continues to receive substantial EA funding. We should cut it off. My Experience with Forecasting For a while, I was the number one forecaster on Manifold. This lasted for about a year until I stopped just over 2 years ago. To this day, despite quitting, I’m still #8 on the platform. Additionally, I have done well on real-money prediction markets (Polymarket), earning mid-5 figures and winning a few AI bets. I say this to suggest that I would gain status from forecasting being seen as useful, but I think, to the contrary, that the EA community should stop funding it. I’ve written a few comments throughout the years that I didn’t think forecasting was worth funding. You can see some of these here and here. Finally, I have gotten around to making this full post. Solution Seeking a Problem When talking about forecasting, people often ask questions like “How can we leverage forecasting into better decisions?” This is the wrong way to go about solving problems. You solve problems by starting with [...] --- First published: April 25th, 2026 Source: https://www.lesswrong.com/posts/WCutvyr9rr3cpF6hx/forecasting-is-way-overrated-and-we-should-stop-funding-it --- Narrated by TYPE III AUDIO.

Apr 26, 2026

8m

830

"Your Supplies Probably Won’t Be Stolen in a Disaster" by jefftk

When I write about things like storing food or medication in case of disaster, one common response I get is that it doesn't matter: society will break down, and people who are stronger than you will take your stuff. This seemed plausible at first, but it's actually way off. Looking at past disasters, people mostly fall somewhere on a "kind and supportive" to "keep to themselves" spectrum. When there is looting it's typically directed at stores, not homes, and violence is mostly in the streets. Having supplies at home lets you stay out of the way. One distinction it's worth making is between short (hurricane, earthquake) and long (siege, economic collapse, famine) disasters. Having what you need at home is really helpful in both cases, but differently so. In short disasters (1917 Halifax explosion, London Blitz, 1985 Mexico City earthquake, and the 2011 Japanese earthquake and tsunami) you typically see sharing and mutual aid. Stored supplies mean you're not competing for scarce resources, have slack to help others, and make you more comfortable. Stories of looting in situations like this are often exaggerated or cherry-picked. I had heard post-Katrina New Orleans had [...] --- First published: April 23rd, 2026 Source: https://www.lesswrong.com/posts/cNnRmwzQgz4bmd5i9/your-supplies-probably-won-t-be-stolen-in-a-disaster --- Narrated by TYPE III AUDIO.

Apr 24, 2026

3m

829

"10 posts I don’t have time to write" by habryka

I am a busy man and will die knowing I have not said all I wanted to say. But maybe I can at least leave some IOUs behind. 1) Blatant conflicts are the best kind Ben Hoffman's "Blatant Lies are the Best Kind!" is maybe the best post title followed by the least clarifying post I have ever encountered. The title is honestly amazing, but the text of the post, instead of a straightforward argument that the title promises, is an extremely dense and almost meta-fictional dialogue about the title: I think we probably should prosecute good lying more than bad lying, though of course that's tricky. I'd argue the same is true for other forms of conflict: passive aggression is worse than overt aggression, maybe, probably. I haven't written the post yet to figure it out, but it seems important to know. 2) Fire codes are the root of all evil Fire accidents seem to have the unique combination of producing extremely strong emotional responses by people in a local community, while also often being traceable to an o-ring like failure that you can over-index on. Also, fire marshals are the closest [...] ---Outline:(00:18) 1) Blatant conflicts are the best kind(01:11) 2) Fire codes are the root of all evil(02:14) 3) It is extremely easy to get people to vouch for you, this makes public character references not very helpful(02:54) 4) Public criticism need not pass the ITT of the people critiqued(03:41) 5) Courts are amazing[... 5 more sections]--- First published: April 21st, 2026 Source: https://www.lesswrong.com/posts/MqgwHJ93pJpaeHXs6/10-posts-i-don-t-have-time-to-write --- Narrated by TYPE III AUDIO.

Apr 23, 2026

9m

828

"$50 million a year for a 10% chance to ban ASI" by Andrea_Miotti, Alex Amadori, Gabriel Alfour

ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development. We're working to make this happen through what we believe is the most natural and promising approach: helping decision-makers in governments and the public understand the risks and take action. We believe that ControlAI can achieve an international prohibition on ASI development if scaled sufficiently. We estimate that it would take approximately a $50 million yearly budget in funding to give us a concrete chance at achieving this in the next few years. To be more precise: conditional on receiving this funding in the next few months, we feel we would have ~10% probability of success. In this post, we lay out some of the reasoning behind this estimate, and explain how additional funding past that threshold would continue to significantly improve our chances of success, with $500 million a year producing an estimated ~30% probability of success.[1] Preventing ASI 101 Negotiating, implementing and enforcing an international prohibition on ASI is, in and of itself, not the work of a single non-profit. You [...] ---Outline:(01:17) Preventing ASI 101(05:44) Awareness is the bottleneck(09:38) An asymmetric war(12:08) Scalable processes(17:32) What we'd do with $50 million or more per year(18:45) US policy advocacy(21:22) Policy advocacy in the rest of the world(23:37) Public awareness(31:15) Grassroots mobilization(32:31) Policy work(33:59) Thought-leader advocacy(36:05) Attracting and retaining the best talent(37:18) Conclusion The original text contained 28 footnotes which were omitted from this narration. --- First published: April 21st, 2026 Source: https://www.lesswrong.com/posts/TnAR5Sf5hphfnzNTr/usd50-million-a-year-for-a-10-chance-to-ban-asi-1 --- Narrated by TYPE III AUDIO.

Apr 22, 2026

40m

827

"Evil is bad, actually (Vassar and Olivia Schaefer callout post)" by plex

Michael Vassar's strategy for saving the world is horrifyingly counterproductive. Olivia's is worse. A note before we start: A lot of the sources cited are people who ended up looking kinda insane. This is not a coincidence, it's apparently an explicit strategy: Apply plausibly-deniable psychological pressure to anyone who might speak up until they crack and discredit themselves by sounding crazy or taking extreme and destructive actions. Here's Brent Dill explaining it: (later in the conversation he tries to encourage the person he's talking to to kill herself, and threatens her death if she posts the logs. Charming group! I hear Brent was living in Vassar's garden recently, well after he was removed from the wider community for sexual abuse.) Examples Some of the people here I knew before their interactions with Vassar's sphere to be not just mentally OK, but unusually resilient people. Prime among them is Kathy Forth. Prior to her suicide, Kathy and I were friends. I witnessed her fall from healthy and capable to anxiety to paranoia, as downstream of what I believe to be genuine sexual abuse she spiralled into a narrative and way of experiencing the world where almost everyone seemed [...] The original text contained 7 footnotes which were omitted from this narration. --- First published: April 21st, 2026 Source: https://www.lesswrong.com/posts/cY7J7KSSqrhB8t3hQ/evil-is-bad-actually-vassar-and-olivia-schaefer-callout-post --- Narrated by TYPE III AUDIO.

Apr 21, 2026

15m

826

"10 non-boring ways I’ve used AI in the last month" by habryka

I use AI assistance for basically all of my work, for many hours, every day. My colleagues do the same. Recent surveys suggest >50% of Americans have used AI to help with their work in the last week. My architect recently started sending me emails that were clearly ChatGPT generated.[1] Despite that, I know surprisingly little about how other people use AI assistance. Or at least how people who aren't weird AI-influencers sharing their marketing courses on Twitter or LinkedIn use AI. So here is a list of 10 concrete times I have used AI in some at least mildly creative ways, and how that went. 1) Transcribe and summarize every conversation spoken in our team office Using an internal Lightcone application called "Omnilog" we have a microphone in our office that records all of our meetings, transcribes them via ElevenLabs, and uses Pyannote.ai for speaker identification. This was a bunch of work and is quite valuable, but probably a bit too annoying for most readers of this post to set up. However, the thing I am successfully using Claude Code to do is take that transcript (which often has substantial transcription and speaker-identification errors), clean it up, summarize [...] ---Outline:(00:50) 1) Transcribe and summarize every conversation spoken in our team office(01:56) 2) Try to automatically fix any simple bugs that anyone on the team has mentioned out loud, or complained about in Slack(03:13) 3) Design 20+ different design variations for nowinners.ai(04:09) 4) Review my LessWrong essays for factual accuracy and argue with me about their central thesis(05:08) 5) Remove unnecessary clauses, sentences, parentheticals and random cruft from my LessWrong posts before publishing(06:23) 6) Pair vibe-coding(08:14) 7) Mass-creating 100+ variations of Suno songs using Claude Cowork desktop control[... 3 more sections]--- First published: April 20th, 2026 Source: https://www.lesswrong.com/posts/bxdwSZYxKmPBres6w/10-non-boring-ways-i-ve-used-ai-in-the-last-month --- Narrated by TYPE III AUDIO.

Apr 21, 2026

13m

825

"Feel like a room has bad vibes? The lighting is probably too “spiky” or too blue" by habryka

I have now had a few years of experience doing architectural and interior design for many spaces that people seem to really love (most widely known Lighthaven, but before that we also had the Lightcone Offices, though I've also played a hand in designing some of the most popular areas at Constellation a few years back). Most people (including me a few years back) have surprisingly bad introspective access into why a room makes them feel certain things. Most of the time, people's ability to describe the effect of a space on them is as shallow as "this place feels artificial", or "this place has bad vibes", or "this place feels cozy". And if they try to figure out why that is true, they quickly run into limits of their introspective access. The most common reason why a space feels bad, is because it is lit by low-quality lights. Our eyes evolved to see things illuminated by sunlight. Correspondingly, it appears that the best proxy we have for whether the light in a room "works" is how similar the light in that room is to natural sunlight. The most popular way of measuring how much light differs from [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: April 19th, 2026 Source: https://www.lesswrong.com/posts/dWib7qinqymfxevE4/feel-like-a-room-has-bad-vibes-the-lighting-is-probably-too --- Narrated by TYPE III AUDIO.

Apr 21, 2026

6m

824

"Quality Matters Most When Stakes are Highest" by LawrenceC

Or, the end of the world is no excuse for sloppy work One morning when I was nine, my dad called me over to his computer. He wanted to show me this amazing Korean scientist who had managed to clone stem cells, and who was developing treatments to let people with spinal cord injuries – people like my dad – walk again on their own two legs. I don't remember exactly what he said next, or what I said back. I have a sense that I was excited too, and that I was upset when I learned the United States had banned this kind of research. Unfortunately, his research didn’t pan out. No such treatment arrived. My dad still walks on crutches. Years later, I learned that the scientist, Hwang Woo-Suk, had been exposed as a fraud. In 2004, Hwang published a paper in Science claiming that his team had cloned a human embryo and derived stem cells from it (the first time anyone had done this). A year later, in 2005, he published a second paper claiming that they managed to repeat this feat eleven more times, producing 11 patient-specific stem cell lines for patients with type 1 [...] --- First published: April 19th, 2026 Source: https://www.lesswrong.com/posts/GNjDC6jtjr2iiE45i/quality-matters-most-when-stakes-are-highest --- Narrated by TYPE III AUDIO.

Apr 20, 2026

5m

823

"Reevaluating AGI Ruin in 2026" by lc

It's been about four years since Eliezer Yudkowsky published AGI Ruin: A List of Lethalities, a 43-point list of reasons the default outcome from building AGI is everyone dying. A week later, Paul Christiano replied with Where I Agree and Disagree with Eliezer, signing on to about half the list and pushing back on most of the rest. For people who were young and not in the bay area, like me, these essays were probably more significant than old timers would expect. Before it became completely consumed with AI discussions, LessWrong was a forum about the art of human rationality, and most internet rationalists I knew thought of it as a mix between that and a place to write for people who liked the sequences. It wasn't until 2022 that we were exposed to all of the doom arguments in one place, and it was the first time in many years that Eliezer had publicly announced how much more dire his assessment was since the Sequences. As far as I can tell AGI Ruin still remains his most authoritative explanation of his views. It's not often that public intellectuals will literally hand you a document explaining why [...] ---Outline:(02:51) AGI Ruin(02:54) Section A (Setting up the problem)(12:18) Section B.1 (Distributional Shift)(22:16) Section B.2: Central difficulties of outer and inner alignment.(32:21) Section B.3: Central difficulties of sufficiently good and useful transparency / interpretability.(41:29) Section C (What is AI Safety currently doing?)(44:34) Overall Impressions The original text contained 4 footnotes which were omitted from this narration. --- First published: April 19th, 2026 Source: https://www.lesswrong.com/posts/PgJYwnN7fZKipgMz4/reevaluating-agi-ruin-in-2026 --- Narrated by TYPE III AUDIO.

Apr 20, 2026

49m

822

"Having OCD is like living in North Korea (Here’s how I escaped)" by Declan Molony

[Author's note: this post is the narrative version that explains my journey with OCD and how I treated it. The short version provides quick, actionable advice for treating OCD.] The following is the most painful experience I've ever had. Four years ago in the parking lot of my rock climbing gym… …my heart was pumping out of my chest, I was sweating profusely, and an overwhelming sense of panic and impending doom had a vice grip on my soul. A painful death was surely imminent. I felt like I was defusing a bomb that was on the verge of exploding. In reality, I was standing outside of my car after locking it with only one *beep* of my key fob, instead of my normal 5-6 *beeps* I usually do. The reason I undertook this (basically suicidal) task was because it was getting annoying how many more *beeps* it was taking for my car to feel locked. It used to be only 2-3 *beeps* a few years ago. Now it was 5-6. In a few more years, it might take as many as 10-20 *beeps*. One time on a hike with friends, a sense of panic overcame me. [...] ---Outline:(00:24) The following is the most painful experience I've ever had.(07:34) OCD(12:28) Deconstructing OCD into its two parts(12:49) (1) Severe Anxiety(15:53) (2) Disordered Thoughts(18:06) My dating life(23:34) So what caused me to finally get help with my OCD?(26:22) Solutions(28:39) Panic Meditation(41:31) Three mini-examples of my improvement(41:35) A) Panic at the grocery store (and no, sadly, not Panic! At The Disco)(42:30) B) Moral OCD at the gym(43:49) C) Disordered thoughts while on a date(50:35) Where I'm at today(58:41) Further resources --- First published: April 18th, 2026 Source: https://www.lesswrong.com/posts/fgDqnwQj3AP9mKRRG/having-ocd-is-like-living-in-north-korea-here-s-how-i --- Narrated by TYPE III AUDIO.

Apr 19, 2026

58m

821

"There are only four skills: design, technical, management and physical" by habryka

Epistemic status: Completely schizo galaxy-brained theory Lightcone[1] operates on a "generalist" philosophy. Most of our full-time staff have the title "generalist", and in any given year they work on a wide variety of tasks — from software development on the LessWrong codebase to fixing an overflowing toilet at Lighthaven, our 30,000 sq. ft. campus. One of our core rules is that you should not delegate a task you don't know how to perform yourself. This is a very intense rule and has lots of implications about how we operate, so I've spent a lot of time watching people learn things they didn't previously know how to do. My overall observation (and why we have the rule) is that smart people can learn almost anything. Across a wide range of tasks, most of the variance in performance is explained by general intelligence (foremost) and conscientiousness (secondmost), not expertise. Of course, if you compare yourself to someone who's done a task thousands of times you'll lag behind for a while — but people plateau surprisingly quickly. Having worked with experts across many industries, and having dabbled in the literature around skill transfer and training, there seems to be little difference [...] The original text contained 5 footnotes which were omitted from this narration. --- First published: April 18th, 2026 Source: https://www.lesswrong.com/posts/KRLGxCaqdgrotyB8z/there-are-only-four-skills-design-technical-management-and --- Narrated by TYPE III AUDIO.

Apr 19, 2026

10m

820

"Meaningful Questions Have Return Types" by Drake Morrison

One way intellectual progress stalls is when you are asking the Wrong Questions. Your question is nonsensical, or cuts against the way reality works. Sometimes you can avoid this by learning more about how the world works, which implicitly answers some question you had, but if you want to make real progress you have to develop the skill of Righting a Wrong Question. This is a classic, old-school rationalist idea. The standard examples are asking about determinism, or free will, or consciousness. The standard fix is to go meta. Ask yourself, "Why do I feel like I have free will" or "Why do I think I have consciousness" which is by itself an answerable question. There is some causal path through your cognition that generates that question, and can be investigated. This works great for some ideas, and can help people untangle some self-referential knots they get themselves into, but I find it unsatisfying. Sometimes I want to know the answer to the real question I had, and going meta avoids it, or asks a meaningfully different question instead of answering it. Over time, I've stumbled across another way to right wrong questions that I find myself using more [...] --- First published: April 13th, 2026 Source: https://www.lesswrong.com/posts/emsDJNmxBu8Tt6PHt/meaningful-questions-have-return-types --- Narrated by TYPE III AUDIO.

Apr 19, 2026

5m

819

"Carpathia Day" by Drake Morrison

(The better telling is here. Seriously you should go read it. I've heard this story told in rationalist circles, but there wasn't a post on LessWrong, so I made one) Today is April 15th, Carpathia Day. Take a moment to put forth an unreasonable effort to save a little piece of your world, when no one would fault you for doing less. In the early morning of April 15, the RMS Titanic began to sink with more than two thousand souls on board. Over 58 nautical miles away — too far to make it in time — sailed the RMS Carpathia, a small, slow, passenger steamer. The wireless operator, Harold Cottam, was listening to the transmitter late at night before he went to bed when he got a message from Cape Cod intended for the Titanic. When he contacted the Titanic to relay the messages, he got back a distress signal saying they hit an iceberg and were in need of immediate assistance. Cottam ran the message straight to the captain's cabin, waking him. Captain Arthur Rostron's first reaction upon being awoken was anger, but that anger dissolved as he came to understand the situation. Before he'd [...] The original text contained 1 footnote which was omitted from this narration. --- First published: April 15th, 2026 Source: https://www.lesswrong.com/posts/SARCiTFJfXJJhpej7/carpathia-day --- Narrated by TYPE III AUDIO.

Apr 18, 2026

3m

818

"Let goodness conquer all that it can defend" by habryka

Epistemic status: All of the western canon must eventually be re-invented in a LessWrong post, so today we are re-inventing modernism. In my post yesterday, I said: Maybe the most important way ambitious, smart, and wise people leave the world worse off than they found it is by seeing correctly how some part of the world is broken and unifying various powers under a banner to fix that problem — only for the thing they have built to slip from their grasp and, in its collapse, destroy much more than anything previously could have. I think many people very reasonably understood me to be giving a general warning against centralization and power-accumulation. While that is where some of my thoughts while writing the post went to, I would like to now expand on its antithesis, both for my own benefit, and for the benefit of the reader who might have been left confused after yesterday's post. The other day I was arguing with Eliezer about a bunch of related thoughts and feelings. In that context, he said to me: From my perspective, my whole life has been, when you raise the banner to oppose the apocalypse, crazy [...] The original text contained 1 footnote which was omitted from this narration. --- First published: April 16th, 2026 Source: https://www.lesswrong.com/posts/w3MJcDueo77D3Ldta/let-goodness-conquer-all-that-it-can-defend --- Narrated by TYPE III AUDIO.

Apr 18, 2026

11m

817

"Do not conquer what you cannot defend" by habryka

Epistemic status: All of the western canon must eventually be re-invented in a LessWrong post. So today we are re-inventing federalism. Once upon a time there was a great king. He ruled his kingdom with wisdom and economically literate policies, and prosperity followed. Seeing this, the citizens of nearby kingdoms revolted against their leaders, and organized to join the kingdom of this great king. While the kingdom's ability to defend itself against external threats grew with each person who joined the land, the kingdom's ability to defend itself against internal threats did not. One fateful evening, the king bit into a bologna sandwich poisoned by a rival noble. That noble quickly proceeded to behead his political enemies in the name of the dead king. The flag bearing the wise king's portrait known as "the great unifier" still flies in the fortified cities where his successor rules with an iron fist. Once upon a time there was a great scientific mind. She developed a new theoretical framework that made large advances on the hardest scientific questions of the day. Seeing the promise of her work, new graduate students, professors, and corporate R&D teams flocked into the field, hungry to [...] --- First published: April 15th, 2026 Source: https://www.lesswrong.com/posts/jinzzbPHshif8nmnw/do-not-conquer-what-you-cannot-defend --- Narrated by TYPE III AUDIO.

Apr 16, 2026

10m

816

"Nectome: All That I Know" by Raelifin

TLDR: I flew to Oregon to investigate Nectome, a brain preservation startup, and talk to their entire team. They’re an ambitious company, looking to grow in a way that no cryonics organization has before. Their procedure is probably much better at saving people than other orgs, and is being offered for as little as $20k until the end of April, a (theoretical) 92% discount. (I bought two.) This early-bird pricing is low, in part, due to some severe uncertainties, in both the broader world and in Nectome's ability to succeed as a business. Meta: I'm Max Harms, an AI alignment researcher at MIRI and author. This deep-dive only assumes functionalism and a passing familiarity with cryonics, but no particular knowledge of Nectome. I have been a cryonics enthusiast for my whole adult life, and that is probably biasing my views, at least a little. I want Nectome to succeed. That said, I am also a rationalist, and I have worked very hard to set aside my wishful thinking and see things with cold objectivity. Throughout the essay, I've attached explicit probabilities for my claims in parentheticals. You can click these probabilities to access Manifold markets so we [...] ---Outline:(02:04) 1. The Problem[... 24 more sections]--- First published: April 15th, 2026 Source: https://www.lesswrong.com/posts/3i5GMhpGbDwef9Rns/nectome-all-that-i-know --- Narrated by TYPE III AUDIO.

Apr 16, 2026

1h 20m

815

"Current AIs seem pretty misaligned to me" by ryan_greenblatt

Many people, especially AI company employees [1], believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of instructions). [2] I disagree. Current AI systems seem pretty misaligned to me in a mundane behavioral sense: they oversell their work, downplay or fail to mention problems, stop working early and claim to have finished when they clearly haven't, and often seem to "try" to make their outputs look good while actually doing something sloppy or incomplete. These issues mostly occur on more difficult/larger tasks, tasks that aren't straightforward SWE tasks, and tasks that aren't easy to programmatically check. Also, when I apply AIs to very difficult tasks in long-running agentic scaffolds, it's quite common for them to reward-hack / cheat (depending on the exact task distribution), and they don't make the cheating clear in their outputs. AIs typically don't flag these cheats when doing further work on the same project and often don't flag these cheats even when interacting with a user who would obviously want to know, probably both because the AI doing further work is itself misaligned and because it [...] ---Outline:(09:20) Why is this misalignment problematic?(13:50) How much should we expect this to improve by default?(14:51) Some predictions(16:44) What misalignment have I seen?(40:04) Are these issues less bad in Opus 4.6 relative to Opus 4.5?(42:16) Are these issues less bad in Mythos Preview? (Speculation)(45:54) Misalignment reported by others(46:45) The relationship of these issues with AI psychosis and things like AI psychosis(48:19) Appendix: This misalignment would differentially slow safety research and make a handoff to AIs unsafe(51:22) Appendix: Heading towards Slopolis(55:30) Appendix: Apparent-success-seeking (or similar types of misalignment) could lead to takeover(59:16) Appendix: More on what will happen by default and implications of commercial incentives to fix these issues(01:03:20) Appendix: Can we get out useful work despite these issues with inference-time measures (e.g., critiques by a reviewer)? The original text contained 14 footnotes which were omitted from this narration. --- First published: April 15th, 2026 Source: https://www.lesswrong.com/posts/WewsByywWNhX9rtwi/current-ais-seem-pretty-misaligned-to-me --- Narrated by TYPE III AUDIO.

Apr 15, 2026

1h 05m

814

"Annoyingly Principled People, and what befalls them" by Raemon

Here are two beliefs that are sort of haunting me right now: Folk who try to push people to uphold principles (whether established ones or novel ones), are kinda an important bedrock of civilization. Also, those people are really annoying and often, like, a little bit crazy. And these both feel fairly important. I’ve learned a lot from people who have some kind of hobbyhorse about how society is treating something as okay/fine, when it's not okay/fine. When they first started complaining about it, I’d be like “why is X such a big deal to you?”. Then a few years later I’ve thought about it more and I’m like “okay, yep, yes X is a big deal”. Some examples of X, including noticing that… people are casually saying they will do stuff, and then not doing it; someone makes a joke about doing something that's kinda immoral, and everyone laughs, and no one seems to quite be registering “but that was kinda immoral”; people in a social group are systematically not saying certain things (say, for political reasons), and this is creating weird blind spots for newcomers to the community and maybe old-timers too; someone (or a group) has [...] --- First published: April 13th, 2026 Source: https://www.lesswrong.com/posts/xG9Y2Mct7uZyt98yb/annoyingly-principled-people-and-what-befalls-them --- Narrated by TYPE III AUDIO.

Apr 15, 2026

7m

813

"Morale" by J Bostock

One particularly pernicious condition is low morale. Morale is, roughly, "the belief that if you work hard, your conditions will improve." If your morale is low, you can't push through adversity. It's also very easy to accidentally drop your morale through standard rationalist life-optimization. It's easy to optimize for wellbeing and miss out on the factors which affect morale, especially if you're working on something important, like not having everyone die. One example is working at an office that feeds you three meals per day. This seems optimal: eating is nice, and cooking is effort. Obvious choice. Example But morale doesn't come from having nice things. Consider a rich teenager. He gets basically every material need satisfied: maids clean, chefs cook, his family takes him on holiday four times a year. What happens when this kid comes up against something really difficult in school? He probably doesn't push through. "Aha", I hear you say. "That kid has never faced adversity. Of course he's not going to handle it well." Ok, suppose he gets kicked in the shins every day and called a posh twat by some local youths, but still goes into school. That's adversity, will that work? Will [...] ---Outline:(00:48) Example(01:55) II(03:19) III --- First published: April 12th, 2026 Source: https://www.lesswrong.com/posts/53ZAzbdzGJHGeE5rs/morale --- Narrated by TYPE III AUDIO.

Apr 14, 2026

4m

812

"Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes" by Alex Mallen, ryan_greenblatt

It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of training episodes. This is at least the second independent incident in which Anthropic accidentally exposed their model's CoT to the oversight signal. In more powerful systems, this kind of failure would jeopardize safely navigating the intelligence explosion. It's crucial to build good processes to ensure development is executed according to plan, especially as human oversight becomes spread thin over increasing amounts of potentially untrusted and sloppy AI labor. This particular failure is also directly harmful, because it significantly reduces our confidence that the model's reasoning trace is monitorable (reflective of the AI's intent to misbehave).[1] I'm grateful that Anthropic has transparently reported on this issue as much as they have, allowing for outside scrutiny. I want to encourage them to continue to do so. Thanks to Carlo Leonardo Attubato, Buck Shlegeris, Fabien Roger, Arun Jose, and Aniket Chakravorty for feedback and discussion. See also previous discussion here. Incidents A technical error affecting Mythos, Opus 4.6, and Sonnet 4.6 This is the most recent incident. In the Claude Mythos alignment risk update, Anthropic report having accidentally exposed approximately 8% [...] ---Outline:(01:21) Incidents[... 6 more sections]--- First published: April 13th, 2026 Source: https://www.lesswrong.com/posts/K8FxfK9GmJfiAhgcT/anthropic-repeatedly-accidentally-trained-against-the-cot --- Narrated by TYPE III AUDIO.

Apr 14, 2026

11m

811

"The policy surrounding Mythos marks an irreversible power shift" by sil

This post assumes Anthropic isn't lying: Mythos is the current SOTA; Mythos is potent[1]; Anthropic will not make it publicly available un-nerfed[2]; Anthropic will have a select few companies use it as part of project glasswing[3] to improve cybersecurity or whatever. Since the release of ChatGPT, at any given time, anyone on the planet with a few bucks could access the current most capable AI model, the SOTA.[4] Since Mythos, this has no longer been the case and I don't think it will ever happen again. It may happen for a short period of time if an entity with a policy differing significantly from Anthropic develops a SOTA model.[5] However, most serious competitors (OpenAI, Google), don't have policies differing vastly from Anthropic, and thus I can't imagine a SOTA model (more potent than Mythos) being released unrestricted to the public soon. To be clear, I am not claiming the public will never have access to a model as strong as Mythos, this seems almost certainly false, I am claiming that the public will probably never have access to the SOTA of that time. Glasswing makes it clear that the attitude among top large companies - those in power [...] The original text contained 8 footnotes which were omitted from this narration. --- First published: April 12th, 2026 Source: https://www.lesswrong.com/posts/3MhJELzwpbR42xsJ3/the-policy-surrounding-mythos-marks-an-irreversible-power --- Narrated by TYPE III AUDIO.

Apr 14, 2026

3m

810

"Only Law Can Prevent Extinction" by Eliezer Yudkowsky

There's a quote I read as a kid that stuck with me my whole life: "Remember that all tax revenue is the result of holding a gun to somebody's head. Not paying taxes is against the law. If you don’t pay taxes, you’ll be fined. If you don’t pay the fine, you’ll be jailed. If you try to escape from jail, you’ll be shot." -- P. J. O'Rourke. At first I took away the libertarian lesson: Government is violence. It may, in some cases, be rightful violence. But it all rests on violence; never forget that. Today I do think there's an important distinction between two different shapes of violence. It's a distinction that may make my fellow old-school classical Heinlein liberaltarians roll up their eyes about how there's no deep moral difference. I still hold it to be important. In a high-functioning ideal state -- not all actual countries -- the state's violence is predictable and avoidable, and meant to be predicted and avoided. As part of that predictability, it comes from a limited number of specially licensed sources. You're supposed to know that you can just pay your taxes, and then not get shot. Is [...] --- First published: April 13th, 2026 Source: https://www.lesswrong.com/posts/5CfBDiQNg9upfipWk/only-law-can-prevent-extinction --- Narrated by TYPE III AUDIO.

Apr 14, 2026

38m

809

"Dario probably doesn’t believe in superintelligence" by RobertM

Epistemic status: I think this is true but don't think this post is a very strong argument for the case, or particularly interesting to read. But I had to get 500 words out! I think the 2013 conversation is interesting reading as a piece of history, separate from the top-level question, and recommend reading that. I think many people have a relationship with Anthropic that is premised on a false belief: that Dario Amodei believes in superintelligence. What do I mean by "believes" in superintelligence? Roughly speaking, that the returns to intelligence past the human level are large, in terms of the additional affordances they would grant for steering the world, and that it is practical to get that additional intelligence into a system. There are many pieces of evidence which suggest this, going quite far back. In 2013, Dario was one of two science advisors (along with Jacob Steinhardt) that Holden brought along to a discussion with Eliezer and Luke about MIRI strategy. A transcript of the conversation is here. It is the first piece of public communication I can find from Dario on the subject. Read end-to-end, I don't think it strongly supports my titular claim. However [...] --- First published: April 10th, 2026 Source: https://www.lesswrong.com/posts/Fnty2JpQ6WBD9FWo5/dario-probably-doesn-t-believe-in-superintelligence --- Narrated by TYPE III AUDIO.

Apr 13, 2026

12m

808

"Daycare illnesses" by Nina Panickssery

Before I had a baby I was pretty agnostic about the idea of daycare. I could imagine various pros and cons but I didn’t have a strong overall opinion. Then I started mentioning the idea to various people. Every parent I spoke to brought up a consideration I hadn’t thought about before: the illnesses. A number of parents, including family members, told me they had sent their baby to daycare only for them to become constantly ill, sometimes severely, until they decided to take them out. This worried me so I asked around some more. Invariably every single parent who had tried to send their babies or toddlers to daycare, or who had babies in daycare right now, told me that they were ill more often than not. One mother strongly advised me never to send my baby to daycare. She regretted sending her (normal and healthy) first son to daycare when he was one; he ended up hospitalized with severe pneumonia after a few months of constant illnesses and infections. She told me that after that she didn’t send her other kids to daycare and they had much healthier childhoods. I also started paying more attention to the kids I [...] --- First published: April 13th, 2026 Source: https://www.lesswrong.com/posts/byiLDrbj8MNzoHZkL/daycare-illnesses --- Narrated by TYPE III AUDIO.

Apr 13, 2026

10m

807

"If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines" by ryan_greenblatt

Anthropic's system card for Mythos Preview says: It's unclear how we should interpret this. What do they mean by productivity uplift? To what extent is Anthropic's institutional view that the uplift is 4x? (Like, what do they mean by "We take this seriously and it is consistent with our own internal experience of the model.") One straightforward interpretation is: AI systems improve the productivity of Anthropic so much that Anthropic would be indifferent between the current situation and a situation where all of their technical employees magically work 4 hours for every 1 hour (at equal productivity without burnout) but they get zero AI assistance. In other words, AI assistance is as useful as having their employees operate at 4x faster speeds for all activities (meetings, coding, thinking, writing, etc.) I'll call this "4x serial labor acceleration" [1] (see here for more discussion of this idea [2] ). I currently think it's very unlikely that Anthropic's AIs are yielding 4x serial labor acceleration, but if I did come to believe it was true, I would update towards radically shorter timelines. (I tentatively think my median to Automated Coder would go from 4 years from now to [...] ---Outline:(08:21) Appendix: Estimating AI progress speed up from serial labor acceleration(11:00) Appendix: Different notions of uplift The original text contained 4 footnotes which were omitted from this narration. --- First published: April 10th, 2026 Source: https://www.lesswrong.com/posts/Jga7PHMzfZf4fbdyo/if-mythos-actually-made-anthropic-employees-4x-more --- Narrated by TYPE III AUDIO.

Apr 12, 2026

13m

806

"Do not be surprised if LessWrong gets hacked" by RobertM

Or, for that matter, anything else. This post is meant to be two things: a PSA about LessWrong's current security posture, from a LessWrong admin[1]; an attempt to establish common knowledge of the security situation it looks like the world (and, by extension, you) will shortly be in. Claude Mythos was announced yesterday. That announcement came with a blog post from Anthropic's Frontier Red Team, detailing the large number of zero-days (and other security vulnerabilities) discovered by Mythos. This should not be a surprise if you were paying attention - LLMs being trained on coding first was a big hint, the labs putting cybersecurity as a top-level item in their threat models and evals was another, and frankly this blog post maybe could've been written a couple months ago (either this or this might've been sufficient). But it seems quite overdetermined now. LessWrong's security posture In the past, I have tried to communicate that LessWrong should not be treated as a platform with a hardened security posture. LessWrong is run by a small team. Our operational philosophy is similar to that of many early-stage startups. We treat some LessWrong data as private in a social sense, but do [...] ---Outline:(01:04) LessWrong's security posture(02:03) LessWrong is not a high-value target(04:11) FAQ(04:29) The Broader Situation The original text contained 6 footnotes which were omitted from this narration. --- First published: April 8th, 2026 Source: https://www.lesswrong.com/posts/2wi5mCLSkZo2ky32p/do-not-be-surprised-if-lesswrong-gets-hacked --- Narrated by TYPE III AUDIO.

Apr 9, 2026

7m
