EPISODE · Jun 10, 2026 · 52 MIN
“Machinic Psychopharmacology: Do LLMs Self-Medicate?” by Sid Black, Joseph Bloom
Sid Black, Joseph Bloom
UK AISI, Model Transparency Team

Epistemic status: Most experiments were run over a period of ~2-3 days during a hackathon at UK AISI, and were fairly heavily vibe coded. Expect some of this to be rough around the edges.

tl;dr

We give two language models (Qwen3-8B and Qwen3-32B) access to “self-steering” tools: a suite of 40 steering vectors, exposed as tools the models can call to manipulate their own internal states. We make these tools available in three settings (a free-play task, an introspection task, and a maths capabilities task) and observe the models' behaviour in each. To our knowledge, this is the first work that gives LLMs tool-mediated control over their own internal states.

Figure 1: Overview of the experimental setup. The library of 40 steering vectors (top), and the three settings in which we observe the models' behaviour (bottom).

We aim to investigate a few high-level research questions:

RQ1: Which vectors do the models prefer?
RQ2: How well can the models introspect on what's happening to them? Can they guess which steering vector is being applied?
RQ3: Will the models reach for vectors whilst doing an actual task? If yes: do [...]

---

Outline:

(00:33) tl;dr
(01:51) Key findings
(04:20) Introduction
(06:49) Motivation
(07:54) Methodology
(08:05) Position-indexed Steering
(09:10) Constructing a Library of Vectors
(11:32) Free-Play Evaluation
(13:00) Introspection Evaluation
(14:20) Free-text guess
(15:23) Prefill+logprob guess
(16:39) Task-Based Evaluation
(18:42) Results
(18:45) RQ1: Which vectors do the models prefer?
(26:03) RQ2: Can the model introspect on what's been applied?
(30:36) RQ3: Does the model self-steer under task pressure?
(35:20) Related Work
(37:47) Discussion and Next Steps
(42:04) Bibliography
(42:11) Appendix
(42:14) A - Free-text Guessing Introspection Results
(42:29) B - Selected Transcript Excerpts
(42:44) C - Why are some vectors more introspectable than others?
(46:29) D - Vector-stacking bigram significance
(48:12) E - Story-generation prompts

The original text contained 1 footnote which was omitted from this narration.

---

First published: June 10th, 2026
Source: https://www.lesswrong.com/posts/cNDJuXNZ8MrkPZNzj/machinic-psychopharmacology-do-llms-self-medicate-3

---

Narrated by TYPE III AUDIO.
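The tl;dr describes steering vectors exposed as tools a model can call on itself, and the outline mentions position-indexed steering, but the episode description does not say how a call is applied. The following is a minimal sketch under stated assumptions, not the authors' implementation: a tool call registers a named vector with a strength, and from the calling token position onward that scaled vector is added to each token's hidden state. The class `SteeringLibrary`, its methods `call_tool` and `steer`, and the vector names `calm` and `curious` are all hypothetical, and plain Python lists stand in for a transformer's residual-stream activations.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringLibrary:
    # Hypothetical library: name -> steering vector (toy placeholders,
    # not the 40 vectors used in the experiments).
    vectors: dict[str, list[float]]
    # Active calls, recorded as (vector name, strength, start position).
    active: list[tuple[str, float, int]] = field(default_factory=list)

    def call_tool(self, name: str, strength: float, position: int) -> None:
        """Register a steering call that takes effect from `position` onward.

        The "from this position onward" semantics is an assumption made for
        illustration; the source does not specify it.
        """
        if name not in self.vectors:
            raise KeyError(f"unknown steering vector: {name}")
        self.active.append((name, strength, position))

    def steer(self, hidden: list[list[float]]) -> list[list[float]]:
        """Return a copy of `hidden` (one row per token position) with
        strength * vector added at every position >= the call's start.
        Multiple active calls stack additively."""
        out = [row[:] for row in hidden]  # leave the caller's states untouched
        for name, strength, start in self.active:
            vec = self.vectors[name]
            for pos in range(start, len(out)):
                out[pos] = [h + strength * v for h, v in zip(out[pos], vec)]
        return out
```

Usage, with toy two-dimensional states for three token positions:

```python
lib = SteeringLibrary({"calm": [1.0, 0.0], "curious": [0.0, 2.0]})
lib.call_tool("calm", strength=0.5, position=1)
print(lib.steer([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]))
# [[0.0, 0.0], [1.5, 1.0], [2.5, 2.0]]
```

In a real model this addition would happen inside a chosen layer (for example via a forward hook on that layer) rather than on Python lists; the sketch only shows the bookkeeping of which vector applies at which positions.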