EPISODE · Jun 22, 2026 · 32 MIN
“A Theory of Prompt Injection (and why you should study roles)” by Charles Ye, softboiledheart
Summary

We've been building a theory of how prompt injections work under the hood. We show it comes down to how LLMs perceive roles (the humble chat template tags). We use this theory to create new attacks, explain some weird mech interp results, and predict when attacks work. We also advocate for a new subfield focused on the science of roles, and sketch some unexplored new research problems.

Work supported by CBAI and Cosmos. Another version of this post (with more inline colors) is here, and full ICML paper here.

1. The World to an LLM

How does an LLM know the difference between its own thoughts and someone else's words? To see why this is hard, let's look at what the world actually looks like to a model. Here's a simple chat where we ask Claude to check the day of the week. I took a snapshot of it midway through its follow-up response:

Left = what we see; right = what the LLM gets.

On the left is what we see in the chat interface: a structured conversation with distinct turns. On the right is what the model actually receives as input: a single, continuous stream [...]

---

Outline:
(00:12) Summary
(00:54) 1. The World to an LLM
(02:35) 2. Roles
(05:03) 3. Roles and prompt injection
(06:35) Two ways to defend injections
(08:14) 4. What's going wrong with roles?
(13:28) 5. Spoofing Thoughts
(15:59) 6. Prompt Injection as Role Confusion
(20:57) 7. Why Roles Matter
(21:01) A brief history of roles
(22:23) A general theory of roles
(24:54) 8. Open Ideas for Roles Research
(25:12) Subconscious steering
(27:06) When to use roles
(28:42) Roles as a cognitive window
(30:38) Conclusion

The original text contained 27 footnotes which were omitted from this narration.

---

First published: June 22nd, 2026

Source: https://www.lesswrong.com/posts/d8xDGzCEYE639qqEv/a-theory-of-prompt-injection-and-why-you-should-study-roles

---

Narrated by TYPE III AUDIO.

---

Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.