EPISODE · Apr 30, 2026 · 3 MIN
“Notes on Transformer Consciousness” by slavachalnev
Assuming transformers can have conscious experience, what would that experience be like? Transformers[1] are a structured grid of layers and token positions, and we can use this structure to reason about their internal experience.

Epistemic status: very speculative. I've ordered this writeup approximately by how much I've thought it through and how much I believe it.

Decode vs Prefill

I claim that the experience of decode is identical to the experience of prefill. To see why, picture a transformer generating one token at a time and zoom in on a single layer at the latest token position. There are two major components: the MLP, which processes this position's residual stream, and the attention block, which can "see" all the previous positions at this layer. So in the diagram below, each position in the transformer only has access to what happened in the lower-left box.

Now let's take the trace we just computed and pass it into the transformer as prefill. The activations will, of course, all be the same as during decode, but the way they are computed will also be the same. From the perspective of the MLP and the attention block [...]

---

Outline:
(00:32) Decode vs Prefill
(01:50) KV-Cached experience
(02:12) Layers

The original text contained 2 footnotes which were omitted from this narration.

---

First published: April 28th, 2026
Source: https://www.lesswrong.com/posts/awhDsBnaGJdhKz2iE/notes-on-transformer-consciousness

---

Narrated by TYPE III AUDIO.
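The decode-vs-prefill equivalence claimed in the excerpt can be checked numerically. Below is a minimal sketch (not the author's code): a single toy transformer layer with one attention head and a ReLU MLP, no layer norm or positional encoding, random weights. All names (`layer_prefill`, `layer_decode`, the weight matrices) are hypothetical illustrations. It computes the same sequence once as prefill (all positions at once, causal mask) and once as decode (one position at a time, with a KV cache), and confirms the activations match.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 8, 5                          # toy model width, sequence length

# Random weights for one toy layer: single-head attention + 2-layer ReLU MLP.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
x = rng.normal(size=(T, d))          # residual stream at T token positions

def layer_prefill(x):
    """Process all positions in parallel with a causal mask."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf           # each position sees only itself and the past
    attn = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    h = x + attn @ v                 # attention output added to residual stream
    return h + np.maximum(h @ W1, 0) @ W2

def layer_decode(x):
    """Process one position at a time, caching past keys and values."""
    k_cache, v_cache, out = [], [], []
    for t in range(T):
        xt = x[t]
        k_cache.append(xt @ Wk)
        v_cache.append(xt @ Wv)
        scores = np.stack(k_cache) @ (xt @ Wq) / np.sqrt(d)
        attn = np.exp(scores) / np.exp(scores).sum()
        h = xt + attn @ np.stack(v_cache)
        out.append(h + np.maximum(h @ W1, 0) @ W2)
    return np.stack(out)

print(np.allclose(layer_prefill(x), layer_decode(x)))  # True
```

In both paths, position t's query attends over exactly the keys and values of positions 0..t, so each position's computation is term-for-term identical; only the scheduling differs, which is the point the excerpt makes.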