EPISODE · Apr 8, 2026 · 6 MIN
“Role-playing vs Self-modelling” by Jan_Kulveit
In a recent debate on Twitter – which I recommend reading in full – David Chalmers argues: "Claude doesn't role-play the assistant, it realizes the assistant. Role-playing and realization are quite distinct phenomena, even at the level of behavior and function."

Jack Lindsey questions this, pointing out evidence in the opposite direction: "I'm curious what you'd say it's doing when it's sampling tokens on the user turn, or, say, on John F. Kennedy's turn in a transcript like:

H: When were you born?
John F. Kennedy: I was born in 1917.

It feels a bit odd to say that the model is realizing JFK? Or perhaps you'd say it's realizing 'its conception of JFK' or something like that? That starts to sound a lot like 'roleplaying JFK'.

If the Assistant is distinct from JFK, do you think it's because post-training breaks the symmetry between the Assistant and other characters? This is intuitively plausible, but ultimately it's an empirical question whether this takes place, and there's a lot of empirical evidence that challenges this intuition. Or do you think it's because the Assistant, unlike JFK, has never been anything other than a construct of the LLM, and so [...]

---
Outline:
(01:56) Symmetry breaking
(02:55) Different sources of self-models
(05:08) Difference in internal representations
(06:01) Summary

The original text contained 2 footnotes which were omitted from this narration.

---
First published: April 7th, 2026
Source: https://www.lesswrong.com/posts/wGn9LXYAbzoJKXyyu/role-playing-vs-self-modelling
---
Narrated by TYPE III AUDIO.