AF - The Cave Allegory Revisited: Understanding GPT's Worldview by Jan Kulveit
<a href="https://www.alignmentforum.org/posts/kFCu3batN8k8mwtmh/the-cave-allegory-revisited-understanding-gpt-s-worldview">Link to original article</a><br/><br/>Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Cave Allegory Revisited: Understanding GPT's Worldview, published by Jan Kulveit on February 14, 2023 on The AI Alignment Forum. A short post describing a metaphor I find useful, in particular for explaining some intuitions about systems like GPT to people who don't have deeper technical knowledge about large generative models. Plato's allegory of the cave has been a staple of philosophical discourse for millennia, providing a metaphor for understanding the limits of human perception. In the classical allegory, we are prisoners shackled to a wall of a cave, unable to experience reality directly but only able to infer it based on watching shadows cast on the wall. GPT can be thought of as a blind oracle residing in a deeper cave, where it does not even see the shadows but only hears our conversations in the first cave, always trying to predict the next syllable. It is remarkable that it still learns a lot about the world outside of the cave. Why does it learn this? Because a model of reality outside of the cave and a decent amount of abstraction are useful for predicting the conversations in the first cave! Moreover, GPT also learns about the speakers in the first cave, as understanding their styles and patterns of speech is crucial for its prediction task. As the speakers are closer to GPT, understanding their styles is in some sense easier and more natural than guessing what's outside of the cave. What does the second cave allegory illustrate?
The first insight from the allegory is: if you are in GPT's place, part of the difficulty in figuring out what's going on outside the cave is that people in the first cave talk a lot about other things apart from the shadows of the real world. Sometimes, they talk about happenings in Middle Earth. Or about how the shadows would look in some counterfactual world. As humans, we are blessed with the luxury of being able to compare such statements to the shadows and determine their veracity. The difference between conversations about fantasy and the shadows of the real world is usually extremely obvious to humans: we never see dragon shadows. In contrast, dragons do show up a lot in the conversations in the first cave; GPT doesn’t get to see the shadows, so it often needs to stay deeply uncertain about whether the speaker is describing the actual shadows or something else to be good at predicting the conversation. The second insight is that one of the biggest challenges for GPT in figuring out the conversation is localizing it: determining who is speaking and what the context is, just from the words. Is it a child regaling another child with a fairy-tale, or a CEO delivering a corporate address? As humans, we do not face this conundrum often, because we can see the context in which the conversation is taking place. In fact, we would be worse than GPT at the task it has to deal with. At first, interacting with this type of blind oracle in the second cave was disorienting for humans. Talking to GPT used to be a bit like shouting something through a narrow tunnel into the second cave and, instead of an echo, getting back what the blind oracle hallucinates is the most likely thing that you or someone else would say next. Often people were confused by this. They shouted instructions and expected an answer, but the oracle doesn't listen to instructions or produce answers directly - it just hallucinates what someone might say next.
Because on average in the conversations in the first cave questions are followed by answers, and requests by fulfilment, this sort of works. One innovation of ChatGPT, which made it popular with people, was localising the conversation by default: when you are talking with ChatGPT now, it knows that what follows is a conversation between a human - you - and a "helpful AI assistant". There is a subtle point to...
First published
02/14/2023
Genres:
education
Duration
5 minutes
Parent Podcast
The Nonlinear Library: Alignment Forum Daily
Similar Episodes
AMA: Paul Christiano, alignment researcher by Paul Christiano
Release Date: 12/06/2021
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AMA: Paul Christiano, alignment researcher, published by Paul Christiano on the AI Alignment Forum. I'll be running an Ask Me Anything on this post from Friday (April 30) to Saturday (May 1). If you want to ask something just post a top-level comment; I'll spend at least a day answering questions. You can find some background about me here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Explicit: No
What is the alternative to intent alignment called? Q by Richard Ngo
Release Date: 11/17/2021
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What is the alternative to intent alignment called? Q, published by Richard Ngo on the AI Alignment Forum. Paul defines intent alignment of an AI A to a human H as the criterion that A is trying to do what H wants it to do. What term do people use for the definition of alignment in which A is trying to achieve H's goals (whether or not H intends for A to achieve H's goals)? Secondly, this seems to basically map on to the distinction between an aligned genie and an aligned sovereign. Is this a fair characterisation? (Intent alignment definition from) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Explicit: No
AI alignment landscape by Paul Christiano
Release Date: 11/19/2021
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI alignment landscape, published by Paul Christiano on the AI Alignment Forum. Here (link) is a talk I gave at EA Global 2019, where I describe how intent alignment fits into the broader landscape of “making AI go well,” and how my work fits into intent alignment. This is particularly helpful if you want to understand what I’m doing, but may also be useful more broadly. I often find myself wishing people were clearer about some of these distinctions. Here is the main overview slide from the talk: The highlighted boxes are where I spend most of my time. Here are the full slides from the talk. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Explicit: No
Would an option to publish to AF users only be a useful feature? Q by Richard Ngo
Release Date: 11/17/2021
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Would an option to publish to AF users only be a useful feature? Q, published by Richard Ngo on the AI Alignment Forum. Right now there are quite a few private safety docs floating around. There's evidently demand for a privacy setting lower than "only people I personally approve", but higher than "anyone on the internet gets to see it". But this means that safety researchers might not see relevant arguments and information. And as the field grows, passing on access to such documents on a personal basis will become even less efficient. My guess is that in most cases, the authors of these documents don't have a problem with other safety researchers seeing them, as long as everyone agrees not to distribute them more widely. One solution could be to have a checkbox for new posts which makes them only visible to verified Alignment Forum users. Would people use this? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Explicit: No
Similar Podcasts
The Nonlinear Library
Release Date: 10/07/2021
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: Alignment Section
Release Date: 02/10/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: LessWrong
Release Date: 03/03/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: LessWrong Daily
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: EA Forum Daily
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: Alignment Forum Weekly
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: EA Forum Weekly
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: LessWrong Weekly
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: Alignment Forum Top Posts
Release Date: 02/10/2022
Authors: The Nonlinear Fund
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
Explicit: No
The Nonlinear Library: LessWrong Top Posts
Release Date: 02/15/2022
Authors: The Nonlinear Fund
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
Explicit: No
sasodgy
Release Date: 04/14/2021
Description: Audio Recordings from the Students Against Sexual Orientation Discrimination (SASOD) Public Forum with Members of Parliament at the National Library in Georgetown, Guyana
Explicit: No