AF - Agency from a causal perspective by Tom Everitt
<a href="https://www.alignmentforum.org/posts/Qi77Tu3ehdacAbBBe/agency-from-a-causal-perspective">Link to original article</a><br/><br/>Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Agency from a causal perspective, published by Tom Everitt on June 30, 2023 on The AI Alignment Forum. Post 3 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction and Post 2: Causality. By Matt MacDermott, James Fox, Rhys Ward, Jonathan Richens, and Tom Everitt representing the Causal Incentives Working Group. Thanks also to Ryan Carey, Toby Shevlane, and Aliya Ahmad. The purpose of this post is twofold: to lay the foundation for subsequent posts by exploring what agency means from a causal perspective, and to sketch a research program for a deeper understanding of agency. The Importance of Understanding Agency Agency is a complex concept that has been studied from multiple perspectives, including social science, philosophy, and AI research. Broadly it refers to a system able to act autonomously. For the purposes of this blog post, we interpret agency as goal-directedness, i.e. acting as if trying to direct the world in some particular direction. There are strong incentives to create more agentic AI systems. Such systems could potentially do many tasks humans are currently needed for, such as independently researching topics, or even run their own companies. However, making systems more agentic comes with an additional set of potential dangers and harms, as goal-directed AI systems could become capable adversaries if their goals are misaligned with human interest. A better understanding of agency may let us: Understand dangers and harms from powerful machine learning systems. Evaluate whether a particular ML model is dangerously agentic. Design systems that are not agentic, such as AGI scientists or oracles, or which are agentic in a safe way. Lay a foundation for progress on other AGI safety topics, such as interpretability, incentives, and generalisation. Preserve human agency, e.g. through a better understanding of the conditions under which agency is enhanced or diminished. Degrees of freedom (Goal-directed) agents come in all shapes and sizes – from bacteria to humans, from football teams to governments, and from RL policies to LLM simulacra – but they share some fundamental features. First, an agent needs the freedom to choose between a set of options. We don’t need to assume that this decision is free from causal influence, or that we can’t make any prediction about it in advance – but there does need to be a sense in which it could either go one way or another. Dennett calls this degrees of freedom. For example, Mr Jones can choose to turn his sprinkler on or not. We can model his decision as a random variable with “watering” and “not watering” as possible outcomes: Freedom comes in degrees. A thermostat can only choose heater output, while most humans have access to a range of physical and verbal actions. Influence Second, in order to be relevant, an agent’s behaviour must have consequences. Mr Jones decision to turn on the sprinkler affects how green his grass becomes: The amount of influence varies between different agents. For example, a language model’s influence will heavily depend on whether it only interacts with its own developers, or with millions of users through a public API. 
Suggested measures of influence include (causal) channel capacity, performative power, and power in Markov decision processes.

Adaptation

Third, and most importantly, goal-directed agents do things for reasons. That is, (they act as if) they have preferences about the world, and these preferences drive their behaviour: Mr Jones turns on the sprinkler because it makes the grass green. If the grass didn’t need water, then Mr Jones likely wouldn’t water it. The consequences drive the behaviour.

This feedback loop, or backwards causality, can be represented by adding a so-called mechanism node to each object-level node in the original graph. The mechanism n...
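This adaptation can also be sketched in code: the agent’s decision rule (its mechanism) covaries with the mechanism governing the grass, which is the sense in which consequences drive behaviour. The mechanisms below are hypothetical illustrations, not the post’s formalism:

```python
def needs_water(watered: bool) -> str:
    """Actual grass mechanism (assumed): green only if watered."""
    return "green" if watered else "brown"

def drought_resistant(watered: bool) -> str:
    """Counterfactual grass mechanism (assumed): green regardless of watering."""
    return "green"

def jones_policy(grass_mechanism) -> bool:
    """Mr Jones's mechanism: water exactly when watering makes a difference."""
    return grass_mechanism(True) != grass_mechanism(False)

print(jones_policy(needs_water))        # True: he waters
print(jones_policy(drought_resistant))  # False: he wouldn't bother
```

Because jones_policy reads the grass mechanism itself, changing that mechanism changes his behaviour: exactly the feedback loop that mechanism nodes are meant to capture.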
First published: 06/30/2023
Genres: education
Duration: 11 minutes
Parent Podcast: The Nonlinear Library: Alignment Forum Daily
Similar Episodes
AMA: Paul Christiano, alignment researcher by Paul Christiano
Release Date: 12/06/2021
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AMA: Paul Christiano, alignment researcher, published by Paul Christiano on the AI Alignment Forum. I'll be running an Ask Me Anything on this post from Friday (April 30) to Saturday (May 1). If you want to ask something just post a top-level comment; I'll spend at least a day answering questions. You can find some background about me here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Explicit: No
What is the alternative to intent alignment called? Q by Richard Ngo
Release Date: 11/17/2021
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What is the alternative to intent alignment called? Q, published by Richard Ngo on the AI Alignment Forum. Paul defines intent alignment of an AI A to a human H as the criterion that A is trying to do what H wants it to do. What term do people use for the definition of alignment in which A is trying to achieve H's goals (whether or not H intends for A to achieve H's goals)? Secondly, this seems to basically map on to the distinction between an aligned genie and an aligned sovereign. Is this a fair characterisation? (Intent alignment definition from) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Explicit: No
AI alignment landscape by Paul Christiano
Release Date: 11/19/2021
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI alignment landscape, published by Paul Christiano on the AI Alignment Forum. Here (link) is a talk I gave at EA Global 2019, where I describe how intent alignment fits into the broader landscape of “making AI go well,” and how my work fits into intent alignment. This is particularly helpful if you want to understand what I’m doing, but may also be useful more broadly. I often find myself wishing people were clearer about some of these distinctions. Here is the main overview slide from the talk: The highlighted boxes are where I spend most of my time. Here are the full slides from the talk. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Explicit: No
Would an option to publish to AF users only be a useful feature? Q by Richard Ngo
Release Date: 11/17/2021
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Would an option to publish to AF users only be a useful feature? Q, published by Richard Ngo on the AI Alignment Forum. Right now there are quite a few private safety docs floating around. There's evidently demand for a privacy setting lower than "only people I personally approve", but higher than "anyone on the internet gets to see it". But this means that safety researchers might not see relevant arguments and information. And as the field grows, passing on access to such documents on a personal basis will become even less efficient. My guess is that in most cases, the authors of these documents don't have a problem with other safety researchers seeing them, as long as everyone agrees not to distribute them more widely. One solution could be to have a checkbox for new posts which makes them only visible to verified Alignment Forum users. Would people use this? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Explicit: No
Similar Podcasts
The Nonlinear Library
Release Date: 10/07/2021
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: Alignment Section
Release Date: 02/10/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: LessWrong
Release Date: 03/03/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: LessWrong Daily
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: EA Forum Daily
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: Alignment Forum Weekly
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: EA Forum Weekly
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: LessWrong Weekly
Release Date: 05/02/2022
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
The Nonlinear Library: Alignment Forum Top Posts
Release Date: 02/10/2022
Authors: The Nonlinear Fund
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
Explicit: No
The Nonlinear Library: LessWrong Top Posts
Release Date: 02/15/2022
Authors: The Nonlinear Fund
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
Explicit: No
sasodgy
Release Date: 04/14/2021
Description: Audio Recordings from the Students Against Sexual Orientation Discrimination (SASOD) Public Forum with Members of Parliament at the National Library in Georgetown, Guyana
Explicit: No