Alignment Newsletter #25 by Rohin Shah

Summary

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Alignment Newsletter #25, published by Rohin Shah on the AI Alignment Forum.

Highlights

Towards a New Impact Measure (Alex Turner): This post introduces a new idea for an impact measure. It defines impact as change in our ability to achieve goals. So, to measure impact, we can simply measure how much easier or harder it is to achieve goals -- this gives us Attainable Utility Preservation (AUP). This penalizes actions that restrict our ability to reach particular outcomes (opportunity cost) as well as ones that enlarge it (instrumental convergence). Alex then attempts to formalize this. For every action, the impact of that action is the absolute difference between the attainable utility after the action and the attainable utility if the agent takes no action. Here, attainable utility is calculated as the sum of expected Q-values (over m steps) of every computable utility function, weighted by 2^{-length of description}. For a plan, we sum up the penalties for each action in the plan. (This is not entirely precise, but you'll have to read the post for the math.) We can then choose one canonical action, calculate its impact, and allow the agent to have impact equivalent to at most N of these actions.

He then shows some examples, both theoretical and empirical. The empirical ones are done on the suite of examples from AI safety gridworlds used to test relative reachability. Since the utility functions here are indicators for each possible state, AUP penalizes changes in your ability to reach states. Since you can never increase the number of states you can reach, this amounts to penalizing decreases in the ability to reach states, which is exactly what relative reachability does, so it's not surprising that AUP succeeds on the environments where relative reachability succeeded. It does have the additional feature of handling shutdowns, which relative reachability doesn't. Since changes in the probability of shutdown drastically change the attainable utility, any such changes will be heavily penalized. We can use this dynamic to our advantage, for example by committing to shut down the agent if we see it doing something we disapprove of.

My opinion: This is quite a big improvement for impact measures -- it meets many desiderata that weren't satisfied simultaneously before. My main critique is that it's not clear to me that an AUP agent would be able to do anything useful. For example, perhaps the action used to define the impact unit is well understood and accepted, but any other action makes humans a little more likely to turn off the agent. Then the agent won't be able to take those actions. Generally, I think it's hard to satisfy the conjunction of three desiderata -- objectivity (no dependence on values), safety (preventing any catastrophic plans), and non-triviality (the AI is still able to do some useful things). There's a lot more discussion in the comments.
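To make the formalism concrete, here is a minimal, illustrative sketch in Python. It is not the post's implementation: the names q_values, utility_weights, noop, and n_units are hypothetical stand-ins, and a small dictionary of auxiliary utilities replaces the post's weighted sum over all computable utility functions.

```python
def aup_penalty(q_values, state, action, noop, utility_weights):
    """Toy AUP penalty for a single action.

    q_values[u][(state, a)] is assumed to hold the m-step expected Q-value
    of taking action a in `state` under auxiliary utility function u.
    The penalty is the weighted sum, over auxiliary utilities, of the
    absolute change in attainable utility relative to doing nothing.
    """
    penalty = 0.0
    for u, weight in utility_weights.items():
        after_action = q_values[u][(state, action)]
        after_noop = q_values[u][(state, noop)]
        penalty += weight * abs(after_action - after_noop)
    return penalty


def impact_budget(q_values, state, reference_action, noop, utility_weights, n_units):
    """Total impact the agent may accrue over a plan: n_units times the
    penalty of one canonical, well-understood reference action (the
    'impact unit' that defines N in the summary above)."""
    unit = aup_penalty(q_values, state, reference_action, noop, utility_weights)
    return n_units * unit


# Tiny made-up example with two auxiliary utilities.
q_values = {
    "u1": {("s0", "press_button"): 0.9, ("s0", "noop"): 0.5},
    "u2": {("s0", "press_button"): 0.1, ("s0", "noop"): 0.4},
}
weights = {"u1": 0.5, "u2": 0.5}  # stand-in for the 2^{-description length} weighting
print(aup_penalty(q_values, "s0", "press_button", "noop", weights))  # 0.35
```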
Realism about rationality (Richard Ngo): In the same way that moral realism claims that there is one true morality (even though we may not know it yet), rationality realism is the claim that there is one "correct" algorithm for rationality or intelligence. This post argues that many disagreements can be traced back to differences in how much one identifies with the rationality realism mindset. For example, people who agree with rationality realism are more likely to think that there is a simple theoretical framework that captures intelligence, that there is an "ideal" decision theory, that certain types of moral reasoning are "correct", that having contradictory preferences or beliefs is really bad, and so on. The author's skepticism about this mindset also makes them skeptical of agent foundations research.

My opinion: This does feel like an important generator of many disagreements I've had. I'd split rationality real...

First published

11/17/2021

Genres

education

Duration

16 minutes

Parent Podcast

The Nonlinear Library: Alignment Section


Similar Episodes

  • What is the alternative to intent alignment called? Q by Richard Ngo

    11/17/2021

    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What is the alternative to intent alignment called? Q, published by Richard Ngo on the AI Alignment Forum. Paul defines intent alignment of an AI A to a human H as the criterion that A is trying to do what H wants it to do. What term do people use for the definition of alignment in which A is trying to achieve H's goals (whether or not H intends for A to achieve H's goals)? Secondly, this seems to basically map on to the distinction between an aligned genie and an aligned sovereign. Is this a fair characterisation? (Intent alignment definition from) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

  • AMA: Paul Christiano, alignment researcher by Paul Christiano

    12/06/2021

    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AMA: Paul Christiano, alignment researcher, published by Paul Christiano on the AI Alignment Forum. I'll be running an Ask Me Anything on this post from Friday (April 30) to Saturday (May 1). If you want to ask something just post a top-level comment; I'll spend at least a day answering questions. You can find some background about me here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

  • AI alignment landscape by Paul Christiano

    11/19/2021

    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI alignment landscape, published by Paul Christiano on the AI Alignment Forum. Here (link) is a talk I gave at EA Global 2019, where I describe how intent alignment fits into the broader landscape of “making AI go well,” and how my work fits into intent alignment. This is particularly helpful if you want to understand what I’m doing, but may also be useful more broadly. I often find myself wishing people were clearer about some of these distinctions. Here is the main overview slide from the talk: The highlighted boxes are where I spend most of my time. Here are the full slides from the talk. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

  • Announcing the Alignment Research Center by Paul Christiano

    11/19/2021

    Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing the Alignment Research Center, published by Paul Christiano on the AI Alignment Forum. (Cross-post from ai-alignment.com) I’m now working full-time on the Alignment Research Center (ARC), a new non-profit focused on intent alignment research. I left OpenAI at the end of January and I’ve spent the last few months planning, doing some theoretical research, doing some logistical set-up, and taking time off. For now it’s just me, focusing on theoretical research. I’m currently feeling pretty optimistic about this work: I think there’s a good chance that it will yield big alignment improvements within the next few years, and a good chance that those improvements will be integrated into practice at leading ML labs. My current goal is to build a small team working productively on theory. I’m not yet sure how we’ll approach hiring, but if you’re potentially interested in joining you can fill out this tiny form to get notified when we’re ready. Over the medium term (and maybe starting quite soon) I also expect to implement and study techniques that emerge from theoretical work, to help ML labs adopt alignment techniques, and to work on alignment forecasting and strategy. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.



