AF - Don't accelerate problems you're trying to solve by Andrea Miotti

Link to original article: https://www.alignmentforum.org/posts/4Pi3WhFb4jPphBzme/don-t-accelerate-problems-you-re-trying-to-solve

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Don't accelerate problems you're trying to solve, published by Andrea Miotti on February 15, 2023 on The AI Alignment Forum.

If one believes that unaligned AGI is a significant problem (>10% chance of leading to catastrophe), speeding up public progress towards AGI is obviously bad. Though it is obviously bad, there may be circumstances which require it. However, accelerating AGI should require a much higher bar of evidence and much more extreme circumstances than is commonly assumed. There are a few categories of arguments that claim intentionally advancing AI capabilities can be helpful for alignment, which do not meet this bar. Two cases of this argument are as follows:

1. It doesn't matter much to do work that pushes capabilities if others are likely to do the same or similar work shortly after.
2. We should avoid capability overhangs, so that people are not surprised. To do so, we should extract as many capabilities as possible from existing AI systems.

We address these two arguments directly, arguing that the downsides are much higher than they may appear, and touch on why we believe that merely plausible arguments for advancing AI capabilities aren’t enough.

Dangerous argument 1: It doesn't matter much to do work that pushes capabilities if others are likely to do the same or similar work shortly after.

For a specific instance of this, see Paul Christiano’s “Thoughts on the impact of RLHF research”: “RLHF is just not that important to the bottom line right now. Imitation learning works nearly as well, other hacky techniques can do quite a lot to fix obvious problems [...] RLHF is increasingly important as time goes on, but it also becomes increasingly overdetermined that people would have done it. In general I think your expectation should be that incidental capabilities progress from safety research is a small part of total progress [...]”

Markets aren’t efficient; they only approach efficiency under heavy competition, when people with relevant information put effort into making them efficient. This applies to machine learning: there aren’t that many machine learning researchers at the cutting edge, and before ChatGPT there wasn’t a ton of market pressure on them. Perhaps something as low-hanging as RLHF or something similar would have happened eventually, but this isn’t generally true. Don’t assume that something seemingly obvious to you is obvious to everyone.

But even if something like RLHF or imitation learning would have happened eventually, getting small steps of progress slightly earlier can have large downstream effects. Progress often follows an s-curve, which appears exponential until the current research direction is exploited and tapers off. Moving an exponential up, even a little, early on can have large downstream consequences. [Figure in the original post: the red line indicates when the first “lethal” AGI is deployed, and thus a hard deadline for us to solve alignment.] A slight increase in progress now can lead to catastrophe significantly earlier! Pushing us up the early progress exponential has really bad downstream effects!

And this is dangerous decision theory too: if every alignment researcher took a similar stance, their marginal accelerations would quickly add up.
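A quick way to see why even a "small" early acceleration matters, and why individually negligible accelerations stack: in a toy exponential model (illustrative numbers only, not taken from the post), a proportional capability boost of ε today moves the date the threshold is crossed earlier by a fixed ln(1+ε)/r years, no matter how far away that threshold is, and independent boosts add up linearly. A minimal sketch under those assumptions:

```python
import math

# Toy model with illustrative numbers (not from the post): capability
# grows exponentially, P(t) = P0 * exp(r * t), and the "red line" is
# crossed when P(t) first reaches a fixed threshold.

P0 = 1.0          # capability today (arbitrary units)
r = 0.25          # growth rate per year
THRESHOLD = 1000  # capability level at which "lethal" AGI is deployed

def years_to_threshold(p0: float) -> float:
    """Years until p0 * exp(r * t) first reaches THRESHOLD."""
    return math.log(THRESHOLD / p0) / r

baseline = years_to_threshold(P0)

# A "small" contribution: a one-off 5% boost to today's capability.
boosted = years_to_threshold(P0 * 1.05)

print(f"baseline deadline:        {baseline:.2f} years")
print(f"after a single 5% boost:  {boosted:.2f} years")
print(f"deadline moved earlier by {baseline - boosted:.2f} years "
      "(= ln(1.05)/r, regardless of how far away the threshold is)")

# If 20 groups each reason "my small push doesn't matter", the shifts
# add up linearly:
print(f"after 20 such boosts:     moved earlier by "
      f"{20 * math.log(1.05) / r:.1f} years")
```

The same qualitative point applies on the early, exponential-looking portion of an s-curve, which is the regime the post is concerned with.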
Dangerous argument 2: We should avoid capability overhangs, so that people are not surprised. To do so, we should extract as many capabilities as possible from existing AI systems.

Again, from Paul: “Avoiding RLHF at best introduces an important overhang: people will implicitly underestimate the capabilities of AI systems for longer, slowing progress now but leading to faster and more abrupt change later as people realize they’ve been wrong.”

But there is no clear distinction between eliminating capability overhangs and discovering new capabilities. Eliminating capability overhangs is discovering AI capabilities faste...

First published

02/15/2023

Genres:

education

Duration

9 minutes

Parent Podcast

The Nonlinear Library: Alignment Forum Weekly

Similar Episodes

    AMA: Paul Christiano, alignment researcher by Paul Christiano

    Release Date: 12/06/2021

    Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AMA: Paul Christiano, alignment researcher, published by Paul Christiano on the AI Alignment Forum. I'll be running an Ask Me Anything on this post from Friday (April 30) to Saturday (May 1). If you want to ask something just post a top-level comment; I'll spend at least a day answering questions. You can find some background about me here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

    Explicit: No

    What is the alternative to intent alignment called? Q by Richard Ngo

    Release Date: 11/17/2021

    Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What is the alternative to intent alignment called? Q, published by Richard Ngo on the AI Alignment Forum. Paul defines intent alignment of an AI A to a human H as the criterion that A is trying to do what H wants it to do. What term do people use for the definition of alignment in which A is trying to achieve H's goals (whether or not H intends for A to achieve H's goals)? Secondly, this seems to basically map on to the distinction between an aligned genie and an aligned sovereign. Is this a fair characterisation? (Intent alignment definition from) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

    Explicit: No

    AI alignment landscape by Paul Christiano

    Release Date: 11/19/2021

    Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI alignment landscape, published by Paul Christiano on the AI Alignment Forum. Here (link) is a talk I gave at EA Global 2019, where I describe how intent alignment fits into the broader landscape of “making AI go well,” and how my work fits into intent alignment. This is particularly helpful if you want to understand what I’m doing, but may also be useful more broadly. I often find myself wishing people were clearer about some of these distinctions. Here is the main overview slide from the talk: The highlighted boxes are where I spend most of my time. Here are the full slides from the talk. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

    Explicit: No

    Would an option to publish to AF users only be a useful feature? Q by Richard Ngo

    Release Date: 11/17/2021

    Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Would an option to publish to AF users only be a useful feature? Q, published by Richard Ngo on the AI Alignment Forum. Right now there are quite a few private safety docs floating around. There's evidently demand for a privacy setting lower than "only people I personally approve", but higher than "anyone on the internet gets to see it". But this means that safety researchers might not see relevant arguments and information. And as the field grows, passing on access to such documents on a personal basis will become even less efficient. My guess is that in most cases, the authors of these documents don't have a problem with other safety researchers seeing them, as long as everyone agrees not to distribute them more widely. One solution could be to have a checkbox for new posts which makes them only visible to verified Alignment Forum users. Would people use this? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

    Explicit: No

Similar Podcasts

    The Nonlinear Library

    Release Date: 10/07/2021

    Authors: The Nonlinear Fund

    Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

    Explicit: No

    The Nonlinear Library: Alignment Section

    Release Date: 02/10/2022

    Authors: The Nonlinear Fund

    Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

    Explicit: No

    The Nonlinear Library: LessWrong

    Release Date: 03/03/2022

    Authors: The Nonlinear Fund

    Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

    Explicit: No

    The Nonlinear Library: LessWrong Daily

    Release Date: 05/02/2022

    Authors: The Nonlinear Fund

    Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

    Explicit: No

    The Nonlinear Library: EA Forum Daily

    Release Date: 05/02/2022

    Authors: The Nonlinear Fund

    Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

    Explicit: No

    The Nonlinear Library: EA Forum Weekly

    Release Date: 05/02/2022

    Authors: The Nonlinear Fund

    Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

    Explicit: No

    The Nonlinear Library: Alignment Forum Daily

    Release Date: 05/02/2022

    Authors: The Nonlinear Fund

    Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

    Explicit: No

    The Nonlinear Library: LessWrong Weekly

    Release Date: 05/02/2022

    Authors: The Nonlinear Fund

    Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org

    Explicit: No

    The Nonlinear Library: Alignment Forum Top Posts

    Release Date: 02/10/2022

    Authors: The Nonlinear Fund

    Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.

    Explicit: No

    The Nonlinear Library: LessWrong Top Posts

    Release Date: 02/15/2022

    Authors: The Nonlinear Fund

    Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.

    Explicit: No

    sasodgy

    Release Date: 04/14/2021

    Description: Audio Recordings from the Students Against Sexual Orientation Discrimination (SASOD) Public Forum with Members of Parliament at the National Library in Georgetown, Guyana

    Explicit: No