AF - "Dirty concepts" in AI alignment discourses, and some guesses for how to deal with them by Nora Ammann

Link to original article: https://www.alignmentforum.org/posts/bBicgqvwjPbaQrJJA/dirty-concepts-in-ai-alignment-discourses-and-some-guesses

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Dirty concepts" in AI alignment discourses, and some guesses for how to deal with them, published by Nora Ammann on August 20, 2023 on The AI Alignment Forum.

Meta: This is a short summary & discussion post of a talk on the same topic by Javier Gomez-Lavin, which he gave as part of the PIBBSS speaker series. The speaker series features researchers from both AI Alignment and adjacent fields studying intelligent behavior in some shape or form. The goal is to create a space where we can explore the connections between the work of these scholars and questions in AI Alignment. This post doesn't provide a comprehensive summary of the ideas discussed in the talk, but instead focuses on exploring some possible connections to AI Alignment. For a longer version of Gomez-Lavin's ideas, you can check out a talk here.

"Dirty concepts" in the Cognitive Sciences

Gomez-Lavin argues that cognitive scientists engage in a form of "philosophical laundering," wherein they build, often implicitly, philosophically loaded concepts (such as volition, agency, etc.) into their concept of "working memory." He refers to such philosophically laundered concepts as "dirty concepts" insofar as they conceal potentially problematic assumptions being made. For instance, if we implicitly assume that working memory requires volition, we have now stretched our conception of working memory to include all of cognition. But if we do this, then the concept of working memory loses much of its explanatory power as one mechanism among others underlying cognition as a whole. Often, he claims, cognitive science papers will employ such dirty concepts in the abstract and introduction but will identify a much more specific phenomenon being measured in the methods and results sections.

What to do about it? Gomez-Lavin's suggestion in the case of CogSci

The pessimistic response (and some have suggested this) would be to quit using any of these dirty concepts (e.g. agency) altogether. However, it appears that this would amount to throwing the baby out with the bathwater. To help remedy the problem of dirty concepts in the working memory literature, Gomez-Lavin proposes creating an ontology of the various operational definitions of working memory employed in cognitive science by mining a wide range of research articles. The idea is that, instead of insisting that working memory be operationally defined in a single way, we ought to embrace the multiplicity of meanings associated with the term by keeping track of them more explicitly. He refers to this general approach as "productive pessimism." It is pessimistic insofar as it starts from the assumption that dirty concepts are being problematically employed, but it is productive insofar as it attempts to work with this trend rather than fight against it. While it is tricky to reason with these fuzzy concepts, once we are rigorous about proposing working definitions and operationalizations of these terms as we use them, we can avoid some of the main pitfalls and improve our definitions over time.

Relevance to AI alignment?

It seems fairly straightforward that AI alignment discourse, too, suffers from dirty concepts. If this is the case (and we think it is), a similar problem diagnosis (e.g. how dirty concepts can hamper research/intellectual progress) and treatment (e.g. ontology mapping) may apply. A central example here is the notion of "agency". Alignment researchers often speak of AI systems as agents. Yet there are often multiple, entangled meanings intended when doing so. High-level descriptions of AI x-risk often exploit this ambiguity in order to speak about the problem in general, but ultimately imprecise, terms. This is analogous to how cognitive scientists will often describe working memory in general terms in the abstract and operationalize the term ...

First published

08/20/2023

Genres:

education

Duration

5 minutes

Parent Podcast

The Nonlinear Library: Alignment Forum Daily
