Alignment Newsletter #25 by Rohin Shah
First published: 11/17/2021
Genres: education
Summary
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Alignment Newsletter #25, published by Rohin Shah on the AI Alignment Forum.

Highlights

Towards a New Impact Measure (Alex Turner): This post introduces a new idea for an impact measure. It defines impact as change in our ability to achieve goals. So, to measure impact, we can simply measure how much easier or harder it is to achieve goals -- this gives us Attainable Utility Preservation (AUP). This penalizes actions that restrict our ability to reach particular outcomes (opportunity cost) as well as ones that expand it (instrumental convergence).

Alex then attempts to formalize this. For every action, the impact of that action is the absolute difference between attainable utility after the action and attainable utility if the agent takes no action. Here, attainable utility is calculated as the sum of expected Q-values (over m steps) of every computable utility function, weighted by 2^{-length of description}. For a plan, we sum up the penalties for each action in the plan. (This is not entirely precise, but you'll have to read the post for the math.) We can then choose one canonical action, calculate its impact, and allow the agent to have impact equivalent to at most N of those actions.

He then shows some examples, both theoretical and empirical. The empirical ones are done on the suite of examples from AI safety gridworlds used to test relative reachability. Since the utility functions here are indicators for each possible state, AUP penalizes changes in your ability to reach states. Since you can never increase the number of reachable states, this amounts to penalizing decreases in the ability to reach states, which is exactly what relative reachability does, so it's not surprising that AUP succeeds on the environments where relative reachability succeeded.
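The penalty computation summarized above can be sketched in code. This is a rough illustrative sketch, not the formalism from the post: `utility_fns`, `weights`, and `q_value` are hypothetical stand-ins (a small finite set of utility functions replaces "every computable utility function weighted by 2^{-description length}", and the m-step Q-values are assumed to be given).

```python
def aup_penalty(state, action, noop, utility_fns, weights, q_value):
    """Sketch of the AUP penalty for a single action.

    utility_fns / weights: a small finite stand-in for "every
    computable utility function, weighted by 2^{-description length}".
    q_value(state, action, u): assumed-given expected Q-value
    (over m steps) of taking `action` in `state` under utility u.
    """
    penalty = 0.0
    for u, w in zip(utility_fns, weights):
        # Attainable utility after the action vs. after doing nothing.
        diff = q_value(state, action, u) - q_value(state, noop, u)
        # The absolute value penalizes both decreases (opportunity
        # cost) and increases (instrumental convergence).
        penalty += w * abs(diff)
    return penalty


def plan_penalty(states, actions, noop, utility_fns, weights, q_value):
    """For a plan, the per-action penalties are summed."""
    return sum(aup_penalty(s, a, noop, utility_fns, weights, q_value)
               for s, a in zip(states, actions))
```

Under this sketch, an agent's total plan penalty would then be capped at N times the penalty of the chosen canonical action.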
It does have the additional feature of handling shutdowns, which relative reachability doesn't do. Since changes in the probability of shutdown drastically change attainable utility, any such changes will be heavily penalized. We can use this dynamic to our advantage, for example by committing to shut down the agent if we see it doing something we disapprove of.

My opinion: This is quite a big improvement for impact measures -- it meets many desiderata that weren't satisfied simultaneously before. My main critique is that it's not clear to me that an AUP agent would be able to do anything useful. For example, perhaps the action used to define the impact unit is well understood and accepted, but any other action makes humans a little more likely to turn off the agent; then the agent won't be able to take those actions. Generally, I think it's hard to satisfy the conjunction of three desiderata -- objectivity (no dependence on values), safety (preventing any catastrophic plans) and non-trivialness (the AI is still able to do some useful things). There's a lot more discussion in the comments.

Realism about rationality (Richard Ngo): In the same way that moral realism claims that there is one true morality (even though we may not know it yet), rationality realism is the claim that there is one "correct" algorithm for rationality or intelligence. This post argues that many disagreements can be traced back to differences in how much one identifies with the rationality-realism mindset. For example, people who agree with rationality realism are more likely to think that there is a simple theoretical framework that captures intelligence, that there is an "ideal" decision theory, that certain types of moral reasoning are "correct", that having contradictory preferences or beliefs is really bad, etc. The author's skepticism about this mindset also makes them skeptical about agent foundations research.
My opinion: This does feel like an important generator of many disagreements I've had. I'd split rationality real...
Parent Podcast
The Nonlinear Library: Alignment Section
Similar Episodes
What is the alternative to intent alignment called? Q by Richard Ngo
Release Date: 11/17/2021
Description: This is: What is the alternative to intent alignment called? Q, published by Richard Ngo on the AI Alignment Forum. Paul defines intent alignment of an AI A to a human H as the criterion that A is trying to do what H wants it to do. What term do people use for the definition of alignment in which A is trying to achieve H's goals (whether or not H intends for A to achieve H's goals)? Secondly, this seems to basically map onto the distinction between an aligned genie and an aligned sovereign. Is this a fair characterisation? (Intent alignment definition from) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Explicit: No
AMA: Paul Christiano, alignment researcher by Paul Christiano
Release Date: 12/06/2021
Description: This is: AMA: Paul Christiano, alignment researcher, published by Paul Christiano on the AI Alignment Forum. I'll be running an Ask Me Anything on this post from Friday (April 30) to Saturday (May 1). If you want to ask something just post a top-level comment; I'll spend at least a day answering questions. You can find some background about me here.
Explicit: No
AI alignment landscape by Paul Christiano
Release Date: 11/19/2021
Description: This is: AI alignment landscape, published by Paul Christiano on the AI Alignment Forum. Here (link) is a talk I gave at EA Global 2019, where I describe how intent alignment fits into the broader landscape of “making AI go well,” and how my work fits into intent alignment. This is particularly helpful if you want to understand what I’m doing, but may also be useful more broadly. I often find myself wishing people were clearer about some of these distinctions. Here is the main overview slide from the talk: The highlighted boxes are where I spend most of my time. Here are the full slides from the talk.
Explicit: No
Announcing the Alignment Research Center by Paul Christiano
Release Date: 11/19/2021
Description: This is: Announcing the Alignment Research Center, published by Paul Christiano on the AI Alignment Forum. (Cross-post from ai-alignment.com) I’m now working full-time on the Alignment Research Center (ARC), a new non-profit focused on intent alignment research. I left OpenAI at the end of January and I’ve spent the last few months planning, doing some theoretical research, doing some logistical set-up, and taking time off. For now it’s just me, focusing on theoretical research. I’m currently feeling pretty optimistic about this work: I think there’s a good chance that it will yield big alignment improvements within the next few years, and a good chance that those improvements will be integrated into practice at leading ML labs. My current goal is to build a small team working productively on theory. I’m not yet sure how we’ll approach hiring, but if you’re potentially interested in joining you can fill out this tiny form to get notified when we’re ready. Over the medium term (and maybe starting quite soon) I also expect to implement and study techniques that emerge from theoretical work, to help ML labs adopt alignment techniques, and to work on alignment forecasting and strategy.
Explicit: No
Similar Podcasts
The Nonlinear Library
Release Date: 10/07/2021
Authors: The Nonlinear Fund
Description: The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Explicit: No
Break Room Chats
Release Date: 08/06/2020
Authors: LLAMA NPS
Description: Welcome to Break Room Chats, the podcast of the New Professionals Section of the Library Leadership and Management Association of ALA. To continue the conversation, please visit the New Professionals Section Facebook page at https://www.facebook.com/groups/LLAMA.NPS/ . We're also on twitter @LLAMA_NPS.
Explicit: No
The Marketing Detective
Release Date: 09/25/2020
Authors: Mitch West
Description: Solving the marketing mysteries that challenge local businesses across linear and nonlinear platforms.
Explicit: No
Test Nonlinear
Release Date: 05/19/2021
Description: Test RSS podcast feed for Nonlinear Tech Internship
Explicit: No
Section Cut
Release Date: 09/10/2020
Authors: Section Cut
Description: SectionCut is a curated collection of DIGITAL and ANALOG design resources and HOW TO's to assist with complex workflows, acting as a cross section of design information. As a cross section of highly valued design information, S|C provides a glimpse into the contemporary designer's library of resources.
Explicit: No
Ryan’s Ramble
Release Date: 08/12/2020
Authors: Ryan Aratin
Description: Nonlinear podcast about ideas and experiences in my life.
Explicit: No
Nonlinear
Release Date: 12/17/2020
Authors: Teal
Description: Everyone's career path is different, built by pivotal moments and choices. We're on a mission to amplify those stories and examine how our decisions shape our careers. Nonlinear is hosted by Dave Fano, Founder & CEO of Teal—a genuinely consumer-first platform designed to help people grow and manage their careers. Our goal is to empower people to land jobs they love with free tools that guide and automate the process. Learn more at tealhq.com.
Explicit: No
The Noisy Library by Story Stitchers
Release Date: 04/28/2021
Authors: Story Stitchers
Description: A place for stories to be found and voices to be heard. The Noisy Library is an online space curated by Story Stitchers CIC to showcase stories and voices. We welcome any kind of story to our library, whether it is a poem, memory, song, family saga or an imagined place. Here you can find the stories for grown-ups. We think that this section of the library is suitable for audiences aged 13 plus; if you are under 13, you should check out our children’s library.
Explicit: No
Nonfiction Friends
Release Date: 08/29/2020
Authors: Osceola Library System
Description: Nonfiction Friends is an educational, library-based podcast that seeks to help the public learn interesting, quirky, and sometimes bizarre facts that can be found in the nonfiction section of their local library. Follow us on Twitter! https://twitter.com/nffriendscast
Explicit: No
Oxford Union Library Audio Tour
Release Date: 09/01/2020
Authors: Oxford University
Description: An audio tour of the historic Oxford Union Library. Since its foundation, the Union has maintained a library for the use of its members. One of the largest lending libraries in Oxford, it is of particular relevance to students studying Classics, English, History, Law, PPE and Theology. In recent years, the Science section has been expanded. The library has a significant collection of 19th century publications, both books and journals, which would be of special interest to researchers studying that period.
Explicit: No
Dun Laoghaire Rathdown County Library
Release Date: 08/18/2020
Description: The Library Section of Dun Laoghaire Rathdown County Council run a varied programme of literary events throughout the year. This podcast series provides an archive of some of these events and helps to extend their reach to a wider audience.
Explicit: No
Symmetry, Bifurcation and Multi-Agent Decision Making - Naomi Leonard 26th April 2019
Release Date: 08/07/2021
Description: I will present nonlinear dynamics for distributed decision-making that derive from principles of symmetry and bifurcation. Inspired by studies of animal groups, including house-hunting honeybees and schooling fish, the nonlinear dynamics describe a group of interacting agents that can manage flexibility as well as stability in response to a changing environment.
Explicit: No