Humans in Funny Suits
First published
03/07/2015
Genres:
society
philosophy
Summary
Book III: The Machine in the Ghost - Part M: Fragile Purposes - Humans in Funny Suits
Parent Podcast
Rationality: From AI to Zombies
Similar Episodes
Methods of Rationality in the Time of Hype
Release Date: 10/13/2023
Authors: Thomas Krendl Gilbert and Nathan Lambert
Description: This week, Tom and Nate discuss some of the core and intriguing dynamics of AI. We discuss the history of the rationality movement and where Harry Potter fan fiction fits in, whether AI will ever not feel hypey, the do's and don'ts of Sam Altman, and other topics. (Editor note: sorry for some small issues in Nate's audio. That will be fixed in the next episode.) Some links that are referenced:
* HP MOR (Harry Potter and the Methods of Rationality)
* A tweet referencing Sam Altman's funny (?) profile change
* Nathan's recent post on Interconnects on the job market craziness
Explicit: No
Alignment Newsletter #25 by Rohin Shah
Release Date: 11/17/2021
Description: Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Alignment Newsletter #25, published by Rohin Shah on the AI Alignment Forum.

Highlights

Towards a New Impact Measure (Alex Turner): This post introduces a new idea for an impact measure. It defines impact as change in our ability to achieve goals. So, to measure impact, we can simply measure how much easier or harder it is to achieve goals -- this gives us Attainable Utility Preservation (AUP). This will penalize actions that restrict our ability to reach particular outcomes (opportunity cost) as well as ones that enlarge them (instrumental convergence). Alex then attempts to formalize this. For every action, the impact of that action is the absolute difference between attainable utility after the action, and attainable utility if the agent takes no action. Here, attainable utility is calculated as the sum of expected Q-values (over m steps) of every computable utility function (weighted by 2^{-length of description}). For a plan, we sum up the penalties for each action in the plan. (This is not entirely precise, but you'll have to read the post for the math.) We can then choose one canonical action, calculate its impact, and allow the agent to have impact equivalent to at most N of these actions.

He then shows some examples, both theoretical and empirical. The empirical ones are done on the suite of examples from AI safety gridworlds used to test relative reachability. Since the utility functions here are indicators for each possible state, AUP is penalizing changes in your ability to reach states. Since you can never increase the number of states you reach, you are penalizing decrease in ability to reach states, which is exactly what relative reachability does, so it's not surprising that it succeeds on the environments where relative reachability succeeded. It does have the additional feature of handling shutdowns, which relative reachability doesn't do. Since changes in probability of shutdown drastically change the attainable utility, any such changes will be heavily penalized. We can use this dynamic to our advantage, for example by committing to shut down the agent if we see it doing something we disapprove of.

My opinion: This is quite a big improvement for impact measures -- it meets many desiderata that weren't satisfied simultaneously before. My main critique is that it's not clear to me that an AUP-agent would be able to do anything useful. For example, perhaps the action used to define the impact unit is well-understood and accepted, but any other action makes humans a little bit more likely to turn off the agent. Then the agent won't be able to take those actions. Generally, I think that it's hard to satisfy the conjunction of three desiderata -- objectivity (no dependence on values), safety (preventing any catastrophic plans) and non-trivialness (the AI is still able to do some useful things). There's a lot more discussion in the comments.

Realism about rationality (Richard Ngo): In the same way that moral realism claims that there is one true morality (even though we may not know it yet), rationality realism is the claim that there is one "correct" algorithm for rationality or intelligence. This post argues that many disagreements can be traced back to differences on how much one identifies with the rationality realism mindset. For example, people who agree with rationality realism are more likely to think that there is a simple theoretical framework that captures intelligence, that there is an "ideal" decision theory, that certain types of moral reasoning are "correct", that having contradictory preferences or beliefs is really bad, etc. The author's skepticism about this mindset also makes them skeptical about agent foundations research.

My opinion: This does feel like an important generator of many disagreements I've had. I'd split rationality real...
Explicit: No
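The AUP penalty summarized in the Alignment Newsletter #25 description above reduces to a simple computation: weight each utility function, then sum the absolute differences between the Q-value of the chosen action and the Q-value of doing nothing. The following is a minimal, illustrative Python sketch of that idea, not code from the newsletter or the original post: the finite list of utility functions stands in for "every computable utility function", and q_value is a hypothetical helper returning the m-step expected Q-value.

```python
# Illustrative sketch of the Attainable Utility Preservation (AUP) penalty
# described above. Assumptions (not from the newsletter): a finite list of
# utility functions stands in for "every computable utility function", and
# q_value(u, state, action) is a hypothetical helper returning the m-step
# expected Q-value of `action` in `state` under utility function `u`.

NOOP = "noop"  # the "agent takes no action" baseline

def aup_penalty(q_value, utilities, weights, state, action):
    """Weighted sum of |attainable utility after `action` minus attainable
    utility after doing nothing| over the chosen utility functions."""
    return sum(
        w * abs(q_value(u, state, action) - q_value(u, state, NOOP))
        for u, w in zip(utilities, weights)
    )

def aup_reward(task_reward, penalty, impact_unit, n_allowed=1.0):
    """Task reward minus the penalty, normalised by the impact of one
    canonical action, so the agent can afford roughly `n_allowed` such
    actions' worth of impact."""
    return task_reward - penalty / (n_allowed * impact_unit)
```

Because the penalty is an absolute difference, it charges the agent both for losing the ability to attain these utilities (opportunity cost) and for gaining it (instrumental convergence); normalising by the impact of one canonical action is what caps the agent at roughly N units of total impact.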
Rationality and The English Language
Release Date: 03/04/2015
Description: Book II: How to Actually Change Your Mind - Part F: Politics and Rationality - Rationality and The English Language
Explicit: No
The Scales of Justice, the Notebook of Rationality
Release Date: 03/04/2015
Description: Book II: How to Actually Change Your Mind - Part F: Politics and Rationality - The Scales of Justice, the Notebook of Rationality
Explicit: No
Similar Podcasts
Rationality: From AI to Zombies - The Podcast
Release Date: 03/24/2021
Authors: Walter and James
Description: Podcast of "Rationality: From AI to Zombies" by Eliezer Yudkowsky (chapters)
Explicit: No
Rationality: From AI to Zombies - The Podcast
Release Date: 08/21/2020
Authors: Walter & James
Description: Rationality: From AI to Zombies, chapter by chapter. Written by Eliezer Yudkowsky; read by Walter & James.
Explicit: No
ZombiesHeroes.com: Movie & TV Show Reviews - Zombies Heroes
Release Date: 08/12/2020
Authors: Rohan Prashanth
Description: ZombiesHeroes.com: A Movie & TV Show Review Podcast - From Zombies To Super Heroes And Everything In-between - Zombies Heroes
Explicit: No
LIBRARY OF THE LIVING DEAD
Release Date: 03/22/2021
Authors: LIBRARY OF THE LIVING DEAD
Description: A podcast dedicated to zombies and only zombies.
Explicit: No
Rational Chidiya
Release Date: 01/05/2022
Authors: Pratibha Shrivastava
Description: A Hindi Show to bring rationality with positivity
Explicit: No
Rationality
Release Date: 09/18/2021
Authors: Kendra Hummel
Description: A podcast that talks about the voting age. Should the voting age be raised or lowered? Many have different views on this argument, and we need to state the facts of how we should settle it.
Explicit: No
Rationality
Release Date: 04/27/2021
Authors: Hector Kidwell
Description: Exchanging Views, Not Insults - Political and societal discourse is so broken: entrenched and closed-minded. It's like people pick a position and stick to it regardless of changing facts. Join the three of us, who all have very distinct political outlooks (one right, one left, one centre), to show that there can be a productive and valuable exchange of views, where conversation is respectful, passionate and fascinating. Get in touch with Hector on Twitter: @hectorkidwell. See acast.com/privacy for privacy and opt-out information.
Explicit: No
The Zombie iPocalypse
Release Date: 09/23/2020
Authors: Zombie iPocalypse
Description: A Podcast by Zombies, For Zombies, about the iPocalypse. Each episode we will review and discuss topics stemming from the horror genre and industry
Explicit: No
Zombies To Zoinks!
Release Date: 08/24/2020
Authors: Zombies to Zoinks - Daniel, Stephan, & Trask
Description: Is Shaggy Rogers the illegitimate son of Captain America? What would happen if we sorted every Disney villain into a Hogwarts house? Could Winnie the Pooh and gang survive a zombie apocalypse? We're Daniel, Stephan and Trask, and THESE are the important questions boggling the minds of nerds the world over, covering everything from Zombies to Zoinks! If you're stuck inside just like the rest of us or are looking for some sultry voices to soothe you on your long-distance post-apocalyptic travel excursions, then you've found the right podcast. Have a question or topic suggestion? Email us at [email protected]. Check out our Link Tree to find us on Facebook, Twitter, Instagram, YouTube and more! https://linktr.ee/zombiestozoinks
Explicit: No
Zombie Beach Podcast
Release Date: 08/06/2020
Authors: Dan, Pat, and Jerry
Description: From bug-eyed aliens to swamp monsters to zombies ... it's B-Grade Horror and Sci-Fi to the max!
Explicit: No