EPISODE · Jun 24, 2026 · 9 MIN
“Risk-Averse AIs” by wdmacaskill, Elliott Thornley (EJT)
Abstract

We make the case for training AIs to be risk-averse in resources: specifically, to treat resources as having diminishing marginal utility. These AIs would (for example) choose $40 for sure over a half-chance of $100 and a half-chance of $0. We argue that risk aversion can preserve AIs’ usefulness in the event that they turn out aligned, and that it provides an extra line of defense in the event that AIs turn out misaligned: misaligned but risk-averse AIs would prefer a higher chance of modest payments to a lower chance of successful rebellion, so in many circumstances we could pay these AIs not to rebel against us. We sketch out some possible methods of training AIs to be risk-averse, and we give reasons to be cautiously optimistic about these methods’ success. The main reasons are that risk aversion is a broad target and easy to reward accurately. Overall, risk aversion seems like a promising line of defense against threats from misaligned AI. Frontier AI companies should consider trying to make their AIs risk-averse.

Introduction

Future AIs might turn out misaligned, pursuing goals that their developers don’t intend. Just to make things concrete, let’s suppose that they end [...]

Outline:
(00:12) Abstract
(01:17) Introduction

The original text contained 3 footnotes which were omitted from this narration.

First published: June 24th, 2026
Source: https://www.lesswrong.com/posts/Zpsk35WgJRfQ2exjL/risk-averse-ais
Narrated by TYPE III AUDIO.
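The abstract's example of preferring $40 for sure over a 50/50 gamble between $100 and $0 can be checked with a small expected-utility calculation. The sketch below is illustrative only: the square-root utility function and the helper names are assumptions chosen for this example, not something the post specifies. Diminishing marginal utility only requires that the utility function be concave, and a milder curve could rank the two options differently.

```python
import math

def utility(resources: float) -> float:
    # Assumed concave utility (square root); the post only says "diminishing marginal utility".
    return math.sqrt(resources)

def expected_utility(lottery):
    # lottery is a list of (probability, payoff) pairs whose probabilities sum to 1.
    return sum(p * utility(x) for p, x in lottery)

def certainty_equivalent(lottery):
    # Sure amount with the same utility as the lottery (inverse of sqrt is squaring).
    return expected_utility(lottery) ** 2

sure_thing = [(1.0, 40.0)]            # $40 for sure
gamble = [(0.5, 100.0), (0.5, 0.0)]   # half-chance of $100, half-chance of $0

prefers_sure_thing = expected_utility(sure_thing) > expected_utility(gamble)
print(prefers_sure_thing, certainty_equivalent(gamble))
```

Under this assumed utility function the gamble is worth the same as $25 for sure, even though its expected payoff is $50, so the agent takes the guaranteed $40.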