Achieving provably beneficial, human-compatible AI

What this episode covers

AI legend Stuart Russell, the Berkeley professor who leads the Center for Human-Compatible AI, joins Chris to share his insights into the future of artificial intelligence. Stuart is the author of Human Compatible and of the upcoming 4th edition of his perennial classic Artificial Intelligence: A Modern Approach, which is widely regarded as the standard text on AI. After exposing the shortcomings inherent in deep learning, Stuart goes on to propose a new practitioner approach to creating AI that avoids harmful unintended consequences, and offers a path forward towards a future in which humans can safely rely on provably beneficial AI.

Featuring: Stuart Russell, Chris Benson

Show Notes:
Stuart Russell on Wikipedia
Stuart Russell’s TED Speaker profile
Center for Human-Compatible AI
Stuart Russell on how to make AI ‘human-compatible’
AI could be a disaster for humanity. A top computer scientist thinks he has the solution.
Leading AI Luminary Has An Idea To Ensure Humans Remain In Control

Books:
“Artificial Intelligence” by Stuart Russell and Peter Norvig
“Human Compatible” by Stuart Russell


TRANSCRIPT · AUTO-GENERATED

I don't think deep learning evolves into AGI. So AGI, artificial general intelligence, is not going to be reached by just having bigger deep learning networks and more data. AGI and human intelligence require fundamental capabilities that are just not present in deep learning technology as we currently understand it. Deep learning systems don't know anything.

They can't reason. They can't accumulate knowledge. They can't apply what they learned in one context to solve problems in another context, etc., etc. And these are just elementary things that humans do all the time.

Welcome to Practical AI, a weekly podcast that makes artificial intelligence practical, productive and accessible to everyone. This is where conversations around AI, machine learning and data science happen. Join the community and Slack with us around various topics of the show at changelog.com/community, and follow us on Twitter. We're at @PracticalAIFM.

Okay, take it away, Chris. Welcome to another episode of the Practical AI podcast. My name is Chris Benson. I'm a principal AI strategist at Lockheed Martin.

Normally listeners would know that Daniel Whitenack, my co-host, would be with me. He is unavailable today. He's out sick. And so I have the pleasure of introducing our guest today, who is a legend in the AI field.

With me today is Stuart Russell, who is a professor of computer science at the University of California, Berkeley and holder of the Smith-Zadeh chair, if I'm getting that correct. It's Zadeh. Zadeh, I apologize.

And also, if the name sounds familiar, he is the author of the standard book on artificial intelligence, which most practitioners in the field will be familiar with, as well as a recent book for a general audience, which is called Human Compatible: Artificial Intelligence and the Problem of Control. Stuart, thank you very much for coming on the show. I know I barely touched on it. I know you have been in this field for decades.

If you could tell us just a little bit more about your background before we get fully launched. Yes, as you say, it's been quite a long time. I first started doing AI when I was in high school because I got a programmable calculator and I thought I could make it really intelligent. But it turned out that it only allowed 36 keystrokes in the program.

So I didn't get very far with that attempt. But then I got to use a giant computer at Imperial College. So I wrote a chess program. That was my first serious AI program.

I did my PhD at Stanford. I joined Berkeley in 86. So it's 34 years teaching at Berkeley. And it's been a pretty interesting time.

And most people would say now is maybe the most exciting time to be doing AI because there's so much progress. We've been able to solve or nearly solve some of the major open problems of the field. Speech recognition, machine translation, certain parts of computer vision, particularly recognizing objects in images. All of those things now work pretty well.

So then we can roll out all those techniques into the real world and do cool things like driving cars and everything like that. So it's lots of fun. We're all very busy. So I guess given that you have seen so much of the evolution of this field over time, could you talk a little bit about what the field was like when you came into it and what technologies were prevalent and tell us a bit about the evolution of the field over the years all the way into the current, what's certainly taken off these last few years.

Sure. Yeah. So in the early years, and I guess I would say I started probably in 1975, the focus was almost exclusively on problem solving, game playing, logical reasoning.

So everything was deterministic. We assumed that we could give the computer perfect knowledge of the problem and a perfectly stated goal, and then it would come up with a guaranteed solution, whether it was proving a theorem or finding a checkmate or coming up with a plan to deliver a bunch of parcels to a bunch of recipients or whatever it might be. In the 80s, we had the big expert system boom. So initially, logical rule-based systems.

So encoding expertise in logical rules. Sometimes we now call it business rules or business intelligence. That's a phrase that they use because the term expert system fell out of favor. But in the 80s, that was a really big, exciting, you know, hype bubble just like today.

Sure. And the beginnings of handling uncertainty, because we wanted to make expert systems that did things like medical diagnosis where there are no hard and fast rules. You have to take the evidence and combine it in, you know, to get some kind of soft prediction, you might say. But that technology largely failed in the real world.

There are complicated reasons for it, but I think basically it was not doing uncertain reasoning correctly. And so as you tried to build bigger systems for real problems, the whole thing would fall apart. Every time you added a new rule, the other ones would start messing things up and the interactions would cause the wrong answers to come out. And companies very quickly stopped investing in this and we had what we call the AI winter where, you know, my AI course was down to about 25 students in 1990.

It's up at about 900 right now. I imagine it is. And that's only because, you know, the fire marshal won't let us have any more people in the class. It would probably be, you know, 1,200 if we let everyone in.

So what happened next in AI actually was that rigorous, probabilistic methods took over within the field, mainly from the work of Judea Pearl starting in the mid 80s. And then machine learning had a renaissance, reinforcement learning had a renaissance. And so from the late 80s until around 2011 or so, there was pretty solid technical research progress using probability, statistics, connections to operations research and control theory. It became a very mathematical field. Some of the techniques worked pretty well.

You know, so speech recognition became reasonably practical. And the first self driving cars were operating long before the present day and doing so reasonably successfully. You know, there were big applications of planning. There were lots of diagnostic systems and so on.

So it was relatively successful, but it wasn't really until deep learning happened around 2011, 2012 that it really hit the big time in terms of media coverage and excitement and so on. So it was deep learning that enabled us to, for example, beat the human world champion at Go: a combination of deep learning with reinforcement learning methods and game-playing techniques that had been around for decades and decades. But deep learning was this extra ingredient that let the system somehow recognize patterns on the Go board that allowed it to beat the world champion. So now we're in a position where, you know, as I mentioned, thousands of students want to take AI courses.

There are thousands of startup companies. All the big companies have major AI divisions and are using AI in hundreds of applications throughout their businesses. So it's fun. It's exciting.

There's maybe some hype going on too. There might be a little bit. So going through that, it really begs the question for me, and I'm going to throw a question at you. You're looking at all these different types of artificial intelligence, different things that are being labeled as artificial intelligence, you know, from the symbolic logic days through expert systems all the way up through today's deep learning and other associated technologies.

And there are obviously vastly different underpinnings in terms of what they can do and how you arrive at them. How do you define artificial intelligence? What does the term mean to you as the person who has literally written the textbook on the subject? And how has that changed and evolved over those years?

That's a good question. I mean, all of the above, all the things you mentioned, this is all artificial intelligence because it's all in the service of creating machines that can act intelligently, which means really choosing actions that can be expected to achieve their objectives. You know, if you're a self-driving car, the objective is to get to the airport safely, legally, comfortably. And so in order to do that, you need perception, but you also need symbolic planning to choose a route, you need probabilistic forecasting to deal with traffic delays and maybe have a backup route just in case, and you need speech recognition in order to interface with the passenger, et cetera, et cetera.

So if you want to build a system that's going to help a mathematician, you can't just throw a bunch of theorems and proofs into a deep learning system and say, here, you learn how to do math. You actually have to have symbolic reasoning capability, theorem proving, and the underlying technology of that is symbolic logic and not statistical learning. So it all depends what you want to do. The overriding model, which I think pervades not just AI, but a lot of other disciplines, control theory, operations research, economics, statistics, they all have this model, which is that we specify an objective and then the machine finds some optimal solution or a way of achieving the objective, the best solution. And so actually what the book is about, the Human Compatible book, is basically saying that model is really a terrible model.

Now unfortunately, the first three editions of my textbook actually kind of solidified that model and said, okay, here's how we understand, here's how we pull everything in AI together into a single conceptual framework, and you can see all the different kinds of AI research as different facets, different ways of looking at that same underlying conceptual framework. The reason I think it's a terrible model is not a new thing, right? It's something we've known for thousands of years, which is that we cannot specify our objectives completely and correctly. And if you look at the legend of King Midas, he specifies his objective, I want everything I touch to turn to gold. The gods, or you could say the AI system, give him the objective exactly as he specifies, and then of course his food and his drink and his family all turn to gold and he dies in misery and starvation. Unexpected consequences.

Right, that's the thing, right? It's always unanticipated consequences, accidental side effects, collateral damage, externalities is what the economists call it, but it's a pervasive problem. We've known about it for a long time. That's why your third wish is always please undo the first two wishes because I've ruined everything, right?

So the Human Compatible book basically says, okay, we have to throw away that model. Up to now it hasn't been that bad because, first of all, most of our AI systems were toys. They were in the lab, we were doing demos. It wasn't out there in industry at the level that it's at now, and until recently not on a global scale. But now it is, right?

So now you've got the content selection algorithms from all the different social media platforms and those algorithms are machine learning algorithms, but they decide what billions of people spend hours every day reading and watching, right? So in terms of their actual direct, in terms of like you take the number of people times the amount of time, they are more powerful than anything that's ever existed in the history of the human race by far, by far, right? I mean, you know, you think Stalin was powerful, but he got to speak to his people like, you know, maybe half an hour every month or something, right? Yes.

And these algorithms are speaking to 50 times more people for hours every day. A largely oblivious audience for the most part that's acting on them. Yeah. So the audience doesn't know what the algorithms are doing or what they're trying to do.

The algorithms are trying to maximize their objective, which is click through or engagement or something like that. And in the course of doing that, rather than just send you what's interesting, they actually modify you into someone who's more predictable from their point of view, because the more predictable you are, the more money they can make off you. And so whatever you start out as they change you and mold you into a predictable clicker. And so that's what they've done.

And I think, you know, most people would say that the results have been pretty disastrous on the whole. So I want to ask you, there's a particular remark you make in the book that I would ask you. And I think you're already kind of going down the path on this to some degree, but you say we must plan for the possibility that machines will far exceed the human capacity for decision making in the real world. And I think that you've started to address some of the challenges.

Could you give us a little bit more of a holistic perspective on that? I mean, that statement has a lot in it right there. Can you talk a little bit about what the implications of that are? So yeah, so let's give an example, right? Suppose that a few years or decades down the line, you're, you know, the CEO of an IT company or a solar power company or whatever it might be, and you want your company to be more successful.

So you, you engage an AI system and you give it the objective of, let's say, you know, maximizing the profits or the revenues in my corporation. And because that system is far more capable than humans are, right, it devises plans that are more successful than all the competitors can be. And so that corporation in the interests of maximizing revenues gradually takes over larger and larger portions of the world economy, you know, and if it's not properly designed, right? If that was the objective, you know, wherever it was feasible, it might end up using slave labor, for example, in order to maximize profits and so on and so forth.

I mean, you can imagine all the ways that corporations have abused humanity in the past, and now we've got one that's much more capable than human beings are. You know, some people actually argue that this is already happening, not from AI, but from corporations that optimize profit at the expense of everything else. So for example, at the expense of the climate, the fossil fuel industry has optimized its profits by sort of multi-decade misinformation strategy that's actually outwitted the human race, right? And so even though the vast majority of experts and economists and scientists say, oh, you know, we need to have a carbon tax, we need to do this, we need to do that, we aren't doing any of it, right?

We're just talking about it. And so effectively, the fossil fuel industry has defeated the human race by superior pursuit of a fixed objective. So it would get much worse than that when AI systems are able to invent and carry out these kinds of strategies. That's even within the realm of things that we currently understand, right?

That you could have corporate strategy, you could enslave people, you could do this, you could do that. But AI systems will come up with things we don't understand. And we, you know, the whole human race could be collateral damage if we don't know how to control the systems that we create. And so far, there's no examples of a dumb species controlling a more intelligent species forever.

I totally agree with that. So for my own employer, I'm actually the person leading on AI ethics. So AI ethics is a huge passion of mine. And obviously you've raised some pretty big concerns there.

And I'm taking a little bit of a tangent. I wasn't really expecting to go down this path. But I am curious how you envision the role of AI ethics in our society and the world at large given everything that you just said. I mean, clearly the potential for consequences that we did not envision, that we did not plan on, is fairly significant, especially as technology evolves.

Do you have any thoughts? In a sense, I wish it wasn't called AI ethics. Okay. What should it be called?

Well, so I mean, let's give you an analogy, right? So the nuclear engineers who make sure that nuclear power stations don't explode like Chernobyl, are they ethicists? Would you say that's a nuclear ethics issue? No, I mean, it's just common sense that you don't want your nuclear power station to explode.

It's common sense that you want your AI systems to remain under human control. Sure. But at the moment, under the standard model, they won't remain under human control. And would you talk us through what that implies?

When you say it won't, and I'm going to set it up in this way, recognizing, and it's funny how many people I talk to have different perspectives from where I think you're about to go. But given the evolution that we've seen over time and the rapid evolution we're seeing in deep learning and whatever follows coming up, that potential for loss of human control, what does that come from? Why is it inevitable in your view? I would only say inevitable if we persist with AI within the standard model.

OK. Right, where we fix an objective. Because when you fix an objective, you're basically telling the system, whatever course of action optimizes that objective is the correct thing to do. And in particular, for example, anything that imperils the success of the objective has to be prevented. Well, what might imperil the success of the objective? Well, being switched off.

Sure. So by giving a system a fixed objective, you've now given it an incentive to protect itself from any attempt to interfere with the objective, from any attempt to switch it off. So a very typical argument I hear people make, if you kind of go back to Asimov's three rules for robotics, and the idea that you can just, in a non-probabilistic way, definitively say, you can't hurt people, that kind of thing, as an underlying thing. Given the fact that you have this ever-increasing capability in the AI realm, would it be fair to say that's not a realistic perspective, that AI would fundamentally look to circumvent it? How do you see that?

Yeah, so Asimov's laws, as you say, don't take into account the probabilistic perspective. They don't allow for uncertainty. But of course, in the real world, there are always risks. An Asimovian self-driving car would simply stay in the garage.

It would say, I'm sorry, the first law does not allow me to leave the garage because that would expose you to risk of injury or death. So sorry, we're not going anywhere. I love that. That's very funny, actually.

That's true. And if you were out for a walk, it would run around with an umbrella in case a photon from the sun landed on your skin and maybe initiated a little melanoma or something like that. There's a chance that could happen so we have to protect it. So in any kind of real world situation, there are trade-offs.

But one of the things Asimov's laws do is they make a start on saying what it is that humans want. One of the things is we don't want to be harmed. We don't want to be physically injured. And that's a start because, for example, none of the self-driving cars that are out there right now know that people don't want to be injured.

Understood. They have built-in rules that say, well, if there's a pedestrian in front of you and you're going forward, stop. And if you're lucky, they have another rule that says if there's a pedestrian behind you and you're going backwards, stop. But they don't know why.

They don't know that if you run into a person, you can injure or kill them, and they don't know that the person doesn't want to be injured or killed. And it's that lack of knowledge actually that makes them very brittle, because when they get into situations they haven't been prepared for, they haven't the faintest idea of what to do. They don't know which course of action is good and which is bad.

So the solution that the book proposes actually is to say, look, it doesn't matter how much the human tells you that they want this or they don't want that. There's always going to be residual uncertainty about other preferences the human may have. So if the human says I'd like a cup of coffee, that's not your life's mission. Right?

You know, the robot could say, well, you know, the coffee in this hotel is 15 bucks a cup. Are you sure you want a cup of coffee? Right? Because the machine is uncertain about your trade-off between coffee and money. If you're miles from the nearest coffee, you know, the robot might not be sure, you know, do you want to wait two hours for this coffee?

Is it okay if I like trundle off across the desert to the nearest Starbucks and come back two hours later or two weeks later with a cup of coffee? You know, so it would be reasonable again to ask permission. And you know, if you give it a more important goal, like restoring carbon dioxide levels to pre-industrial concentrations, if that was the only objective, well, you know, one very straightforward solution is just to get rid of all the people. Understood.

Because they are the ones who are producing carbon dioxide. And then you might say, oh, well, I didn't mean that. All right. So wish number two, restore the carbon dioxide, but don't kill anybody.

And then the system says, fine, no problem. We'll just have a multi-decade social media campaign to convince people not to have children. And then the human race will gradually die out. And then carbon dioxide levels will be restored.

And that's great. So what I'm really proposing in the book is actually throw away the standard model or only use a standard model in very restricted circumstances. But in general, have a new model where the objectives are in the human and the machine knows that it doesn't know what they are. Its job is to try to satisfy them, but it knows that it doesn't know what they are.

When you design things that way, and you actually solve that problem, so you have an algorithm that for that problem specification decides what the machine is going to do, that algorithm produces behaviors that seem to be what we want, namely asking permission. Like, is it OK if I turn the oceans into sulfuric acid in order to restore carbon dioxide levels? And you say, no, not so much. We like those little fishies. Don't turn the oceans into acid.

Right? So it will ask permission. It'll even allow itself to be switched off. So rather than try to protect itself and take steps to prevent interference, it actually welcomes interference because interference by the human is a way of gaining information.
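Editor's sketch: the argument above can be illustrated with a toy calculation. This is not taken from the book or the conversation. It assumes a single proposed action whose value to the human is a number u that the machine does not know, that switching off is worth 0, and that a human who is asked permits the action only when u is positive. The function name and the numbers are hypothetical.

```python
import random

def expected_values(utility_samples):
    """Compare three policies for a machine that is unsure how much the
    human values its proposed action (utility u, unknown to the machine).

    act:       go ahead regardless            -> payoff u
    shut_down: switch itself off              -> payoff 0
    defer:     ask; the human permits the action only when u > 0
               (assumes the human answers correctly) -> payoff max(u, 0)
    """
    n = len(utility_samples)
    act = sum(utility_samples) / n
    shut_down = 0.0
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return {"act": act, "shut_down": shut_down, "defer": defer}

rng = random.Random(0)

# Machine that is uncertain about the objective: its belief about u is
# spread out, so the action might be good or might be harmful.
uncertain = [rng.gauss(0.5, 2.0) for _ in range(10_000)]

# Machine with a fixed objective: it is certain the action is worth 0.5.
certain = [0.5] * 10_000

uncertain_values = expected_values(uncertain)
certain_values = expected_values(certain)
print("uncertain objective:", uncertain_values)
print("fixed objective:    ", certain_values)
```

Under these assumptions, deferring never does worse than acting or shutting down, and it does strictly better whenever the machine's belief puts weight on both good and bad outcomes. With a fixed, certain objective the advantage disappears, which matches the point that such a machine gains nothing from human interference.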

One of the things I was wanting to ask as you were discussing that is if you could do that also in the context of, as we're looking at AI in the deep learning context of today, anticipating wherever we may be going in the future, and with the idea that people talk about AGI, which is artificial general intelligence, which presumably would change the nature of what AI is, maybe, and maybe distinguish how your new proposal would work kind of in both worlds. I mean, even today we're looking at exceeding human capability: if you have a complex set of tasks, even now we can take the models that we have and have many models, each one addressing a narrow space and working together, and they can far outperform what humans could do in a similar complex task. And with the idea also of having AGI, where we have models that are, for lack of a better word, more capable in themselves, maybe eventually aware, I don't know, if you could talk about what your proposal looks like in that evolving world. I'd love to know.

Sure. So first of all, I should point out that I don't think deep learning evolves into AGI. Okay. Right.

So AGI, artificial general intelligence, is not going to be reached by just having bigger deep learning networks and more data. AGI and human intelligence require fundamental capabilities that are just not present in deep learning technology as we currently understand it. So deep learning systems don't know anything. They can't reason and they can't accumulate knowledge.

They can't apply what they learned in one context to solve problems in another context, et cetera, et cetera. And these are just elementary things that humans do all the time. A bit of a stepping stone technology of the moment, in a sense, deep learning. Well, I think deep learning is one of the pieces, but, you know, so is symbolic logic.

So is probabilistic reasoning. So are sequential decision-making techniques, planning, hierarchical reinforcement learning, probabilistic programming, et cetera, et cetera. There's lots of pieces of the puzzle, some of which have been lying around for a long time. Deep learning is just the newest, the shiniest one.

So everyone's like, ooh, look, you know, but in the 80s, people were going, ooh, look, expert systems. And similar claims were being made, right? That we just, you know, if you just like scale up the number of rules by a factor of 500, you know, and you had like learned people, you know, making quantitative estimates like, oh, yeah, we would need about 500,000 rules to manage a military campaign and stuff like this, like, just complete drivel. Yeah, and there's a lot of drivel being talked now about deep learning.

But okay, so within the context of just straightforward supervised learning, let's say for image classification. Okay. Right. So how does it work?

Well, we have training data and then we have deep learning, which is basically a giant tunable circuit with billions of tunable connection strengths, like little tiny volume controls. And we just tweak all those volume controls in this huge circuit until the thing that comes out the other end is the correct classification of an image. So in supervised learning, what you do is you're supposed to specify the loss function, which says what happens if you classify an object of type A as an object of type B. So let's say you take a picture of a dog and you classify it as a cat.

How bad is that? Right. So almost everybody in this business uses what we call a uniform loss function, which means that they say every error is equally bad, because that's how the competitions work. They penalize you for the number of errors you make, not how bad the errors are.

Right. So, you know, for example, in ImageNet, there are two categories of dog, well, there's a hundred and something categories of dog, but two of them are the Norfolk terrier and the Norwich terrier. Right. And these are practically identical.

In fact, they weren't even recognized as separate breeds of dog until 1960-something. You know, and there's like a slight difference in the shape of the ear. And it's like, okay, who cares? I'm sure that a Norfolk terrier is not going to be that upset if you call him a Norwich terrier.

You know, Norwich is in Norfolk anyway. They'll lick your face either way. Right. So clearly that kind of error is relatively cheap, whereas classifying a human as a gorilla, as Google found out, is really expensive, like in the billions of dollars of, you know, trashing the goodwill of your corporation and its global reputation for being fair and idealistic and all the rest of it, right?

You know, it was, yeah, I'm sure it was sort of an innocent error coming from just using a uniform loss function. Sure. But if they thought about it, they would say, oh, of course our loss function is not uniform. Oh, then what is it?

Oh, don't know. We haven't thought about it. And we're not sure. And in fact, you know, typically in ImageNet, there's like 20-something thousand different categories of object.

Right. So your loss function is a matrix with 400 and odd million entries. And do you know what they are? No, no one knows what they are.
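Editor's sketch: the difference between a uniform loss and a loss matrix can be shown on three categories. The cost values, the two models and the function names below are hypothetical and chosen only for illustration; the 20,000 figure is a round stand-in for the "20-something thousand" categories mentioned above.

```python
def top1_errors(predictions, labels):
    """Uniform loss: every mistake counts as 1, whatever the mistake is."""
    return sum(p != y for p, y in zip(predictions, labels))

def total_cost(predictions, labels, cost):
    """Non-uniform loss: cost[true][predicted] says how bad each mistake is."""
    return sum(cost[y][p] for p, y in zip(predictions, labels))

# Hypothetical costs: mixing up the two terriers is cheap,
# confusing a cat with a dog is ten times worse.
cost = {
    "norfolk_terrier": {"norfolk_terrier": 0, "norwich_terrier": 1, "cat": 10},
    "norwich_terrier": {"norfolk_terrier": 1, "norwich_terrier": 0, "cat": 10},
    "cat": {"norfolk_terrier": 10, "norwich_terrier": 10, "cat": 0},
}

labels = ["norfolk_terrier", "norwich_terrier", "cat"]
# Model A swaps the two terriers; model B gets the dogs right but calls the cat a dog.
model_a = ["norwich_terrier", "norfolk_terrier", "cat"]
model_b = ["norfolk_terrier", "norwich_terrier", "norfolk_terrier"]

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, "errors:", top1_errors(preds, labels),
          "cost:", total_cost(preds, labels, cost))

# Size of the full matrix for roughly 20,000 categories.
num_categories = 20_000
matrix_entries = num_categories ** 2
print("loss matrix entries:", matrix_entries)
```

Counting errors ranks model B ahead of model A, while the cost matrix ranks them the other way round, which is the gap between the benchmark objective and the objective people actually care about.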

So you have an uncertain objective. You don't know what the objective is that you're supposed to be optimizing. And when you formulate the problem that way, right?

First of all, you'd have to say, okay, well, how do we specify a probability distribution over these 400-million-entry matrices, these giant tables? And now I've got to say, okay, what's the probability of each possible table of 400 million numbers? Well, that probability distribution is itself a massively complex object to specify. And no one has ever figured out even how to write it down, how to structure that probability distribution, because it clearly has lots of structure, right?

The cost of misclassifying each breed of dog as a cat is probably about the same. I think all dogs are equally upset to be called cats. And if you classify a bus as an insect, maybe that's a more embarrassing mistake to make. And so on.

So you can imagine that there's lots of structure in this matrix. And the structure partly reflects the taxonomic hierarchy of objects and how we arrange them into categories. So you could do a whole PhD thesis just on that part of the problem. And now there's also, well, what does the algorithm look like?

Right? Well, if it doesn't know the loss function and it has the opportunity to find out more, for example, by asking the user, is it worse to call a cat a dog or to call an apple an orange? And sometimes the algorithm would say, I'm not going to classify that image. It's too dangerous.

So I'm just not going to make a guess as to what it is. So you immediately see that just the nature of supervised learning would change considerably if you allow for uncertainty about the underlying objectives. And then with AGI, we don't yet know exactly how to build AGI. I mean, there are a bunch of unsolved major conceptual problems that we have to figure out.
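The idea of a classifier that declines to guess when it is unsure of the loss function can be sketched in a few lines. This is only a toy illustration, not a method from the conversation: the three classes, the two candidate loss matrices, the `risk_limit` threshold, and the `decide` function are all hypothetical, and the worst-case rule used here is just one of several reasonable ways to act under an uncertain objective.

```python
import numpy as np

CLASSES = ["cat", "dog", "apple"]  # hypothetical label set


def decide(class_probs, candidate_losses, risk_limit):
    """Pick a label, or abstain when the worst plausible expected loss is too high.

    class_probs: shape (K,), the classifier's belief over the true class.
    candidate_losses: shape (M, K, K); candidate_losses[m, t, p] is the cost of
        predicting p when the truth is t, under the m-th plausible loss matrix.
    risk_limit: hypothetical threshold above which the system declines to guess.
    """
    # expected[m, p] = sum over t of class_probs[t] * candidate_losses[m, t, p]
    expected = np.einsum("t,mtp->mp", class_probs, candidate_losses)
    worst_case = expected.max(axis=0)  # worst plausible expected loss per label
    best = int(worst_case.argmin())
    if worst_case[best] > risk_limit:
        return "abstain"
    return CLASSES[best]


# Two made-up candidates for the unknown loss matrix.
uniform = 1.0 - np.eye(3)
harsh = uniform.copy()
harsh[0, 1] = harsh[1, 0] = 10.0  # suppose cat/dog confusions might be very costly
candidates = np.stack([uniform, harsh])

confident = decide(np.array([0.98, 0.01, 0.01]), candidates, risk_limit=0.5)
ambiguous = decide(np.array([0.55, 0.40, 0.05]), candidates, risk_limit=0.5)
print(confident, ambiguous)
```

With these made-up numbers the confident image gets a label, while the ambiguous one is left unclassified. In a fuller system the abstention could instead trigger a question to the user about which mistake is worse.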

But I think the basic answer is that if you formulate AGI within this new model, the key property of the new model is that the better the AI solves the problem, the better the outcome is for human beings. Because it means that the AI system does a better job of finding out what it is you want and does a better job of achieving it.

And so you were talking a moment ago about kind of applying control in the new model that you're proposing as we move into AGI. Would you pick that train of thought right up where you left it there?

Yes. So with AGI, if we formulate it in the new model, the key property is that the smarter the AI, the better the outcome for humans, because the AI system will be able to better interpret our behavior as evidence of our underlying preferences. The nature of the information flow between the human and the machine about the human's objectives and preferences is that everything the human does reveals evidence of those underlying preferences. So the AI system observing us, observing our whole history, observing everything we've ever written, is able to infer from that something about what we want as individuals, as a species, and so on.

And so the better the AI system, the better job it'll do with that and the better it'll be able to achieve those objectives.

When you say that, just to clarify, it sounds like you're going into unsupervised learning, where it just has kind of the wealth of human knowledge and what humans have done. In this new approach, are you leaving it strictly to the algorithm that you're training to figure that out?

Are you specifying it as the practitioner? Do you see this as kind of, at some point, maybe leaving kind of today's deep learning behind and taking a different approach mathematically? How does that look going into the future? If everyone adopts this?

Well, yeah. So you often see this claim that there's supervised learning and unsupervised learning. And, well, logically, if that were correct, then supervised and unsupervised together would constitute a complete coverage of all learning, right? It's A and not-A.

But then they say, oh, and there's reinforcement learning. And, well, actually there are lots of other kinds of learning too. And this is related to something we call inverse reinforcement learning. And inverse reinforcement learning is basically... well, first let me say what reinforcement learning is, right?

Reinforcement learning is where the human specifies the reward to the machine, and then the machine learns how to optimally produce the reward, right? So the human says, OK, I'm going to give you one point when you win the game, I'm going to give you zero when you lose. And then the machine learns to get one point more often than not.

So inverse reinforcement learning is the other way around. The machine is observing, let's say, the human and trying to figure out what is the reward function that this human is optimizing, right? And we came up with it actually when I was collaborating with some biologists. And we were trying to figure out how could we apply reinforcement learning to understand animal locomotion?

So cockroaches and flies and creepy-crawlies and so on. And it struck me that actually we can't apply reinforcement learning to create a simulated insect or whatever, because we don't know what the reward function is, right? So then I said, oh, well, why don't we watch them walking and figure out what reward function they are optimizing with their particular choice of how to locomote, right? Because I don't know if you've ever seen the Monty Python Silly Walks sketch.

Yes, I have. But your listeners may want to check that out on the web, right? So John Cleese demonstrates that there are many other ways you could walk besides the usual one, right? So we choose the usual way of walking because it does something good for us, whether it's energy efficient or stable, you know, it avoids falling over.

Whatever it might be, it's optimizing something. And so the idea of inverse reinforcement learning is: observe the behavior and figure out what is being optimized by this behavior. OK. And so this approach, the new model, is sort of a generalization of that idea. It's generalizing in the sense that the human is not just being passively observed doing whatever human thing.
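The "observe the behavior and figure out what is being optimized" idea can be illustrated with a very small Bayesian sketch. Everything here is assumed for illustration: the three gaits and their feature values, the three candidate reward functions, the softmax ("noisily rational") choice model, and the `beta` parameter. It also treats walking as a one-shot choice rather than a sequential decision problem, so it is a simplification of inverse reinforcement learning rather than a full algorithm.

```python
import numpy as np

# Hypothetical gaits, each described by (energy_used, instability); lower is better.
GAITS = {
    "normal walk": np.array([1.0, 1.0]),
    "silly walk": np.array([4.0, 3.0]),
    "shuffle": np.array([2.0, 0.5]),
}

# Candidate reward functions: reward = -(w_energy * energy + w_instability * instability).
HYPOTHESES = {
    "cares about energy": np.array([1.0, 0.0]),
    "cares about stability": np.array([0.0, 1.0]),
    "indifferent": np.array([0.0, 0.0]),
}


def choice_likelihood(weights, beta=2.0):
    """P(gait | reward) for a noisily rational walker (softmax over rewards)."""
    rewards = np.array([-(weights @ f) for f in GAITS.values()])
    exp = np.exp(beta * (rewards - rewards.max()))
    return dict(zip(GAITS, exp / exp.sum()))


def infer_reward(observed_gaits):
    """Posterior over HYPOTHESES given observed gaits, from a uniform prior."""
    log_post = []
    for weights in HYPOTHESES.values():
        probs = choice_likelihood(weights)
        log_post.append(sum(np.log(probs[g]) for g in observed_gaits))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    return dict(zip(HYPOTHESES, post / post.sum()))


posterior = infer_reward(["normal walk"] * 5)
print(max(posterior, key=posterior.get))
```

After watching the walker pick the ordinary gait a few times, most of the posterior mass in this toy setup lands on the hypothesis that best explains that choice, which is the sense in which behavior is evidence about the underlying objective.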

But the human is sort of an active participant. For example, if the human solves their half of this problem, they will actively teach the robot about their preferences, right? Including saying things like, I would like a cup of coffee, right? That's conveying preference information to the robot.

It's not an order. It's sort of just factual evidence about my state of mind and the robot can interpret it as it wishes. So when you solve this kind of problem, it's what economists call a game, which means a decision problem with more than one decision making entity. So you can imagine one human and one machine or lots of humans, lots of machines.

So you can solve that problem mathematically. And then you just look at the behaviors that the machine and the human engage in when they solve this problem. And indeed, the human teaches the machine. And the machine does things.

It asks permission. It allows itself to be switched off and so on. So you get very different behaviors than you do in the standard model of AI. And so I think I'm reasonably optimistic that in fact, it shouldn't matter how intelligent the AI system is, things will still go well.
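Why a machine that is uncertain about the human's preferences would ask permission can be shown with a toy calculation. The setup is an assumption made for illustration, not the formal game described above: the robot holds a belief (a handful of samples) about how much the human values a proposed action, the human is assumed to know their own preference and to approve only when the value is positive, and the names `option_values` and `best_option` are hypothetical.

```python
import numpy as np


def option_values(utility_samples):
    """Expected value of three options, given samples from the robot's belief
    about how much the human values a proposed action."""
    u = np.asarray(utility_samples, dtype=float)
    return {
        "act without asking": float(u.mean()),
        "do nothing": 0.0,
        # Assumption: the human approves only when the action is actually good
        # for them, so bad outcomes are replaced by zero.
        "ask permission": float(np.maximum(u, 0.0).mean()),
    }


def best_option(utility_samples):
    """Return the option with the highest expected value (first one wins ties)."""
    values = option_values(utility_samples)
    return max(values, key=values.get)


uncertain_belief = [5.0, 5.0, -100.0]  # probably helpful, possibly very bad
certain_belief = [5.0, 5.0, 5.0]       # no doubt that the action is wanted

print(best_option(uncertain_belief))
print(best_option(certain_belief))
```

Under these assumptions asking is never worse than acting or doing nothing, and it is strictly better whenever the belief includes both good and bad outcomes. When the robot has no doubt, asking adds nothing, which is why the tie goes to simply acting.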

And in the old model, the more intelligent the machine, the worse the outcome for people, because the machine would find some way of messing with the world to achieve the objective that you set, and mess with the things you forgot to mention that you care about.

So I'm kind of curious as you take this and you're looking, we've hit so many different areas. And so I'm trying to tie it together.

You look into the future at this point, having come as far through this field as it's evolved and changed over the years, where do you see it going, especially with control in mind? And as you've talked about how the current standard model can lead us awry, then if you are a practitioner and you're out in industry and you're trying to do the things that your organization wants you to do. How do you apply your new model? And if you look out, what do you think we're going to be doing in terms of what types of models, what is AI kind of evolving into if you're looking out five years or 10 years?

And we're learning these lessons that you're teaching us in this capacity. What does the near, mid, and a little bit farther out look like to you at this point?

Interesting.

So first of all, there's going to be a little bit of pushing and shoving, right? I would imagine the AI community that's grown up with the standard model and learned it from a textbook is going to keep pushing ahead with, you know, solving the technical problems that they're solving within the standard model. And they have to be dragged kicking and screaming into this new way of doing things. So partly it means we, you know, my research group, and there are, you know, maybe a dozen other research groups around the world working in this new framework now.

We have to provide the technical solutions. We have to provide the new algorithms that behave according to the different principles. And I think we can do that. If we could do that in practical settings, whether it's, you know, recommendation systems, content selection for social media, intelligent personal assistants, then I think that will have a significant effect.

People will say, oh, now I get it. In fact, no, they won't say now I get it. They'll say, oh, I always thought that way, of course. So they won't, you know, there won't be a sort of a moment of capitulation.

There'll be just a gradual realization that, of course, this is what they've always thought. And it makes sense.

It does make sense. Do you think that as they adopt this, as I'm thinking about what you're saying here, and you've already mentioned the poorly named idea of AI ethics, you know, in terms of how do we prevent those?

How does that come together? I mean, because there's algorithmic side, there's the new algorithms and, you know, where you are going out into the future and you're implementing inverse reinforcement learning and it's working for you technically. And you're also trying to say we want to ensure that the outcomes are beneficial and certainly to the human involved. How does all that come together?

Because right now, you know, if I look at people, there are people that are focused on the outcomes, the ethical concern there, and there are people that are strictly algorithmically focused in terms of solving problems. And yet, if I'm understanding you correctly, you need to be able to fuse all these together, it sounds like, so that that works.

Yeah, I think the last thing you want, and you've probably experienced this yourself, is AI ethics people leaning over the shoulder of the AI practitioners and wagging their finger and saying, no, you're a bad person. Right?

It doesn't work. So what we have to do actually is to get people to understand that this is just how you do AI. Right? You know, when civil engineers design bridges, there's not a bunch of bridge designers and then a bunch of ethicists say, oh, by the way, you have to make sure it doesn't fall over.

Right? It's just, of course it's not supposed to fall over. It wouldn't be a bridge if it just fell over. Right?

You know, and similarly, you know, with nuclear engineers who design nuclear power stations, there isn't, you know, another discipline of people who care about safety while the nuclear engineers don't care about safety and just want to generate lots of energy, right? It doesn't work that way.

If we just want to generate energy, we can just set off lots of bombs. That's cheap and cheerful. Gets past red tape and all that crap. So I think there should be a strong incentive to just design systems this way because they won't fall down.

Right? The, you know, the example I use in a lot of talks is that your domestic robot, if it's designed this way, won't cook the cat for dinner because the fridge is empty. Because it would say, well, my idea of cooking the cat for dinner solves the problem of lack of food, but I'm uncertain: perhaps the cat has sentimental value. And so I shouldn't cook it.

I should ask permission before I cook it, right? So you get better behavior out of your AI systems. And so they'll be economically more valuable. They'll incur far less in the way of liability insurance and so on.

So there's that. But I also think that we at some point will need regulation, because there will always be, just as there is with malware, a temptation to just bypass safety and all the rest, just in terms of immediate grasping. And so as AIs become more and more capable and potentially powerful, they need to be regulated more and more strictly, just as we do with nuclear power stations.

Sure.

And I would imagine that's not just at a national level, but there'll have to be a body of international law, because you have different parts of the world, different countries have different values that they bring to play, and some are going to care more about these kinds of outcomes than others. I guess I wanted to finish up with, you know, you have all these students looking to you and they're coming in and they are starting their careers out in this field. You have people like me who are a little bit older and we are trying to constantly retool and stay up with the field. And in this conversation that we've just had, you've really shifted how I'm looking at the future and the things that need. How does a practitioner or student that's about to be a practitioner kind of tool themselves today beyond just the current state of deep learning, because that's, you know, where all the focus is right now.

It's all about, you know, TensorFlow or, you know, pick whatever tool you want to use, and we're building neural networks or adjacent technologies. And that's where all the education is really focusing, you know, that's broadly available out there on the internet and by service providers and others. How should someone like myself or a student coming into the field be thinking about this, and how should we focus on educating ourselves for the future to align ourselves with this vision that you just laid out? Obviously, there's your book.

There's Human Compatible. And I don't know if it's out yet, the fourth edition of your textbook. Yeah.

So the fourth edition is out next week. We finished it a few weeks ago and I think it'll be in the stores in a week's time. Perfect timing as they hear this and go out and buy it. Yeah.

So it's an unfortunate situation because basically we put the technical content from Human Compatible into the new edition of the textbook. You know, so the first two chapters are saying, okay, well, there's this old way of thinking about AI, and now there's this new way, but we don't have all the chapters in the middle telling you how to do the new way. So we're going to tell you how to do the old way, but keep in mind that you should be thinking about the new way. So that ought to be the fifth edition.

And awkward timing. Yeah. So the fifth edition will hopefully have more stuff. But the things to keep in mind are, first of all, what is the objective that you're designing your system to optimize?

And I think, you know, as I mentioned with the example of image classification and classifying the person as a gorilla, most people are not even thinking about that. The objective there is typically implicit. You know, when you run TensorFlow, if you don't put in a loss function, then you're putting in a uniform loss function. And if you put in a uniform loss function, you're saying that misclassifying a human as a gorilla is just as bad as misclassifying one kind of terrier as another.

Right. And that's not true. So don't do it. Right.
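The difference between a uniform loss and a non-uniform one shows up even in a four-class toy problem. The sketch below uses plain NumPy and applies the cost matrix at prediction time; the labels, probabilities, and cost values are invented for illustration (borrowing the terrier and bus-versus-insect examples from earlier in the conversation), and `min_expected_loss_label` is a hypothetical helper rather than part of any library.

```python
import numpy as np

LABELS = ["norfolk terrier", "norwich terrier", "bus", "insect"]


def min_expected_loss_label(class_probs, loss):
    """Return the label with the lowest expected loss.

    loss[t, p] is the cost of predicting label p when the true label is t.
    """
    expected = class_probs @ loss  # expected[p] = sum_t class_probs[t] * loss[t, p]
    return LABELS[int(np.argmin(expected))]


# Uniform loss: every mistake costs 1, so the rule reduces to "pick the most likely class".
uniform = 1.0 - np.eye(4)

# Made-up non-uniform loss: mixing up two terriers is nearly free,
# while calling a bus an insect is treated as a very costly mistake.
weighted = uniform.copy()
weighted[0, 1] = weighted[1, 0] = 0.1
weighted[2, 3] = 20.0

probs = np.array([0.05, 0.05, 0.42, 0.48])
uniform_choice = min_expected_loss_label(probs, uniform)
weighted_choice = min_expected_loss_label(probs, weighted)
print(uniform_choice, weighted_choice)
```

With the uniform matrix the slightly more probable label wins; with the weighted matrix the prediction flips because one of the possible mistakes is far more expensive than the other. The same probabilities lead to different decisions once the objective is stated explicitly.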

The second thing to think about is what is the scope of action of your system? So if your system could learn any function that strings together actions that it carries out, you know, what is the sort of transitive closure? What's the full set of states that your system could take the world into when it runs in the real world? Right.

And if, for example, you're just writing a Go program, you know, and it's only moving pieces on a simulated board, you know, within the memory of the computer and then displaying. It's relatively safe. Right. Because no matter what sequence of moves it does, it's still only changing what appears on the screen when someone's playing Go with it.
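One way to make the "scope of action" question concrete is to compute which states a system can reach by chaining its actions together. The sketch below is only a toy: the state names and the two transition tables are hypothetical, and a real system's state space would not be small enough to enumerate like this.

```python
from collections import deque


def reachable_states(start, actions):
    """Every state reachable from `start` by chaining actions (breadth-first search).

    actions: dict mapping a state to the states that a single action can lead to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in actions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# A game program whose actions only change the simulated board.
sealed = {
    "board A": ["board B"],
    "board B": ["board A", "board C"],
    "board C": [],
}

# The same program, but with one extra channel through which a person can be influenced.
in_contact = dict(sealed)
in_contact["board C"] = ["human reads message"]
in_contact["human reads message"] = ["human acts in the world"]

print(sorted(reachable_states("board A", sealed)))
print(sorted(reachable_states("board A", in_contact)))
```

The first set never leaves the board, while the second includes states outside the program entirely. Adding a single edge that passes through a human is enough to change what the closure contains.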

Now, theoretically, it's not perfectly safe because, you know, just as we have learned the origins of our own universe and the physics of the world in which our bodies run, a sufficiently intelligent Go program could actually do the same thing and then figure out that there must be other entities outside of its computer and try to communicate with them through the Go board and convince them to give it more CPU power or whatever. Right. So it's not hermetically sealed even then. But if your algorithm is in contact, direct contact with humans, right, then, you know, here's one good way to remember this.

Hitler did it with words. Right. Hitler was not a thousand-foot-tall giant robot with laser beams sweeping destruction everywhere. He was just a little guy who spoke.

Yeah, that's a great point. And so if your AI system is in direct contact with humans, it has far more power than Hitler already because it can speak to billions of people all the time. That may be the most terrifying thing I've ever heard a person say right there. That's a perspective right there.

Yeah, if you think, you know, what is the closure, right? What is the transitive closure of all possible actions that system could do? There's really no limit to it. Yes.

It could affect the world in any way. So those kinds of systems, I think, should absolutely be carefully regulated. And I think, for example, we talked earlier about how social media algorithms work. I think you can distinguish between reinforcement learning algorithms in that context and supervised learning algorithms.

So a supervised learning algorithm, roughly speaking, will learn what people want. Whereas a reinforcement learning algorithm will learn to manipulate people, to change what they want, so that the algorithm can make money off them. And so I really think that algorithms that are facing the general public in that way need to be regulated, not in exactly the same way, but in some way analogous to the way we regulate pharmaceuticals. We don't just get to spit out new pharmaceuticals to billions of people whenever we feel like it, right?

They have to be carefully tested on animals and then on control groups of humans, because if it goes wrong, it's really bad. And the same, as we've learned, is true with the social media algorithms.

Well, I just wanted to say thank you so much for coming on the show.

This has truly been one of the most fascinating conversations I've ever had. I think at this point, I will be recommending that people read Human Compatible pretty much everywhere I go. Well, thank you. You point out the danger, as we grow here, if we don't start taking that into account.

So thank you so much for coming on the show. You've really blown my mind. I don't normally finish the show stuttering. So thank you very much.

Thank you, Chris.

Part us on Spotify, star us on Overcast, and tell a friend what they're missing out on. Practical AI is hosted by Daniel Whitenack and Chris Benson. It's produced by me, Jerod Santo.

And our music is brought to you by the beat, Breakmaster Cylinder. We have awesome sponsors. Please support them. They support us.

Thanks again to Fastly, Linode, and Rollbar. That's all for now. We'll talk to you next time.

Frequently Asked Questions

How long is this episode of Changelog Master Feed?

This episode is 52 minutes long.

When was this Changelog Master Feed episode published?

This episode was published on April 13, 2020.
