Reinforcement learning for chip design

What this episode covers

Daniel and Chris have a fascinating discussion with Anna Goldie and Azalia Mirhoseini from Google Brain about the use of reinforcement learning for chip floor planning (or placement), in which many new designs are generated and then evaluated to find an optimal component layout. Anna and Azalia also describe the use of graph convolutional neural networks in their approach.

Sponsors:
Linode – Our cloud of choice and the home of Changelog.com. Deploy a fast, efficient, native SSD cloud server for only $5/month. Get 4 months free using the code changelog2019 OR changelog2020. To learn more and get started head to linode.com/changelog.
AI Classroom – An immersive, 3 day virtual training in AI with Practical AI co-host Daniel Whitenack. Get 10% off using the code PRACTICALAI10. To learn more and purchase tickets go to datadan.io.
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com.
Rollbar – We move fast and fix things because of Rollbar. Resolve errors in minutes. Deploy with confidence. Learn more at rollbar.com/changelog.

Featuring:
Anna Goldie – GitHub, LinkedIn, X
Azalia Mirhoseini – LinkedIn, X
Chris Benson – Website, GitHub, LinkedIn, X
Daniel Whitenack – Website, GitHub, X

Show Notes:
Their research paper
Google Brain
Google is using AI to design chips that will accelerate AI | MIT Technology Review
Practical AI episode #47: GANs, RL, and transfer learning oh my!

Upcoming Events: Register for upcoming webinars here!


TRANSCRIPT · AUTO-GENERATED

Bandwidth for Changelog is provided by Fastly, learn more at fastly.com. We move fast and fix things here at Changelog because of Rollbar, check them out at rollbar.com, and we're hosted on Linode cloud servers, head to linode.com slash changelog. Do not underestimate the power of the independent Open Cloud for developers, yes? I'm talking about Linode.

Linode is our cloud of choice, and it's the home of Changelog.com. What we love most about Linode is their independence and their commitment to Open Cloud. Open Cloud means being unencumbered by outside investment and maximizing value for the community, not shareholders. And that's exactly what Linode represents.

No vendor lock-in, open at every layer. If you want to learn more, head to linode.com slash open, again linode.com slash open. Welcome to Practical AI, a weekly podcast that makes artificial intelligence practical, productive, and accessible to everyone. This is where conversations around AI, machine learning, and data science happen.

Join the community and Slack with us around various topics of the show at changelog.com slash community, and follow us on Twitter. We're at PracticalAIFM. Take it away, guys. Welcome to another episode of Practical AI. This is Daniel Whitenack.

I'm a data scientist with SIL International. And I'm joined as always by my co-host Chris, who is the principal AI strategist at Lockheed Martin. How are you doing, Chris? I am doing very well.

How's it going today, Daniel? It's going very well, staying very busy. As we were talking about before the episode, it seems like since the crisis started and everyone has been at home (for future listeners, this is still during the COVID-19 crisis and shelter in place), I'm more busy.

More busy work-wise now than even before. What about you? I think it's the same. And we're just trying to not be paranoid.

We're right in the middle of pollen season, and everyone in my family suffers. So you get sore throats and you're coughing, and you're like, oh my god, is that symptomatic of COVID or something?

So we're just trying to maintain our calm, get through this, and all is well. Other than that, it is a lovely day here in Atlanta, Georgia. Yeah, it's a beautiful time to get outside as well, of course, maintaining distance from others.

But we've got something pretty interesting, I think, to chat about today. In the midst of all of the COVID-19 and coronavirus tweets and articles that I've been seeing and reading, I was able to pick out one story that seemed really interesting to me and was not related to COVID-19. That was the story about a team at Google that was using reinforcement learning to design chips, like hardware computing chips. And we're joined today by Anna Goldie and Azalia Mirhoseini.

I'm sorry if I mess up the name. You can correct me here in a second on that one. Sorry, Azalia. Welcome to the show.

Thank you so much for having us. Yeah, thanks for having us. Yeah, excited to have you both, and excited to chat about this amazing project. It was really interesting when I read about it.

But before we jump into the project itself, I would love to hear a little bit about both of your backgrounds and how you ended up doing what you're doing now. So maybe we could start with Azalia. Could you give us a little bit of information about your background? Yes, I got my PhD from Rice University in computer engineering.

My work, or my thesis, was focused on co-design of hardware-software systems for machine learning applications. And then when I joined Google, I joined Google Brain through the residency program. I stayed at Google Brain as a resident for a year. And that was the time that I developed a passion for work at the intersection of ML and systems, like how do we develop new machine learning and deep learning algorithms for systems.

And ever since then, I've been at Google Brain, for almost four years now, and I've enjoyed doing research and working on impactful projects. Awesome. Now you're a senior research scientist at Google Brain?

Is that correct? That's correct, yes. Awesome. Well, thank you for the background.

Anna, do you want to give us a little bit of information about your background as well? Yeah, sure. I studied computer science and linguistics, actually at MIT. And I did my master's basically building a Mandarin-speaking dialogue system.

I joined Google Research about eight years ago. Before this, I had been working mostly on natural language processing applications. And about two and a half, three years ago, I started working with Azalia. I actually saw some parallels to natural language processing in some of these systems problems.

Because a lot of them could be formulated as sequential decision-making problems. And it's been just wonderful working with Azalia, and we have such an awesome team. So I've basically been trying to use machine learning to optimize and automate various problems in computer systems.

So could you give us a little bit more of an idea about this team and how big it is? I know you both work closely together, but what's the team like that you're working on? Actually, when we first started, it was basically just me and Azalia.

And then we've been gradually growing. I think we have something like maybe eight, 10 people on the research side. And then we also partnered with chip designers who are building the next generation of TPUs. Maybe there's about eight, 10 people on that side as well.

Because it's a pretty substantial effort at this point. So I'm kind of wondering, since I come from a software-only background. As we dive in, I'll probably be more comfortable in the reinforcement learning department. I know nothing about creating chips and stuff.

And I guess if you could just kind of lay out the context, what does it mean? I've heard the phrases chip placement and chip floor planning and stuff like that. Could you talk a little bit about the baseline, about what it is you're trying to do, and the context of it from the hardware side, which I know nothing about? Sure.

Maybe I can take a stab at that. So basically, this is just one of the stages of chip design. Before it, there's the computer architecture stage, and that occurs first.

But the problem that we were solving in our research was taking a graph of chip components, which is called a netlist. So it's basically a bunch of SRAMs, or memory components (macros), and standard cells, which are logic gates like NANDs and NORs. All of these are connected by wires, and so it's a graph. And we want to place that graph onto a two-dimensional grid such that we minimize various costs, like latency of computation, power consumption, wire length, and area, while adhering to hard constraints on density and congestion.
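To make the problem statement concrete, here is a minimal sketch of a netlist as a graph plus a placement onto grid cells, with a density check standing in for one of the hard constraints Anna mentions. The `Netlist` class, the node names, and the "at most `capacity` nodes per cell" rule are illustrative assumptions, not the team's actual data format.

```python
from dataclasses import dataclass


@dataclass
class Netlist:
    """Hypothetical minimal netlist: nodes (macros such as SRAMs, or standard
    cells such as NAND/NOR gates) and nets, each net listing the nodes that
    one wire connects."""
    nodes: list
    nets: list


def density_ok(placement, capacity=1):
    """Hard-constraint sketch: no grid cell may hold more than `capacity` nodes.

    `placement` maps node name -> (x, y) grid cell on the 2D canvas.
    """
    counts = {}
    for cell in placement.values():
        counts[cell] = counts.get(cell, 0) + 1
    return all(c <= capacity for c in counts.values())


netlist = Netlist(
    nodes=["sram0", "nand0", "nor0"],
    nets=[("sram0", "nand0"), ("nand0", "nor0")],
)
legal = {"sram0": (0, 0), "nand0": (0, 1), "nor0": (1, 1)}
crowded = {"sram0": (0, 0), "nand0": (0, 0), "nor0": (1, 1)}
print(density_ok(legal))    # True
print(density_ok(crowded))  # False
```

Real macros span many grid cells and real density limits are fractional; a unit-size, one-per-cell rule just keeps the sketch small.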

So that's the core problem that we're trying to solve. So do I have it right? Like when you say that this is kind of graph structured, you're meaning like there's this component, like something physical that has to go on the chip, and then there's this other components, and they need to be linked by an electrical connection, I guess. Is that like a way of saying it?

So the graph is formed of these components and the electrical connections between them. Exactly. Yeah. There's all these sort of logic and memory components connected by wires or like electrical connections.

And then we physically need to decide where to place them so that we get better performance for that chip. Gotcha. And can you talk a little bit about what that means? Why does physical placement have an impact on performance, and what is it about the placement that affects performance?

So one way you could think about it is that the timing of computation, or the amount of time it takes to compute with this circuit, is affected by the lengths of critical paths in this graph, this placed graph. So if the total wire length connecting these components is larger, then it's going to tend to be slower. It's going to consume more power. That sort of thing.
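One common way to turn "total wire length" into a number, assumed here rather than stated in the episode, is half-perimeter wire length (HPWL): for each net, take the width plus the height of the bounding box around the nodes it connects, and sum over all nets. The sketch compares a compact layout with a spread-out one for the same hypothetical nets.

```python
def hpwl(nets, placement):
    """Half-perimeter wire length: sum over nets of bounding-box width + height.

    nets: iterable of tuples of node names; placement: node -> (x, y).
    """
    total = 0
    for net in nets:
        xs = [placement[n][0] for n in net]
        ys = [placement[n][1] for n in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


nets = [("a", "b"), ("b", "c", "d")]
tight = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (2, 1)}
spread = {"a": (0, 0), "b": (5, 0), "c": (0, 5), "d": (5, 5)}
print(hpwl(nets, tight))   # 3
print(hpwl(nets, spread))  # 15
```

HPWL is only a proxy: it tracks the tendency Anna describes (longer wires, slower and more power-hungry circuits) but is not a timing analysis.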

Gotcha. And how big of a graph? How many things are we needing to place and optimize in general? So it's millions, like millions of standard cells. And then in a chip, there are typically many, many blocks.

So hundreds of millions of components total that you're placing. Gotcha. So Azalia, I'd love to get some context: how has this problem of figuring out the placement of all of these components of the graph been approached in the past?

And what are the bottlenecks or problems in terms of creating a solution to this? There have been several approaches to this problem in the past. In fact, since the 1960s, research in both the academic community and industry has been done on physical design, or placement optimization. There were various approaches; for example, there are quantitative approaches.

There are approaches based on greedy methods, simulated annealing or hill climbing, genetic algorithms and such. I would say that the way we came in, the way deep learning and reinforcement learning are helping us take a new step at this problem, is that for the first time we can learn the context of the problem and learn from experience. Unlike all of the previous approaches, what we are doing is training agents that can accumulate experience. And as they're optimizing more chips, they become better at placing new chips. This approach is different from all the previous existing methods.
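Of the classical approaches Azalia lists, simulated annealing is easy to sketch. This is a toy version under simplifying assumptions of our own (unit-size nodes, HPWL as the only cost, moves that swap two nodes, a linear cooling schedule), not any specific published placer. Note that each run starts from scratch: nothing learned on one netlist carries over to the next, which is the contrast Azalia draws with a learning agent.

```python
import math
import random


def hpwl(nets, pos):
    """Half-perimeter wire length of all nets under placement `pos`."""
    total = 0
    for net in nets:
        xs = [pos[n][0] for n in net]
        ys = [pos[n][1] for n in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(nodes, nets, grid=4, steps=5000, t0=5.0, seed=0):
    """Simulated-annealing placement sketch: random initial layout with one node
    per grid cell, swap moves, linear cooling. Returns the best placement seen,
    its cost, and the cost of the initial layout."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(cells)
    pos = dict(zip(nodes, cells))
    cost = initial_cost = hpwl(nets, pos)
    best, best_cost = dict(pos), cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-6  # never exactly zero
        a, b = rng.sample(nodes, 2)
        pos[a], pos[b] = pos[b], pos[a]  # propose: swap two nodes
        new_cost = hpwl(nets, pos)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(pos), cost
        else:
            pos[a], pos[b] = pos[b], pos[a]  # reject: undo the swap
    return best, best_cost, initial_cost


nodes = [f"n{i}" for i in range(8)]
nets = [(nodes[i], nodes[i + 1]) for i in range(7)]  # a simple chain
best, best_cost, initial_cost = anneal(nodes, nets)
print(initial_cost, "->", best_cost)
```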

Gotcha. And for those who may not be very familiar with reinforcement learning as a technique, before we dive into how you're using it in this, could either one or both of you take a moment and tell listeners what reinforcement learning is and why, in particular, it's a technique which lends itself to this problem? Even just starting with a quick run-through of the fundamentals, if you're not familiar with it. So basically, in normal machine learning, or supervised learning, you're trying to fit labels to input examples.

In this case, you have this additional power, I guess: you can take actions in the world, and then you receive feedback from your environment. And then you use that information to optimize the parameters of your own policy, which is generating these decisions, to do better over time. So basically, it's composed of states, which is the state of the world at a given moment in time. So for us, we're placing the nodes of this graph onto the chip one at a time.

So the state is: what is the placement so far? And then actions are decisions that you make at each point in time, which for us is where to place the next node. And then reward is the final key component for reinforcement learning. It's the feedback that we get from our environment.

In our case, after we place all the nodes, we have approximate signals on wire length, congestion, and timing. And we use a weighted average of these to tell our policy how well it did, so it can update itself and generate better placements over time. So I know a lot of people might have heard about reinforcement learning with agents that play Atari games, or maybe more so in robotics.
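The state/action/reward loop Anna describes can be sketched as a single episode. Assumptions here are ours: HPWL stands in for wire length, a crude bounding-box overlap count stands in for congestion, timing is omitted, and the weights `w_wl` and `w_cong` are hypothetical. The policy is a placeholder function; in the real system it would be a learned network.

```python
import random


def hpwl(nets, pos):
    total = 0
    for net in nets:
        xs = [pos[n][0] for n in net]
        ys = [pos[n][1] for n in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def congestion(nets, pos, grid):
    """Crude congestion proxy (an assumption): the largest number of net
    bounding boxes that cover any single grid cell."""
    usage = [[0] * grid for _ in range(grid)]
    for net in nets:
        xs = [pos[n][0] for n in net]
        ys = [pos[n][1] for n in net]
        for x in range(min(xs), max(xs) + 1):
            for y in range(min(ys), max(ys) + 1):
                usage[x][y] += 1
    return max(max(row) for row in usage)


def run_episode(nodes, nets, grid, policy, w_wl=1.0, w_cong=0.5):
    """One episode: the state is the partial placement, each action puts the
    next node on a free cell, and the reward arrives only at the end."""
    pos = {}
    for node in nodes:
        occupied = set(pos.values())
        free = [(x, y) for x in range(grid) for y in range(grid)
                if (x, y) not in occupied]
        pos[node] = policy(pos, node, free)  # action
    reward = -(w_wl * hpwl(nets, pos) + w_cong * congestion(nets, pos, grid))
    return pos, reward


def first_free(pos, node, free):
    return free[0]  # trivial deterministic policy, for illustration only


nets = [("a", "b"), ("b", "c")]
pos, reward = run_episode(["a", "b", "c"], nets, grid=2, policy=first_free)
print(pos, reward)  # {'a': (0, 0), 'b': (0, 1), 'c': (1, 0)} -4.0

rng = random.Random(0)
_, r = run_episode(["a", "b", "c"], nets, 2, lambda p, n, free: rng.choice(free))
```

Because the reward is only known after the last node is placed, a learning algorithm has to assign credit for it across the whole sequence of placement decisions.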

In those types of scenarios, you have this agent, which may be composed of one or more models, and it's trying to take actions. People tend to associate that with taking actions in a video game, or moving the arm of a robot, or something like that. In this case, the quote-unquote game you're playing is really the placement of these components. So your agent is placing components and then getting feedback about how well it's placing those components.

Is that a good way to put it? Yeah, that's great. Yes, exactly. OK, great.

So in doing this, I'm kind of curious, I don't know if anyone's tried to do this before. I assume maybe not in terms of reinforcement learning for this problem. How did you come to decide that reinforcement learning might be a good approach in this scenario versus maybe some other methodologies? How did you come to the point where you say, oh, those things that people are doing in robotics, or in these games, or something else, how did you come to think that those methods, specifically reinforcement learning, might be suitable here?

So before we started this project, we had been working on another project, which was doing device placement optimization with reinforcement learning. That project had to do with taking a computational graph, such as a machine learning TensorFlow graph, and mapping it optimally to hardware devices, such as GPUs, such that the runtime of the underlying ML algorithm becomes as fast as possible. That was a combinatorial optimization problem, and a very complex task. And we started thinking about how ML, and in this context reinforcement learning, can help do that optimization better than existing methods.

And we thought reinforcement learning is really a natural thing that comes to mind if we think about ML, because this task is not a supervised task; we don't have labels for it. We want to optimize this problem by doing several rounds of exploration and evaluation. So we used reinforcement learning for that, and we got a lot of interesting, very encouraging results on the device placement task.

So it was a natural next step for us to try: what if we applied the same kind of approaches to the chip placement problem? So that was the transition for us from devices to chips. But the interesting thing was that when we came to chip placement, we realized it's way, like orders of magnitude, more complex than device placement. So it was very unclear to us in the beginning that we were going to get gains with reinforcement learning for this problem, since there's so much research on it already.

But after some trial and error and several rounds of improving our algorithms, it seems like it actually is helping a lot in this problem as well. What's up? This is Daniel Whitenack, one of your Practical AI co-hosts. And I hope you're enjoying this episode and staying healthy during these crazy times.

I'm working on some pretty cool AI stuff here from my home office. But I've also found that I am having to get a bit creative and be intentional when it comes to honing my AI skills and virtually connecting with the AI community. If you're in a similar situation, or you've been inspired by the practical AI we talk about on the show, I want to invite you to a live online AI training event I'm hosting this May called AI Classroom. In AI Classroom, I'm going to teach you the practical skills I've learned over the years using the latest open source AI technology.

You'll learn AI theory along with practical hands-on implementations in both PyTorch and TensorFlow. And after the training, you'll be able to understand the latest AI models, implement your own models in code, train computer vision and NLP models, create model inference servers and experiment with state of the art methods like reinforcement learning. AI Classroom is taking place this May. It'll be taking place live and completely online in a high quality virtual classroom.

So no travel is required. There'll also be two cohorts with convenient time zones for Eastern and Western hemispheres. Don't miss out. Tickets and more information are available at datadan.io.

That's datadan.io. And Practical AI listeners can use the code PRACTICALAI10 for 10% off. See you online in AI Classroom. So I'm curious.

You mentioned a moment ago that the data itself wasn't labeled, and that reinforcement learning seemed like a very good technique for that. I am curious: if you had not gone down this route, or maybe not used machine learning at all, what might some of the other options have been, whether in the realm of machine learning or not, just to have a sense of the opportunity costs of the technique? How might others have done it had you not gone this path?

So we did experiment with some other techniques, say, evolutionary strategies. They tend to be less sample efficient, so it didn't really seem like too promising a path to go farther down. We also experimented with using supervised learning as a way to basically ground our architecture search.

The policy architecture that we were able to achieve generalization with was found using a supervised learning objective. And then we used that as the encoding stage of our full policy/value net and achieved better generalization results. Yeah, I would love to follow up on a couple of those things, maybe digging into a couple of those pieces just to break it down for listeners.

So when you're talking about this encoding piece, and the supervised stage that you did complete, does that have to do with getting the graph structure data into another form, like a sort of embedding or representation that you would use in other things? Could you kind of describe that a little bit more? Yeah, sure. So I think basically, in order to achieve generalization, it really, really is about the representation.

Like you said, what is the correct embedding for a given input graph? So basically, we created this very large dataset of different placements generated by different placement techniques, including reinforcement learning policies, but also force-directed methods, simulated annealing, and greedy methods. And we used that to try different architectures on the task of predicting the approximate wire length and congestion for those placements. And the architectures that were better at this prediction task did a much better job of creating policies that were able to generalize across different chip netlists.
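Anna's point that the representation matters can be illustrated with a toy version of the pseudo-label idea. Under our own assumptions (4 nodes in a chain, 2-pin nets, random placements on an 8x8 grid that may overlap, wire length as the only cheap label), we fit the same linear model on two representations and compare held-out error. This stands in for their graph-network architecture search; it does not reproduce it.

```python
import random


def features_raw(pos):
    """Representation A: raw node coordinates, flattened."""
    return [c for xy in pos for c in xy]


def features_edge(pos, nets):
    """Representation B: per-net |dx| and |dy|, i.e. edge-level features."""
    out = []
    for u, v in nets:
        out += [abs(pos[u][0] - pos[v][0]), abs(pos[u][1] - pos[v][1])]
    return out


def pseudo_label(pos, nets):
    """Cheap proxy label: wire length of 2-pin nets (Manhattan distance)."""
    return sum(abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1])
               for u, v in nets)


def sgd_fit(X, y, lr=0.001, epochs=50, seed=0):
    """Linear regression (no bias term) trained with per-example SGD."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
    return w


def mse(w, X, y):
    return sum((sum(a * b for a, b in zip(w, x)) - t) ** 2
               for x, t in zip(X, y)) / len(y)


rng = random.Random(1)
nets = [(0, 1), (1, 2), (2, 3)]  # 4 nodes in a chain
placements = [[(rng.randrange(8), rng.randrange(8)) for _ in range(4)]
              for _ in range(300)]
labels = [pseudo_label(p, nets) for p in placements]
train, held_out = slice(0, 200), slice(200, 300)

results = {}
for name, feat in [("raw", features_raw),
                   ("edge", lambda p: features_edge(p, nets))]:
    X = [feat(p) for p in placements]
    w = sgd_fit(X[train], labels[train])
    results[name] = mse(w, X[held_out], labels[held_out])
print(results)  # the edge representation gets a far lower held-out error
```

The same model class succeeds or fails depending only on how the placement is encoded, which is the kind of signal Anna describes using to choose an encoder before any reinforcement learning happens.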

Presumably because we had a better representation. I am curious. You mentioned a little while ago that the thing that inspired you to go down this particular path was device placement optimization. I would imagine, and correct me if I'm wrong, that this is a completely different scale, in the sense of working in very, very small spaces,

compared to the original device placement optimization you were doing. If that's accurate, did the scale, moving down to such small spaces, make a difference? Or was it fundamentally the same; did the approach hold up as it had in the prior project? Azalia, do you have any thoughts on that one?

Yes. So in both projects, we are still doing reinforcement learning, so the fundamental approach still remains the same. But like you said, the scale of these two problems is very different.

For example, in device placement, our action space is tens of devices or fewer, a few GPUs. But here, our action space is the cells of the canvas onto which we are placing the chip, and these canvases can have thousands of locations or even more.

So our action space is orders of magnitude larger than the previous problem. At the same time, our input state, which is the graph that we are processing, a chip graph, like Anna mentioned, can have millions of nodes. Whereas a computational graph could have tens of thousands. So here in this problem, we were dealing with a much more complex state and action space.

And to enable RL agents that can optimize this problem, we had to make several changes to the way we present the environment to the agent. For example, we had to take a hierarchical approach to the way we represent the input graph: we grouped certain standard cells to break down the complexity of the input state into a graph with thousands of nodes that we were placing.
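The hierarchical step Azalia describes, grouping standard cells so the agent works on a graph with thousands of nodes instead of millions, can be sketched with a simple greedy clustering. The union-find edge contraction and the `max_size` cap below are our own illustrative choices; the episode does not say which grouping method the team used.

```python
from collections import Counter


def cluster(nodes, edges, max_size):
    """Greedy edge-contraction clustering with union-find: merge the endpoints
    of each edge unless the merged group would exceed `max_size` cells."""
    parent = {n: n for n in nodes}
    size = {n: 1 for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv and size[ru] + size[rv] <= max_size:
            parent[rv] = ru
            size[ru] += size[rv]
    return {n: find(n) for n in nodes}


def coarsen(edges, assign):
    """Build the cluster-level graph: edges between different clusters,
    weighted by how many original connections each one stands for."""
    weights = Counter()
    for u, v in edges:
        cu, cv = assign[u], assign[v]
        if cu != cv:
            weights[tuple(sorted((cu, cv)))] += 1
    return dict(weights)


cells = [f"c{i}" for i in range(6)]
edges = [(cells[i], cells[i + 1]) for i in range(5)]  # a chain of 6 cells
assign = cluster(cells, edges, max_size=2)
print(sorted(set(assign.values())))  # ['c0', 'c2', 'c4']
print(coarsen(edges, assign))        # {('c0', 'c2'): 1, ('c2', 'c4'): 1}
```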

And on representation learning, we had to do a lot more work, because in this problem we were not only interested in placing one chip; we were also interested in creating agents that become better at placing unseen chips. That opens new opportunities for chip design optimization: given a chip block, we can quickly place it, optimize it, and see how well we are doing on it. So the generalization property that we wanted for this problem led us to focus heavily on representation learning of the graph. And we created a lot of new techniques for creating this generalized representation, which we are hoping can help us do better in future problems as well, in other stages of chip design or other kinds of hard combinatorial optimization problems that we're dealing with.

So I'm really curious; I have a follow-up on that. As you were talking, I was thinking about how essentially you have all of these different possible arrangements of the graph onto the physical canvas, like you said. But also in this problem, as you're placing components, there is this sequential nature to it. And maybe this is where, as was mentioned earlier, there were even some parallels with natural language processing.

And I was wondering how you deal with this. At least in my understanding, you're not taking a one-step approach of, here are all my components, and here's my prediction for the placement of all those components. You're placing one component, then another, then another, in a more iterative way. So how do you deal with that sequential nature of this process? And does it involve subgraphs within the graph?

And then adding a component to that, taking the last so many components, and trying to figure out how the next component comes in, kind of like placing characters or words when you're doing text generation? How does that sequential aspect come in? So in the first architecture that we had that worked well, we would actually pass images of the placement so far. So the model was kind of like a human designer who, as they're placing a graph, can see what space is left on the canvas and such.

And we had basically an LSTM model for the policy head, which stores information about the full sequence of placement decisions that have been made up to that point. But in the end, I think our current policy head is a deconvolutional neural net that predicts a policy decision over the two-dimensional grid. And I'm kind of curious; I'm also following up on the same thing, actually.
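Anna describes the current policy head as predicting a decision over the 2D grid. A common way to turn such a grid of scores into a valid action, assumed here for illustration, is a masked softmax: cells that are already occupied get probability zero. The logits are hand-written placeholders for what a network would output.

```python
import math


def grid_policy(logits, occupied):
    """Turn an H x W grid of scores into a probability per cell,
    giving occupied cells probability zero (masked softmax)."""
    masked = {}
    for r, row in enumerate(logits):
        for c, z in enumerate(row):
            masked[(r, c)] = -math.inf if (r, c) in occupied else z
    top = max(masked.values())
    if top == -math.inf:
        raise ValueError("no free cell left on the canvas")
    # Subtract the max before exponentiating for numerical stability.
    exps = {cell: math.exp(z - top) for cell, z in masked.items()}
    total = sum(exps.values())
    return {cell: e / total for cell, e in exps.items()}


logits = [[1.0, 2.0],
          [0.5, 3.0]]  # placeholder scores; a network would produce these
probs = grid_policy(logits, occupied={(1, 1)})
print(max(probs, key=probs.get))  # (0, 1)
print(probs[(1, 1)])              # 0.0
```

Masking keeps the agent from ever choosing an illegal cell, so it does not have to learn that rule from negative rewards.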

And you may be starting to address that there, but you mentioned, when you were talking about representation learning of the graph, that there were some new techniques you got into. I was wondering, is there anything else that you learned to apply to this, or did you just cover it right there? So I think Azalia was getting at the graph embeddings that were developed for this project. At a high level, the insight there was that for most graph neural net type applications, it's really about the features of the nodes themselves. So you represent nodes as some kind of average or other aggregation of their neighbors' features. But in our case, what really matters is the connections between these nodes, because it's about the paths.

And so our graph embeddings are much more focused on edge features. And kind of diving a little bit more into those embeddings. Again, I'm trying to make connections with maybe things that I've seen or heard about before. I know in the NLP world with these newer language models that are coming out, and the word embeddings that they're generating, the thought is, oh, we're going to train this model or learn this representation based on one or more tasks, like replacing missing words or something like that.
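A rough sketch of the edge-focused idea: compute an embedding for every edge from its two endpoint nodes plus the edge's own features, then form each node's embedding as the mean of its incident edge embeddings. The weights are hand-picked, and the single layer, ReLU, and mean are our assumptions; the team's paper describes their actual architecture.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def edge_centric_layer(h, edges, edge_feat, W):
    """h: node -> feature list; edges: list of (u, v); edge_feat: one feature
    list per edge. Edge embedding = relu(W @ concat(h[u], h[v], edge_feat));
    node embedding = mean of the embeddings of its incident edges."""
    edge_emb = []
    incident = {n: [] for n in h}
    for (u, v), f in zip(edges, edge_feat):
        e = relu(matvec(W, h[u] + h[v] + f))
        edge_emb.append(e)
        incident[u].append(e)
        incident[v].append(e)
    dim = len(W)
    new_h = {n: [sum(col) / len(es) for col in zip(*es)] if es else [0.0] * dim
             for n, es in incident.items()}
    return new_h, edge_emb


h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "b"), ("b", "c")]
edge_feat = [[1.0], [2.0]]  # e.g. a per-net weight (hypothetical)
W = [[1, 1, 1, 1, 1],
     [1, -1, 1, -1, 0]]  # 2 x (2 + 2 + 1), hand-picked rather than learned
new_h, edge_emb = edge_centric_layer(h, edges, edge_feat, W)
print(new_h)  # {'a': [3.0, 0.0], 'b': [4.0, 0.0], 'c': [5.0, 0.0]}
```

Node "b" ends up summarizing both of its connections, so the node representation is built from the wires rather than from neighbor features directly.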

And then you learn this embedding, and then apply some new layers onto the network to do a particular task, like question answering or whatever it is. Is it similar here, in that you used some supervised learning to train the embeddings, in my understanding? So you have these certain supervised tasks, you learn the graph embeddings, and then you were able to apply those in a new scenario. Is that the strategy, or do I have that wrong?

Yeah, that's very much right. Yes, that's correct. So I think the way we can describe this is that we train architectures to capture the representation, or encoding embedding, of the input by having a supervised model for which it is very easy to produce labels. We call them pseudo-labels, right?

Those labels were our proxy costs for the optimization, which are very fast and not at all expensive to generate. So the motivation for us to train the architecture this way was that if our agent, our policy, is to generalize to unseen graphs, it should also have a good understanding of what the actual reward is for a given state. That's a prerequisite for policies that generalize to unseen graphs: to have an idea of how good a current state is.

And that's what made us do the supervised approach first, where we predict these pseudo labels for a given graph. And once the architecture is tuned in a way that this prediction task is done at a high accuracy for the test set, then we take that and use that as the encoder part of the policy for further optimization of placement. Awesome. This is really interesting because we brought up graph neural networks a couple times on the show, but maybe not in this sort of applied way that we're talking about them here.

I was wondering if you could just, before we get too much further, mention what makes a graph neural network a graph neural network instead of just a normal neural network. And maybe help clarify for people, because even in this conversation we mentioned computational graph, which might come to people's mind if they're thinking about TensorFlow; there's this computational graph in the background. But here, for a graph neural network, we're not necessarily talking about the computational graph. What makes a graph neural network a graph neural network?

Is it just the input data, this way of representing graph data? So what makes graph neural nets graph neural nets is how they encode information. In a typical graph neural net, we are learning representations of the nodes of a graph with respect to the properties of the node, the properties of its neighbor nodes, the neighbors of the neighbors, and so on. So graph neural nets have this property that they can encode one-hop, two-hop, like K-hop adjacency information of a node.

And on top of this adjacency information, the connectivity graph, you can also add features per node. And in our case, you can add features per edge of the graph. So basically, graph neural nets allow us to capture all of this information about the graph structure of the input data and generate embeddings of the nodes and edges that capture that graph structure and graph information. So, having gone through this, which is fascinating and entirely new to me, I'm curious what the results were like. Where did you arrive?
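The K-hop idea Azalia describes can be seen with a parameter-free aggregation step. Real graph neural nets also apply learned weights and a nonlinearity at each layer; this sketch keeps only the neighbor averaging, so you can watch information travel one hop per layer along a hypothetical four-node chain.

```python
def mean_aggregate(h, adj):
    """One message-passing round: each node's new feature vector is the mean
    of its own vector and its neighbors' vectors."""
    return {
        v: [sum(vals) / (len(adj[v]) + 1)
            for vals in zip(h[v], *(h[u] for u in adj[v]))]
        for v in h
    }


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}  # a-b-c-d
start = {"a": [0.0], "b": [0.0], "c": [0.0], "d": [1.0]}  # only d has signal
h = start
for k in range(1, 4):
    h = mean_aggregate(h, adj)
    print(k, h["a"])  # "a" stays at 0.0 until k = 3, when "d" is 3 hops away
```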

What surprised you along the way in the process; what was not what you were expecting to see? And also, how did the larger organization at Google take the results? Is it something that is now becoming standard at Google, or was it just a test or an experiment? How did it affect the larger organization in terms of designing chips going forward? So we have definitely tested this method on chips that Google makes and have gotten superhuman results on a good portion of the complex chips that we tried to place.

But in terms of the other questions you asked, I'm not sure we can answer those at this point. Okay, so nothing jumped out from a surprise standpoint? I was just curious. Yeah, sure, I have something to offer on that.

I don't know if it's surprising, maybe just exciting. In terms of those generalization results, we would take a policy, pre-train it on a larger number of chip netlists, and then apply it to a new chip. What surprised and excited us was that a pre-trained policy that was fine-tuned for only, say, 12 hours would outperform a policy that was trained from scratch on this netlist for 24 hours or more. So I think it was exciting to us that this new policy architecture generalized so well that it actually does better and takes less time.

That's pretty amazing. Yeah, what did that have to do with? I know when I was looking at the paper, you talked about domain adaptation, which I remember we talked about with the OpenAI team, and we've also talked about it in relation to robotics and moving hands. So is adapting the domain or the environment during training key to that sort of generalizability?

If so, did you have to create a bunch of simulated data for various environment changes and that sort of thing? What was your approach there? So we actually just used real chip netlists for all of the pre-training. We'd, say, train on 20 real chip netlists, and then we were able to achieve those results where we have much better and faster results. But we probably could do some kind of data augmentation, where we could maybe turn those 20 into many more, or source more netlists in some other way, and we would do much better.

And what is your feeling in terms of how specific this pre-trained policy is to the sorts of chips that are included in the set of chip netlists, I think you called them, that you used during training? How specific do you think the pre-trained version of the policy is to that family of chips, or do you think it's generalizable beyond that? I think the policy's performance on a new netlist is definitely affected by the types of netlists it has been trained on in the past. At the same time, it's a pretty general problem.

So yeah, I think as long as you train on a representative set of netlists, you could do well on any one.

Gotcha. And what are some of the challenges you faced during this project that you didn't have time to address in the initial version? What are some of the things you want to explore more going forward?

I mean, there are just so many other stages of this process, and what's exciting about developing policies that can generate high-quality placements more quickly is that we can explore feedback or interactions with previous stages, upstream choices, like choices about SRAM. Basically, there's a certain amount of memory that needs to be in this chip, but the choice of how to slice it up into these macros is somewhat arbitrary. If you can, say, slice the memory into macros in a particular way and then quickly see what level of placement quality you get, in terms of timing and other properties, you could do all sorts of explorations upstream.

So I want to follow up on something you were saying before, just to make sure I understand, about the different types of chips you want to apply this to. I know we had someone talk about chips from a previous company earlier, and they were talking about basically different types, you know, GPUs, TPUs, FPGAs and such.

Do those different architectures dramatically change the problem for you? I know we were talking about domain adaptation a moment ago, but in a practical sense, do you need a substantially different RL approach every time you change out the chips? I believe a GPU has a whole bunch of things beyond what a TPU might have on it, because it has to address a broader range of problems, whereas a TPU is very specific to matrix multiplication. How does that affect your approach? Personally, I wasn't clear on it, since I'm trying to learn this as we go.

Yes, so we have tried our method on a bunch of different types of chips, both chips available inside Google and open-source chips. Our RL approach didn't need to change going from one set of chips to another. But, like Anna was mentioning, if you have a chip that is drastically different from anything the agent has seen before, that could affect its performance. At the same time, the input space of our problem is very abstract. We don't deal with the specifics of a chip.

Rather, we are dealing with a generic netlist representation of a chip: nodes with certain connectivity, where the nodes have different sizes and shapes. We place them while optimizing the cost function we developed. So if you don't think about what chips they are, the problem is abstract enough that it can really handle different sorts of input from different chips. And so far, we haven't had a chip so different from our training set that we had to change the RL algorithm for it.
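To make that abstract input a bit more concrete, here is a minimal Python sketch of a netlist as nodes with sizes plus nets connecting them, along with a toy placement cost. Everything here is an illustrative assumption: the class and function names are hypothetical, and the cost (half-perimeter wirelength plus a rectangle-overlap penalty with an arbitrary weight) is a common textbook proxy, not the cost function the Google Brain team developed, which is not spelled out in this conversation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Node:
    """Hypothetical netlist node (e.g. a macro) with a rectangular footprint."""
    name: str
    width: float
    height: float


@dataclass
class Netlist:
    """Hypothetical minimal netlist: nodes plus nets (lists of connected node names)."""
    nodes: dict[str, Node]
    nets: list[list[str]] = field(default_factory=list)

    def adjacency(self) -> dict[str, set[str]]:
        """Clique-expand each net into pairwise edges, giving a plain graph
        of the kind a graph neural network could consume."""
        adj: dict[str, set[str]] = {name: set() for name in self.nodes}
        for net in self.nets:
            for a, b in combinations(net, 2):
                adj[a].add(b)
                adj[b].add(a)
        return adj


def hpwl(netlist: Netlist, placement: dict[str, tuple[float, float]]) -> float:
    """Half-perimeter wirelength: for each net, half the perimeter of the
    bounding box around its nodes' centers, summed over all nets."""
    total = 0.0
    for net in netlist.nets:
        xs = [placement[n][0] for n in net]
        ys = [placement[n][1] for n in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def overlap_area(netlist: Netlist, placement: dict[str, tuple[float, float]]) -> float:
    """Total pairwise overlap area of node rectangles centered at their placed
    positions; this is where node sizes and shapes enter the cost."""
    total = 0.0
    for a, b in combinations(list(netlist.nodes), 2):
        na, nb = netlist.nodes[a], netlist.nodes[b]
        (xa, ya), (xb, yb) = placement[a], placement[b]
        dx = min(xa + na.width / 2, xb + nb.width / 2) - max(xa - na.width / 2, xb - nb.width / 2)
        dy = min(ya + na.height / 2, yb + nb.height / 2) - max(ya - na.height / 2, yb - nb.height / 2)
        if dx > 0 and dy > 0:
            total += dx * dy
    return total


def proxy_cost(netlist: Netlist, placement: dict[str, tuple[float, float]],
               overlap_weight: float = 10.0) -> float:
    """Toy scalar cost: wirelength plus a weighted overlap penalty (weight is arbitrary)."""
    return hpwl(netlist, placement) + overlap_weight * overlap_area(netlist, placement)
```

For example, with nodes `a` (2x2), `b` (1x3), and `c` (4x1) connected by nets `[a, b]` and `[a, b, c]`, placing them at (0, 0), (3, 4), and (6, 1) gives a wirelength of 17 and no overlap, while pulling `b` to (1, 0) shortens the wires to 8 but creates one unit of overlap, so the weighted cost rises to 18. Because the representation only knows about sizes and connectivity, the same code would accept netlists from very different kinds of chips, which is the point being made above.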

We're always modifying the algorithm to improve it overall, but, like I said, the input state is pretty standard across different types of chips.

The more I think about this problem, the more curious I get. It seems like we're using an AI method to help design a chip on which AI will hopefully be trained or run inference, or that sort of thing.

I'm curious, in a more general sense, how you see AI moving into the future as AI development continues to accelerate. Are we going to need these sorts of methods more and more, because more specialized chips will be needed for these types of AI problems? How do you see AI influencing the hardware that AI runs on, I guess, is my question.

So chip design is a really complicated task, and making customized chips is definitely also very complicated. We are seeing more and more that we are going to need these customized chips because of the computational demands of various algorithms, especially AI algorithms. Our vision is that AI can help design these chips because of its ability to learn and improve over time and its ability to optimize over a very large optimization space.

For example, if you look at the chip design process, there are various stages of optimization, from architecture design to logic design to verification, physical design, and placement. Each of these stages is very complicated and computationally hard. So our goal, our vision, is that AI can help us find globally optimized solutions across all of these stages. Then we would hopefully get a lot more performance improvement over what we have right now, where we optimize each stage separately and then just cascade them together.

And the reason we think AI can help with this, as we mentioned a couple of times in this conversation, is that AI can improve over time. This property is very different from what we have seen in other existing methods. The policy, the agent, can become better and more experienced at doing new tasks. So if we accumulate this experience over time, we end up with agents that become much better than any single person or single algorithm that has optimized chips.

Yeah, in that way it seems like so many other areas we're applying AI techniques to, where you get to that superhuman level and just continue on. It makes me wonder, and this may almost be an organizational question to some degree, but I'm curious: having pioneered this, being able to apply reinforcement learning to this particular problem, is this something the two of you expect to continue working on for some time? Or have you done your experiment, gotten your results, and you're going to move on to other problems? If it's the latter, what might those other problems be?

Or if you're staying on this, what are you looking at next? What's the next step, whether it's on this problem or something else, for each of you? Anna, do you want to go first?

I think there are definitely other stages of the chip design process that have a lot of impact.

Getting back to your earlier question a little bit, about how AI can affect the hardware that AI runs on: the current chip design process takes nearly two years. So there could be certain types of machine learning architectures that just aren't computationally feasible on today's hardware, but if we could design chips for them more quickly, they might become more viable approaches.

But the problem is that chip floor planning is just one of these stages. So if you wanted to dramatically accelerate this process, you would have to tackle the other stages, say architectural exploration or design verification.

Awesome. So you've built one of the building blocks of that process, but you could be exploring some of those other building blocks as well.

Is that right?

That's right.

Awesome. What about you, Azalia?

Yeah, I think I'm in a similar boat. I'm very excited about research on RL and ML for optimization tasks in general, and I think chip design is a very critical and important application of optimization, something that's going to enable a lot. If we have better chips, we're going to have better next-generation AI algorithms as well, because chips are key enablers of those algorithms.

So I would say both research on RL for optimization and its applications in chip design are things I'm very excited about and look forward to continuing to work on.

Awesome. Well, thank you both for taking the time to join us. This has been super fascinating.

It's been great to dive into some of these subjects, like graph neural networks and chip design, that we haven't talked about a lot on the show. So I really appreciate both of you taking the time to join us for the conversation.

It was great to talk. Thanks for having us.

Thank you so much for having us.

Thank you for listening to this episode of Practical AI. There's more like this at changelog.com slash Practical AI, where you'll find our latest episodes as well as a list of our most popular episodes and the ones we recommend.

If this show has helped you on your AI journey, please leave us a five-star review on Apple Podcasts, heart us on Spotify, star us on Overcast, and tell a friend who's missing out. Practical AI is hosted by Daniel Whitenack and Chris Benson and is produced by me, Jerod Santo. And our music is brought to you by the beat freak, Breakmaster Cylinder. We have awesome sponsors.

Please support them; they support us. Thanks again to Fastly, Linode, and Rollbar. If you and your organization could benefit from speaking directly to all the AI practitioners out there, you should sponsor the show.

Podcast advertising is one of the most effective ways to spread your message in an authentic way. Plus, you get the added bonus of supporting something you love. That's all for now. We'll talk to you next time.


Frequently Asked Questions

How long is this episode of Changelog Master Feed?

This episode is 44 minutes long.

When was this Changelog Master Feed episode published?

This episode was published on April 27, 2020.
