
EPISODE · May 26, 2020 · 53 MIN

Exploring NVIDIA's Ampere & the A100 GPU

from Changelog Master Feed · host Practical AI LLC

On the heels of NVIDIA’s latest announcements, Daniel and Chris explore how the new NVIDIA Ampere architecture evolves the high-performance computing (HPC) landscape for artificial intelligence. After investigating the new specifications of the NVIDIA A100 Tensor Core GPU, Chris and Daniel turn their attention to the data center with the NVIDIA DGX A100, and then finish their journey at “the edge” with the NVIDIA EGX A100 and the NVIDIA Jetson Xavier NX.

Sponsors:
Linode – Our cloud of choice and the home of Changelog.com. Deploy a fast, efficient, native SSD cloud server for only $5/month. Get 4 months free using the code changelog2019 OR changelog2020. To learn more and get started head to linode.com/changelog.
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com.
Rollbar – We move fast and fix things because of Rollbar. Resolve errors in minutes. Deploy with confidence. Learn more at rollbar.com/changelog.

Featuring:
Chris Benson – Website, GitHub, LinkedIn, X
Daniel Whitenack – Website, GitHub, X

Show Notes:
NVIDIA Ampere Architecture In-Depth
NVIDIA DGX A100
NVIDIA EGX A100
NVIDIA Jetson Xavier NX
Practical AI – Episode #56 – Worlds are colliding - AI and HPC
Practical AI – Episode #15 – Artificial intelligence at NVIDIA

Learning Resources:
NVIDIA Deep Learning Institute
Docker and Kubernetes: The Complete Guide

Upcoming Events:
Register for upcoming webinars here!



TRANSCRIPT · AUTO-GENERATED

So now we have this new NVIDIA DGX A100. Maybe you'll get one. I don't know if I'm going to get one. But we'll say, yeah.

Sadly, I'm not in charge of procurement, and I am certainly not in charge of procuring one for my personal use. I'm guessing that my nonprofit is not going to get one. But if you happen to be getting your DGX A100 and you'd like me to run my training on it, I would be more than happy to do some benchmarking for you. Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com.

We move fast and fix things here at Changelog because of Rollbar. Check them out at rollbar.com. And we're hosted on Linode cloud servers; head to linode.com/changelog. Linode makes cloud computing simple, affordable, and accessible. Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale you need to take your ideas to the next level.

We trust Linode because they keep it fast and they keep it simple. Check them out at linode.com/changelog. Welcome to Practical AI, a weekly podcast that makes artificial intelligence practical, productive, and accessible to everyone. This is where conversations around AI, machine learning, and data science happen.

Join the community and Slack with us around various topics of the show at changelog.com/community, and follow us on Twitter; we're at Practical AI. Okay, take it away, guys. Welcome to another Fully Connected episode, where Daniel and I keep you fully connected with everything that's happening in the AI community. We'll take some time to discuss the latest AI news, and we'll dig into learning resources to help you level up your machine learning game.

So welcome to the Practical AI podcast. My name is Chris Benson. I'm principal AI strategist at Lockheed Martin. And with me as always is Daniel Whitenack, data scientist at SIL International.

How's it going, Daniel? It's going great. It's a beautiful day outside. Hopefully I can take a walk after this; I've been staring at my screen most of the day, since like 7 a.m.

So I'm ready for a walk around the block or something. That sounds like a good idea. I am bleary-eyed from screen time as well. So get outside and enjoy it, especially now that, at least where I'm at in Atlanta, the worst of the pollen seems to be past.

So that's good. No more cars turned yellow by pine pollen everywhere. Yeah, that's rough. We need an AI model that takes in pictures of cars in people's driveways and tells you whether it's safe to go outside yet because of the pollen levels.

There you go. I'm sure there are easier ways to do that. But that's right. So we'll have to prep people for next year.

So we'll just ask everybody in the audience to send us images of your cars covered in pine pollen next year, with a date-time attached. And we will have a great project to go on. Yeah, exactly. And before we get going, I know that you've been doing some training classes.

How have those been going? They went really well. It was interesting.

It was virtual, by the way, not in person. Yeah. Normally when I do AI trainings in industry or at conferences, there's obviously a whiteboard there, and there are a lot of changes as the class goes on because there's a lot of easy back-and-forth.

So it was interesting to figure out the virtual dynamic with everybody being at home. I think it actually ended up having some benefits, because it sort of forced me... Normally I write things on the whiteboard and I'm able to make changes as I go. But having to sit down and work things out in a strict set of slides made me really think about the proper flow to explain an idea and show certain things.

So in that respect, it was a bit of a learning experience for me, because it helped clarify some of that logic even in my own mind. But it went really well, and the students had some great questions. We went all the way from "what is AI" to convolutional layers, recurrent layers, and training. Cool things. So yeah, it was a good time.

You know, I've always found that no matter what topic you teach, it forces you to reassess everything you think about, because you've got to explain it to other people and answer all their questions. Every time I've done that, I've learned so much more about whatever it was I was going to teach. So for sure. Yeah.

People ask questions that would never have come to mind for you, not because they're bad questions; it's just a different point of view. So it forces you to backtrack and look at things from different directions. Yeah.

Well, that sounds interesting. I guess you lucked out: you got through your training classes right before NVIDIA made all their new hardware announcements this year at their GPU Technology Conference, GTC. That's true. And you know, it's interesting. I was actually spending the evening yesterday with my brothers-in-law, who are living with us.

We aren't socially gathering yet, but my brothers-in-law are living with us right now while they're back from college. They don't work in IT; they're not in computer science or anything like that, but they are pretty heavy gamers. When we started talking about some of this stuff, they had already seen the keynote from the NVIDIA conference before I brought it up, even though it was mostly AI-related. It was already sort of a general meme that the NVIDIA CEO was presenting all of this cool GPU stuff from his kitchen, where you could see the spatulas in the background and a very interesting fresco above his oven. So it was interesting that even people who aren't in this space felt some impact from it.

You know, that's a great point, and it's worth talking about. We talk about NVIDIA, Google, and other major players in this space often, because in a lot of cases you can't really talk about AI without talking about the biggest influencers. In NVIDIA's case, they were a gaming company: GPUs were originally for graphics, and computer gamers were constantly using them.

And now we're doing this in the AI space. So any thoughts on, or insight into, how that evolution came about? Why are we using GPUs for all this stuff? Yeah, and there was also demand from Bitcoin mining. That's true. It's very interesting: if you look at NVIDIA's rise over time, of course they existed for quite some time. They existed for a reason, and they were really good at that.

And then all of a sudden, the things they were good at became some of the most important things in the world. That's kind of how it seems. They were already there, so they just exploded.

So yeah, you're right. Think about video gaming, or things you would want to do in video processing or graphics. For example, you might want to apply a filter to an image or a frame of a video, to darken it or to apply a color gradient or something like that. You're essentially applying some operation to the pixels of an image, which are arranged in a matrix, and which even have some depth because there's a color dimension. So you have this matrix of numbers, or really this input volume, and you apply some operation to its elements. In AI, of course, with convolutional layers you are doing almost that exact same thing, because you're applying a series of weights, biases, and functions, like activation functions, to individual elements of a matrix or an input volume.
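To make the analogy concrete, here is a minimal pure-Python sketch (illustrative only; `darken` and `conv2d_single_channel` are hypothetical names, not NVIDIA or framework APIs). Both functions apply the same small computation independently at every position of an input volume, which is the kind of work a GPU spreads across thousands of cores.

```python
# Minimal sketch: the same per-element work that graphics filters and
# convolutional layers both rely on. An "image" is a height x width x channels
# volume stored as nested lists.

def darken(image, factor):
    """Graphics-style filter: scale every value in every channel of every pixel."""
    return [[[value * factor for value in pixel] for pixel in row] for row in image]

def conv2d_single_channel(image, kernel, bias=0.0):
    """CNN-style layer on one channel: slide a k x k kernel of weights over the
    grid (no padding, stride 1) and compute a weighted sum plus bias at each spot.
    As in most deep learning frameworks, this is technically cross-correlation."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            # Every output element is independent of the others,
            # so a GPU can compute all of them in parallel.
            total = bias
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            out_row.append(total)
        output.append(out_row)
    return output

if __name__ == "__main__":
    rgb = [[[200, 100, 50], [10, 20, 30]]]        # 1 x 2 image, 3 color channels
    print(darken(rgb, 0.5))
    gray = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(conv2d_single_channel(gray, [[1, 0], [0, -1]]))
```

The loops make the parallelism visible: nothing computed for one output position depends on any other, which is why this maps so naturally onto GPU hardware.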

The same goes for recurrent layers or fully connected neural networks, the types of networks that might be relevant to things like text or general classification problems. Those take some input vector or matrix, apply a series of weights to it, and then apply activation functions, like tanh and sigmoid, in an element-wise way. So you're really doing the sort of matrix operations that graphics cards were always good at, and it turns out graphics cards are really good for those operations, which are done specifically in AI training.
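Here is a similarly hedged sketch of a fully connected layer: a matrix-vector multiply, plus a bias, followed by an element-wise activation such as tanh or sigmoid. The helper names (`dense_forward`, `dense_forward_batch`) are made up for illustration; frameworks hand this same pattern to the GPU as one large matrix operation.

```python
# Sketch: one fully connected (dense) layer in plain Python.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_forward(inputs, weights, biases, activation=math.tanh):
    """Compute activation(W @ x + b) for one input vector.

    `weights` holds one row per output unit; `biases` holds one value per unit."""
    outputs = []
    for row, bias in zip(weights, biases):
        pre_activation = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(pre_activation))   # element-wise nonlinearity
    return outputs

def dense_forward_batch(batch, weights, biases, activation=math.tanh):
    """Many input vectors at once: effectively a matrix-matrix multiply."""
    return [dense_forward(x, weights, biases, activation) for x in batch]

if __name__ == "__main__":
    W = [[0.5, -0.25], [0.0, 1.0]]
    b = [0.1, -2.0]
    print(dense_forward([1.0, 2.0], W, b))            # tanh activation
    print(dense_forward([1.0, 2.0], W, b, sigmoid))   # sigmoid activation
```

During training this forward pass, plus the matching backward pass, repeats for every batch, which is where the "thousands and millions of times" comes from.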

Of course, we may talk about inference today too, but I think it came about because these are the sorts of things that happen iteratively, thousands and millions of times, when you train an AI model. You know, that has to be the most accessible explanation of that evolution that I've heard anyone give. I think you did it better than NVIDIA does. So that was well done.

Well, they can pay me if they like, or they can send me a graphics card. That would probably actually be better. I can't say. Hint, hint: you know, a Titan RTX.

I won't even ask for the newest one. Don't even give me, like, an A100, the newest one. I'm just small potatoes.

Yeah, is that $5,000 or what? I don't even know how much it is. Well, the Titan RTX. Well, now that we have been pleading for free stuff, let's move on to some of the things they announced, which many organizations around the world are going to be trying to evaluate, figuring out how they're going to incorporate, buy into, and utilize this new hardware and the supporting software capabilities that go with it.

Yeah, definitely. So I guess one of the things to talk about, even before we get to the announcements, is the types of GPUs that are currently available and in what forms, and what kind of off-brand GPUs are out there, because NVIDIA isn't the only player in the space. Any insight into that? Yeah, it's probably worth distinguishing a few things here.

First, there are the accelerators that are out there, the types of GPUs that are out there, and also the access patterns to those, whether that be locally or in the cloud or whatever. I'm by no means an expert on the graphics card front; my brother-in-law could probably do a better job. But there has been a progression, and most of the time you'll see graphics cards referred to by some series of numbers and acronyms. Recent ones have been something like the 1080 RTX or the Titan RTX.

Those are the graphics processing units you would buy and then plug into some computer. So some people say, okay, I'm going to develop AI models, so I'm going to buy a computer, like a tower or desktop, and then buy a GPU, like one of these RTX GPUs, and put it in the PCI slot on my motherboard. Then when I do AI training, I'm going to offload some of those training operations to the graphics card, or GPU, that's installed in my computer.

So that's one of the first ways you might think about doing this: I'm going to do AI development, so I'll buy a computer, buy a graphics card, and put it in there. A lot of those cards come from NVIDIA, of course, but there are off-brand ones that are similar to NVIDIA's models, and there are also other brands that have their own style of graphics card.
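A minimal sketch of that "offload to the GPU" pattern, assuming PyTorch as the framework (the episode doesn't name one). `pick_device` and `matmul_on_device` are hypothetical helper names; `torch.cuda.is_available()`, the `device=` argument to `torch.tensor`, and `.cpu()` are standard PyTorch. If PyTorch isn't installed, the sketch falls back to pure Python on the CPU.

```python
# Sketch: send work to an NVIDIA GPU when one is visible, otherwise stay on CPU.
try:
    import torch
except ImportError:          # PyTorch not installed: use the pure-Python path
    torch = None

def pick_device():
    """Return "cuda" if PyTorch can see a CUDA-capable GPU, else "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"

def matmul_on_device(a, b):
    """Multiply two matrices given as nested lists; returns nested lists."""
    if torch is None:
        # CPU fallback: each output is the dot product of a row and a column.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
    device = pick_device()
    ta = torch.tensor(a, dtype=torch.float32, device=device)   # copy inputs to device
    tb = torch.tensor(b, dtype=torch.float32, device=device)
    return (ta @ tb).cpu().tolist()                            # copy result back to host

if __name__ == "__main__":
    print(pick_device())
    print(matmul_on_device([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

In a real training loop, the same device string is what you would pass to `model.to(device)` so that the model's weights live on the card.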

Have you ever built, or thought about building, this sort of workstation to sit by your desk at home? I think I'm way too lazy to do that. At this point, if I'm at home, I'd much rather go to a cloud provider and use what they've built. I've noticed that most of the people who had workstations specifically for their AI workflows seem to have moved off them in recent years, either to the cloud or, if they're big enough, to a data center, or at least to buying complete workstations rather than individual GPUs.

One of the things I was thinking about as you were talking is one of our early episodes, episode 15, called Artificial Intelligence at NVIDIA. We had NVIDIA's chief scientist, Bill Dally, on the show, and he absolutely schooled us on the hardware. Do you remember that? He went much deeper than we'll go on this episode.

Definitely take a look at that. We asked him to compare against other architectures, and he went there and described it. If you're interested in that aspect, not just the NVIDIA architectures but how they compare to other things, I would encourage you to listen to that episode; he will absolutely school you in the fundamentals. Yeah, for sure.

I just described the graphics card, or GPU, which is a lot of times what people think of when they think of a GPU or accelerator in the AI world: one of these GPUs or a series of them. But there are other options too. There's the TPU, or Tensor Processing Unit, from...

Is it Tensor Processing Unit or TensorFlow? I don't know if they put the brand in there. I think it's Tensor Processing Unit. I believe it is.

The TPU that Google developed, which is another type of accelerator. And there are even other architectures out there beyond CPU, GPU, and TPU: there are FPGAs and other things. Yeah, there are a lot of options out there.

Like you said, there are also options in how you access them. I described how, if you're developing AI, you could just buy one of these computers to have at your desk. But just as other forms of compute have been commoditized via the cloud, there's easy access to cloud GPUs too, in all the major clouds and even in purpose-built GPU cloud services like Paperspace and others.

I know when I was looking around a while back for a project, I was trying to find the cheapest way to use a GPU in the cloud, and I ended up going with Paperspace. I don't know if it's still the cheapest. I also use Google Colab a lot, as I've mentioned on the podcast.

And of course you can get access to a free GPU there. There are trade-offs, because it's in a notebook and that sort of thing. But anyway, there are a lot of ways to access GPUs that don't involve buying a computer and setting it on your desk. That's true.

That's definitely nice. So why don't we dive into some of the announcements NVIDIA made at GTC? As we're recording this, I think it was roughly a week ago that they made the announcements, and it'll be another week as this rolls out. I'll start us off: they started with, and I'm probably going to butcher the pronunciation, the NVIDIA Ampere architecture.

Did I get that right? I've read it, but I haven't watched the video to see how he's pronouncing it. Ampere? There you go.

I think it's in reference to amps in electronics. There you go, I didn't make that connection. Okay. So I don't know.

At least that's how I was saying it. I know that essentially this is what they're using to replace the existing architecture and expand on it. They're really focusing, I think, on the cloud. But when I say cloud, I don't necessarily strictly mean cloud providers.

I mean, if you were putting together a data center with a whole bunch of GPUs or GPU servers in it, they're really focusing not only on the performance side but also on usability, as I was reading through it. Yeah. And what I was gathering, also from talking with some other people about this, is that the generation before this latest one was focused more on the ray tracing elements, which is the RTX in a lot of these cards. To be honest, I'm not a big expert on ray tracing.

Nor am I. That has implications, of course, in graphics and that sort of thing. But it wasn't a huge advance in terms of the size and capabilities of the graphics processing unit itself. It was more about adding ray tracing capabilities.

Whereas this next architecture they're releasing, the Ampere architecture, which includes the A100 card or GPU, is a fairly significant jump in the size and capabilities of the graphics processing unit itself. I think part of that has to do with the way they've laid out the transistors on the substrate; my understanding is that it's much more dense. Yeah. Am I recalling correctly that it was something like a 20 times performance improvement over the V100?

Well, yeah. It's 20 times greater FLOPS, floating-point operations per second; actually, you're probably better versed in the acronyms, but it's a common way to measure the performance of computers, like supercomputers. So 20x greater FLOPS for AI, although they do give some benchmarks, which is pretty nice for reference. The ones I was looking at are for training BERT, the large-scale language model, and we have an episode on BERT as well,

if you'd like to learn more about that. Yes, we do. We've mentioned it several times, actually. Yeah, it's good that we ended up having that conversation.
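A back-of-the-envelope sketch of what a FLOPS multiplier means in practice. The function names are hypothetical, and the throughput values in the example are placeholders chosen only to be 20x apart; they are not NVIDIA's published specifications.

```python
# Back-of-the-envelope FLOP counting for one matrix multiply.

def matmul_flops(m, k, n):
    """FLOPs for an (m x k) @ (k x n) matrix multiply.

    Each of the m * n outputs needs k multiplies and k adds, so ~2 * m * k * n."""
    return 2 * m * k * n

def ideal_seconds(flops, tflops):
    """Runtime at a sustained throughput of `tflops` (1 TFLOPS = 1e12 FLOP/s)."""
    return flops / (tflops * 1e12)

if __name__ == "__main__":
    work = matmul_flops(4096, 4096, 4096)      # one large square matmul
    slow = ideal_seconds(work, 10.0)           # hypothetical 10 TFLOPS device
    fast = ideal_seconds(work, 200.0)          # hypothetical device with 20x the FLOPS
    print(f"{work:.3e} FLOPs: {slow * 1e3:.2f} ms vs {fast * 1e3:.2f} ms")
```

This is an idealized peak-throughput model. Real workloads also wait on memory and communication, which is one reason a headline FLOPS ratio and end-to-end benchmark speedups, like the BERT numbers discussed here, don't have to match.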

But the BERT models are these very large language models, NLP models, with tons of parameters; some large language models even have billions of parameters. I forget how many BERT has, but they give benchmarks on speedups for both training and inference with BERT. They're comparing against the V100: if you go to Google Cloud or Paperspace or one of these platforms, at least right now, I think the best GPU you can get access to is the V100, which is the previous generation.
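To put a number on "I forget how many BERT has", here is a sketch that counts parameters directly from a BERT configuration, assuming the standard published setups (BERT-Large: 24 layers, hidden size 1024, feed-forward size 4096; BERT-Base: 12, 768, 3072; both with the 30,522-token uncased WordPiece vocabulary and 512 positions). It counts the embeddings, encoder layers, and pooler, and leaves out the pretraining heads. `bert_param_count` is a hypothetical helper.

```python
# Sketch: count trainable parameters of a BERT encoder from its configuration.

def bert_param_count(layers, hidden, intermediate, vocab,
                     max_positions=512, token_types=2):
    layer_norm = 2 * hidden                                 # scale and shift vectors
    embeddings = (vocab + max_positions + token_types) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden)              # Q, K, V, output projections
    feed_forward = ((hidden * intermediate + intermediate)  # expand to intermediate size
                    + (intermediate * hidden + hidden))     # project back to hidden size
    per_layer = attention + feed_forward + 2 * layer_norm   # two LayerNorms per layer
    pooler = hidden * hidden + hidden
    return embeddings + layers * per_layer + pooler

if __name__ == "__main__":
    large = bert_param_count(layers=24, hidden=1024, intermediate=4096, vocab=30522)
    base = bert_param_count(layers=12, hidden=768, intermediate=3072, vocab=30522)
    print(f"BERT-Large: {large:,}  BERT-Base: {base:,}")
```

Under these assumptions this gives roughly 335 million parameters for BERT-Large and 110 million for BERT-Base: hundreds of millions, below the billions mentioned for some other language models.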

And it's pretty wicked fast. I've used it in a couple of projects, and it's astoundingly faster than the entry-level GPUs. Yes. And it's the basis for the DGX line of servers as well.

Yeah, I think so, prior to this release. Yeah. They're saying there's a speedup of between three and six times for large-scale training.

And the difference between 3x and 6x has to do with the precision of the floating point numbers you're using in the model. I'm stepping way beyond my bounds into computer science land, where I don't deserve to step. But in these models you obviously have all these weights and parameters and the matrices you're transforming, and computers work with numbers that have to be represented in some form, with some precision. If you're representing pi, you're not going to represent all the digits of pi.

You're going to have to cut it off somewhere, right? Yeah. This has to do with that precision of the numbers.

And if you reduce the precision with which you represent numbers, you can sometimes speed up your performance. That's what they're talking about with the difference between 3x and 6x. Yeah. And looking at their inference numbers, I see they're saying it's a seven times speedup on inference.
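The pi example can be tried directly. This sketch uses only Python's standard `struct` module, whose `d`, `f`, and `e` format codes pack IEEE 754 64-bit, 32-bit, and 16-bit floats; `round_trip` is a hypothetical helper name.

```python
# Sketch: what happens to pi when it is stored at lower precision.
import math
import struct

def round_trip(value, fmt):
    """Pack a Python float into a (possibly narrower) format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

if __name__ == "__main__":
    for name, fmt in [("float64", "<d"), ("float32", "<f"), ("float16", "<e")]:
        approx = round_trip(math.pi, fmt)
        print(f"{name}: {approx!r} (error {abs(approx - math.pi):.1e})")
```

This should show pi shrinking from 3.141592653589793 to 3.1415927410125732 in 32 bits and 3.140625 in 16 bits. Fewer bits per number means less memory traffic and faster arithmetic on hardware that supports it, at the cost of that rounding error.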

So it's substantial in that case. They say this card, this A100 accelerator, brings up the idea of... what do they call it? If you say it, MIG, I'm thinking of the fighter jet. But the multi-instance GPU, which is a really intriguing idea.

They're saying it's a multi-instance GPU. The way I'm reading it, are they saying you can basically treat the GPU as seven GPUs? Is that what they're saying?

I was wondering that myself. A big topic I spend my time on at work is multi-tenancy in workflows and the accessibility of compute in those. I was taking it in that way, but I'm not sure, because they're a little bit ambiguous in the way they use some of the terms. The other thing I noticed is that they talked about no code changes being needed.

I'm assuming that means no CUDA code changes in this case, but they weren't always as specific as they might have been in their explanations. I was wondering about that as well. Of course, there are ways to make changes like this transparent, but there's a change somewhere, right? Maybe at the abstraction level you're working at, in TensorFlow or something, you don't have to make a change, but somewhere in the underlying libraries it seems like there's some type of change.

Yeah. It talks about the multi-instance GPU as I'm looking through that. It's talking about seven different isolated GPU instances running different applications simultaneously. Yeah.

When they say 7x speedup for BERT-Large inference, and they have in parentheses "7 MIG", seven multi-instance GPUs, they're using them all. What I'm assuming that means is that they're basically running seven inferences in parallel on the seven GPU instances, each of which seems to have about the same performance they're indicating for the V100. So inference-wise, the change seems to be that you're able to run things in this parallel way, whereas on a V100 maybe you couldn't, so there wasn't that speedup.
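A hypothetical sketch of the serving pattern the hosts are guessing at: inference requests spread round-robin across seven isolated GPU instances (seven being the A100's maximum MIG partition count). This shows only the scheduling idea; `assign_requests` is a made-up helper, and nothing here calls an NVIDIA or MIG API.

```python
# Hypothetical dispatcher: spread inference requests across isolated GPU instances.
from collections import defaultdict

NUM_INSTANCES = 7  # an A100 can be partitioned into up to seven MIG instances

def assign_requests(request_ids, num_instances=NUM_INSTANCES):
    """Round-robin requests so each instance gets an equal share.

    Returns {instance_index: [request ids]}. In a real deployment each
    instance would run its own copy of the model in its own process."""
    queues = defaultdict(list)
    for position, request_id in enumerate(request_ids):
        queues[position % num_instances].append(request_id)
    return dict(queues)

if __name__ == "__main__":
    for instance, batch in sorted(assign_requests(range(14)).items()):
        print(f"instance {instance}: requests {batch}")
```

If each instance really sustains roughly V100-class throughput, as the hosts assume, seven of them working independently would account for the ~7x aggregate figure. That is their reading, not a confirmed spec.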

I'm making some assumptions here. That's true. I know for a fact that there are folks at NVIDIA who listen to the podcast. So hopefully if we're getting this wrong, they can... Yeah, clear us up.

They can clarify for us, and we'll come back in a later episode and say we were wrong. We're happy to do that. We're making the best of it. We were wrong.

And if you send us a GPU, then we'll prove that we were wrong on our own local system. You're back to begging. Oh. It seems pretty cool.

I like the idea that once you've gone from the training stage to inference, where before maybe you had this full, powerful GPU that you were running inference on without soaking up all of its compute, now they're basically saying you can parallelize the inference and still utilize the whole compute capability, because you can split it up in nice ways. So I definitely think that's pretty cool.

And... It's interesting, with the parallelization of this, there was an image NVIDIA put out comparing the old architecture with the new A100 architecture. They showed rows of racks of the old servers, and basically one little new server that was equivalent in productivity.

It definitely made an impact; it was something that some folks I work with and I were passing around. So yeah, you've got to keep up with the times, I guess, if you're going to keep driving forward on compute. Anything else on the architecture at large before we talk about DGXs or dive into the processors themselves?

One thing, related to what you mentioned about the speedup without code changes: where people before talked about floating point 16 and 32 numbers, which again have to do with the precision with which you're representing numbers, they introduce this new idea of TensorFloat-32. I saw that. With float 32, obviously, if you have more digits you can represent more numbers, so there's a wider range, but it's not as fast as floating point 16 in some cases.

So what they're trying to do, I think, is balance the two: this representation covers a wider range of numbers than float 16, but with lower precision than float 32, so that they can speed up training. Hopefully I've represented how they're thinking about it well. There's an image of this in a blog post that we'll link in our show notes, if you want to understand how floating point 16, floating point 32, and TensorFloat-32 compare. This is definitely a new representation on this chip that I don't think has appeared on any other architecture yet.
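A rough pure-Python emulation of the trade-off described above. TF32 keeps float32's 8-bit exponent (so the same range) but only float16's 10-bit mantissa (so roughly the same precision). `to_tf32` is a hypothetical helper; it rounds half away from zero on the magnitude, which may not match the hardware's rounding exactly, and handles finite inputs only.

```python
# Rough emulation of TensorFloat-32 (TF32) rounding using the standard library.
import math
import struct

FLOAT32_MANTISSA_BITS = 23
TF32_MANTISSA_BITS = 10

def to_tf32(value):
    """Round a float to the nearest value representable in (emulated) TF32."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]   # float32 bit pattern
    dropped = FLOAT32_MANTISSA_BITS - TF32_MANTISSA_BITS      # 13 low bits discarded
    bits += 1 << (dropped - 1)                                # add half a unit, then truncate
    bits &= 0xFFFFFFFF ^ ((1 << dropped) - 1)                 # clear the dropped bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

if __name__ == "__main__":
    print(to_tf32(math.pi))      # 3.140625: precision like float16
    print(to_tf32(1e38))         # still finite: range like float32
    try:
        struct.pack("<e", 1e38)  # float16 tops out at 65504
    except OverflowError:
        print("1e38 overflows float16")
```

So pi loses digits exactly as it would in float16, while a value like 1e38, far too large for float16, still fits.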

So that might be worth pointing out. Yeah, totally. Another thing we should probably mention about the architecture is that they've gone to the new third generation of NVLink and NVSwitch, which manage the network scaling: how you're moving your data around between the chips.

And I think it's something like 10 times the bandwidth, if I recall, or it may have been 10 times more than PCIe Gen 4; I think that's what I was reading. I'm going to get the number wrong, but they said it was so many terabytes in some insanely small amount of time. So you can transfer a bunch of data back and forth very, very quickly.

Yeah, these NVLinks. Absolutely. So NVLink has to do with communication of data between GPUs? Is that the idea?

That's what I've always assumed. I don't often have the opportunity to run my training on 32 GPUs, so this is where I'm getting to the edge of my understanding. But I did watch a YouTube video, and I think that's what they implied.

Is that like staying at a Holiday Inn Express? Yeah, I've stayed at a Holiday Inn Express. Exactly. Yeah.

So I watched the YouTube video. My understanding was, well, people also build these Bitcoin mining rigs, right? They have all these GPUs on top running all the time, and the way they do that is by connecting a bunch of them to the PCI slots on a motherboard.

To do that, they have these little adapters called risers that come out of the motherboard. But apparently those are very slow in terms of communication between the GPUs. Yeah. PCI is slow in that way.

At least what they're implying is that NVLink and some of these other things from NVIDIA help facilitate that communication of data. And like you're saying, it helps you scale out: if you have 32 GPUs in your data center and you're trying to run some computation across them, you need very quick communication for scientific or AI applications. Unlike Bitcoin mining, which is just running independent operations, there's actual communication that's needed. Yeah.
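To see why that communication matters for training, here is a sketch of how much data each GPU must send when gradients are averaged with a ring all-reduce, a standard technique for this step that the episode itself doesn't describe. The helper names are hypothetical, and the link bandwidths in the example are placeholders chosen 10x apart, not vendor specifications.

```python
# Sketch: per-GPU traffic for a ring all-reduce of one set of gradients.

def ring_allreduce_bytes_per_gpu(num_params, bytes_per_param, num_gpus):
    """Bytes each GPU sends: 2 * (N - 1) / N times the gradient size
    (a reduce-scatter pass followed by an all-gather pass)."""
    gradient_bytes = num_params * bytes_per_param
    return 2 * (num_gpus - 1) / num_gpus * gradient_bytes

def transfer_seconds(num_bytes, gigabytes_per_second):
    """Ideal transfer time over a link (1 GB = 1e9 bytes); ignores latency."""
    return num_bytes / (gigabytes_per_second * 1e9)

if __name__ == "__main__":
    # ~335M parameters (roughly BERT-Large), 16-bit gradients, 8 GPUs.
    per_gpu = ring_allreduce_bytes_per_gpu(335_000_000, 2, 8)
    for label, bandwidth in [("slower link", 25.0), ("10x faster link", 250.0)]:
        print(f"{label}: {transfer_seconds(per_gpu, bandwidth) * 1e3:.1f} ms per step")
```

Because this exchange happens on every training step, interconnect bandwidth directly caps how well training scales across GPUs. Bitcoin mining skips this step entirely, so slow risers don't hurt it.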

If I recall correctly, and it's been a while since I delved into those, back when they originally released the architecture, NVLink connects GPU to GPU; it gives you the interconnect between the two. Then NVSwitch connects at a higher level, combining the different NVLinks into a mesh. I see. So NVLink is GPU to GPU, and NVSwitch sits above that. We'll go with that for now.

But listeners, if you know we're wrong, let us know, and we'll put a note in the show notes or something. That's good to make that connection. Okay. So this is their new way of replacing what they had in the DGXs, which were the boxes they put in GPU data centers to scale up an AI supercomputer of some type.

Correct. Or a cluster of them, which is becoming more and more common. In the earlier days, when the original DGX-1 came out with eight GPUs in it, people would get one, and that in itself was being called a supercomputer. And we talk like that was such a long time ago.

It's only been a couple of years. But then they moved to the DGX-2, which had 16, and now they've actually scaled it back. In just a moment, let's talk about that. The Changelog is deep discussions in and around the world of software, and it has been going for over a decade.

We interview hackers like Chris Anderson from 3D Robotics. At the time, drones were like Predators and Global Hawks, military-industrial; they were classified, super, you know, $10 billion things. And we just built a drone with Lego pieces around the dining room table, programmed by a nine-year-old, and it's like, okay, that should not be possible. You know, when a nine-year-old can do something that was classified, that was literally export-controlled as a munition, with Lego Mindstorms pieces,

something important in this world has changed. Leaders like Devon Zuegel from GitHub: in the 10-to-15-year range, or 20-year range, what I would really like is for you to have three 12-year-olds hanging out, and one of them is like, I want to be a firefighter. Another one is like, I want to be a lawyer.

I want one of them to say I want to be an open source developer and innovators like I'm out who's saying I've yet to kind of see applications that scale that don't use multiple languages that don't have just arcane stories behind why this weirdo thing exists. You know, like, all right, when you open this file, you're going to have to turn around three times and tap your nose once. Like it's just the most hilarious stories, you know, but applications are living, breathing. They have craft that's normal.

So I want to normalize weirdness because that's just how applications evolve over time. Welcome to the change log. Please listen to an episode from our catalog that interests you and subscribe today. We love to have you with us.

So now we have this new NVIDIA DGX A100, where they've kind of broken the paradigm of their labeling. They went from DGX-1 originally to DGX-2, and now they've gone to DGX A100. Maybe you'll get one.

I don't know if I'm going to get one. Sadly, I'm not in charge of procurement, and I'm certainly not in charge of procuring one for my personal use. I'm guessing that my nonprofit's not going to get one. But if you happen to be getting your DGX A100 and you'd like me to run my training on it, I would be more than happy to do some benchmarking for you.

Gotcha. I'll talk to my boss's boss's boss's boss. Exactly. See what's possible for you.

I'm there for you, my friend. I'm really into the mooching today. Yeah, totally. I got it.

I'm good. We're there to support you. I'm there for you. But yeah, I mean, with this new architecture, it's much more performant, but they've actually cut the number of GPUs in the server back down to eight from 16.

But it has the enhancements that we just talked about at the processor level, architecturally. So it's interesting that they cut that down, but they have this multi-instance GPU capability. So actually, they say you can run 56 applications: up to seven instances per GPU, times eight GPUs. Yeah.
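The 56 figure is simple arithmetic: Multi-Instance GPU (MIG) lets each A100 be partitioned into up to seven isolated GPU instances, and a DGX A100 holds eight A100s. The sketch below just makes that multiplication explicit; the constant and function names are illustrative, and a real partitioning can also use fewer, larger instances.

```python
# Up to 7 MIG instances per A100; 8 A100 GPUs in one DGX A100.
MIG_INSTANCES_PER_A100 = 7
GPUS_PER_DGX_A100 = 8


def max_isolated_workloads(num_systems: int = 1) -> int:
    """Upper bound on concurrently isolated GPU instances across DGX A100 systems."""
    return num_systems * GPUS_PER_DGX_A100 * MIG_INSTANCES_PER_A100


print(max_isolated_workloads())   # 56
print(max_isolated_workloads(4))  # 224
```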

And like you were saying, even though there are fewer GPUs here, because of the gains in the A100 (they showed this picture in the keynote, which people can watch), supposedly you can reduce the size and footprint of your data center, because you're doing more computation per box, per DGX, than you were before. Correct. And this is interesting.

They were saying, you know, each box, I think the price they said was like a million dollars, right? So this is not what I'm going to be putting on my desk, but it's certainly within the range of compute budgets for some companies. So each one was that expensive, but you could do the same work you previously could have done if you'd spent something like $11 million on your data center. So scaling-wise, you can do more with less, I think is the idea.

Yeah. When I was originally looking at these announcements as they came out, I think one of the call-outs here, and this architecture does start to address it, is that people in organizations that can afford to get DGX systems, and do choose to invest in them,

underestimate what it takes to get productive with them. They kind of just think, I can go buy a DGX and everything's going to work out after that. All my training will complete in three days and I'm done. Exactly.

Nothing to it. But I think the challenge is that when you're scaling up to one or more DGX systems, you're not just talking about a DGX architecture. You're talking about an overall systems and software architecture in your organization, and specifically a data architecture that can support moving a lot of data around for training in an organized way that flows in with your business processes. And that is a big challenge.

And I think making all that work in their own organization is where a lot of organizations are struggling. And I know NVIDIA works hard to throw them a bone. They recognize that, and there are a lot of tools that they put out there to try to help you through that process.

Yeah. I think this architecture has accounted for some of those pain points of the past, and they're trying to make it easier to utilize any number of GPUs across multiple DGXs, which is good because there are very highly scaled cases where you might be doing a lot of experimentation, like hyperparameter optimization. You want to try an insane number of different possibilities when you're doing your training, and have the ability not just to train one time, but to train many, many times: thousands or even millions. And I think they've understood that, and this architecture is starting to address that highly scaled use case.

Yeah. I think that gets to something that may be on people's minds as they listen to this: why not just use GPUs in the cloud, which you can certainly do? If you wanted to run a thousand experiments to test all your hyperparameters, you could spin up a thousand GPU nodes in Google Cloud or Amazon or wherever else. But if you're doing that at any sort of frequency or length, the bill is going to add up crazy fast.
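To see how quickly "a thousand experiments" arrives, here is a minimal grid-search sketch using only the standard library: each hyperparameter's candidate values multiply, and every combination is a separate training run to schedule on a cloud node or a DGX. The parameter names and values are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List


def grid(space: Dict[str, List]) -> Iterator[Dict]:
    """Yield every combination of the hyperparameter values in `space`."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical search space: 5 x 4 x 5 x 10 = 1,000 training runs.
space = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "batch_size": [16, 32, 64, 128],
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.5],
    "seed": list(range(10)),
}
configs = list(grid(space))
print(len(configs))  # 1000
```

Adding just one more option to any single list adds hundreds of runs, which is where per-hour cloud billing starts to hurt.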

So say a company actually wants to do this: AI is central to their strategy and their products, and they want to get the very best model and do that experimentation over and over again. If this sort of DGX system supports the usability side of things like you're talking about, then they could run those experiments as often as they're able to. And I think that gets to the point for some people. I keep joking that I'd love to have access to this, but I probably wouldn't need it just for myself. I'm the AI person doing a lot of the AI things on my team, and I don't have a team of 40 different people trying to run things all the time, so I'm pretty okay with using a GPU instance in the cloud when I need it, because I might run a training for 48 hours or even four days or something. But I don't do that very often.

And it's just me. But if you've got a team of 40 people, or multiple teams throughout your organization, and they all need to run that stuff, that adds up really, really quick. It does. I've been pleased in general with how this comes together: the advances in NVLink and NVSwitch, combined with the multi-instance GPU capability in these A100s and the scale-out networking technology, which without diving into it is called Mellanox ConnectX-6, if I'm saying that right.

It's a nice blend of architectural considerations to get you there. And, you know, we haven't even talked yet about advancements on the edge. Yeah. And that is a huge, huge area at this point.

I'm glad you bring that up, because even though I may not get access to this sort of DGX system, I am thinking about various applications at the edge. In fact, I had a conversation earlier today with someone working on totally different stuff in manufacturing. They're not a large company, but they already do stuff at the edge in a manufacturing setting with low-power devices, think a Raspberry Pi and that sort of thing. If you could bring the power of this sort of GPU right to the edge, to a machine where you're doing computer vision to detect anomalies in your manufacturing process or something like that, that's a pretty major advantage. And it brings that sort of capability to people who work on smaller teams and have that specific use case for running AI at the edge.

So along with the A100, NVIDIA has the EGX A100, which they're releasing, which seems to benefit from some of the things we talked about with the A100, but they also talk a lot about security and end-to-end encryption of AI models and all those things. I have some ideas about why that may be important at the edge, but do you have any thoughts on that? Well, we live in a time where, you know, we've had so many episodes where we talk about malicious actors, and they could be anywhere from state level all the way down to teenagers that are savvy and having some fun.

And we're in a world now where you just can't assume that you can put anything out at the edge that's not secure. And that doesn't have to be in the defense world, where I live. That can be really anywhere, any industry at this point. They've obviously had their previous edge-oriented offerings, you know, the smaller-scale stuff that we like to play with.

You know, they have the NX out now. They've had the Nanos out for the last couple of years. But as industry gets really serious about pushing inference out to the edge and having it both widespread and pervasive, having a comprehensive and sophisticated security model that they can deploy onto these platforms is pretty key. And I think at this point it's no longer a specialty thing.
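Full model encryption needs a cryptography library and, ideally, hardware support on the device. As a smaller, standard-library-only sketch of one piece of such a security model, the code below signs a model file's bytes with an HMAC at build time and checks the tag on the device before loading, so a tampered model is rejected. It does not stop someone from copying the model; the key handling, the fake model bytes, and the function names are assumptions for illustration.

```python
import hashlib
import hmac


def sign_model(model_bytes: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the serialized model at build time."""
    return hmac.new(key, model_bytes, hashlib.sha256).hexdigest()


def verify_model(model_bytes: bytes, key: bytes, expected_tag: str) -> bool:
    """On the device, check that the model bytes match the shipped tag."""
    actual = sign_model(model_bytes, key)
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(actual, expected_tag)


# Hypothetical key; on a real device it would live in secure storage,
# not in source code.
KEY = b"device-provisioning-key"
model = b"\x08\x01fake serialized model weights"
tag = sign_model(model, KEY)

print(verify_model(model, KEY, tag))            # True
print(verify_model(model + b"\x00", KEY, tag))  # False
```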

It's now something we all have to acknowledge. Yeah. Because if you think about some products that have come out over the past few years, like drones, I think there are multiple drones now that have some sort of AI model running on them that does something like follow you around, or whatever the thing is: some operation, object detection or something. If you're thinking about releasing a product that has this sort of edge GPU running inside of it, whether in a manufacturing sense or the drone or robot sense, the AI model that you're releasing with it is really part of your IP, right?

And you've spent hundreds of thousands of dollars on it. So you've got the malicious actor side of things, but you've also got the fact that, oh, if I buy a cool thing to strap on my manufacturing machine that has one of these GPUs in it, and it's doing something sophisticated, well, if they're giving me the model in this product that I'm buying, why don't I just unscrew the hatch, plug my computer in, and take the model off of it? Then I don't have to pay them for that product anymore.

Right. So we've gotten to a point where the actual AI model is a piece of IP and is extremely valuable. You wouldn't want your client, or especially your competitor, to be able to just buy one of your products, unscrew the thing, and, you know, cp my_model.pb from the machine over to their machine.

And then they've rid themselves of the need to buy your product. Right. So, yeah, I was just going to say it's funny. I've noticed this a lot lately: we talk about the fact that you're now seeing models deployed to the edge in massively parallel, deeply pervasive ways in whatever your business is, and, as you know, I have a daughter who's young.

We just went through a birthday, and the toys that you can buy these days are now incorporating this stuff. You can actually buy toys that have convolutional neural networks in them. You can buy them with NLP capability. And I think that's the moment where I find myself surprised, because we're so used to talking about it in these business-oriented contexts.

Yeah. But then, you know, making these toys is someone else's business. And I keep being surprised at the toys that she unwraps and the capabilities they have. Of all people, I should not be surprised.

I suppose. But I am, just to see it in that context. Yeah. Well, especially in that case, depending on the age of the child, it would be important for that AI model to run offline on the device.

Let's just keep that thing offline. It's good if it acts as a toy, but let's not connect it to the wild west of the internet just yet. So yeah, I definitely see why you'd want to run that sort of model at the edge itself and upload it to the device. The other thing I wanted to mention is, um, I was going to mention the Nano.

So if people are thinking about Raspberry Pi, this gets down to where it does bring some accessibility to a lot of people. There are Raspberry Pi devices, which are single-board computers that have been, of course, wildly popular. But NVIDIA released the Jetson Nano, which is a single-board computer with a little GPU on it. I was actually thinking about getting one of those, but, I don't know if it was in this series of releases or just very recently, they released the Xavier NX, which is like the next, more powerful version of this.

They actually call it, yeah, a little AI supercomputer. It is a single-board computer, and it's got something like 10x the compute of the Nano. So when I was going to get the Nano, I just ended up getting the other one, because it seemed pretty awesome.

And a couple of things struck me about this one. I'm always thinking about, for example, the cases that we work with and the people in our organization around the world. Of course, we're in disconnected settings a lot of the time, because they're out in the field, and a lot of people around the world don't have internet. But also, we're not flush with money. So what is a way to get things running at the edge in a reasonable way, disconnected and offline, but also cost-effectively? I find it really interesting that some of these things are coming out with GPU capability, and with the Xavier NX it's interesting that it's got the GPU and you can run inference on it, but you can actually update your models as well.

They talk about doing transfer learning, which is like an update of a model. You're redoing some of the training: maybe retraining some of the layers, or training additional layers that you add onto your model. So I'm really curious when this comes in. It should actually be coming in today.
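The transfer-learning idea described here can be sketched in a few lines: keep the early "pretrained" layers frozen and train only a small head on new data. The pure-Python toy below uses a hand-picked 2-to-3 feature layer as a stand-in for a real pretrained network, plus synthetic data; all names and numbers are illustrative assumptions, and on a Jetson you would do this with a real deep learning framework rather than Python loops.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], float]

# Stand-in for pretrained layers: a fixed 2 -> 3 linear map plus ReLU.
# These weights are never updated during fine-tuning ("frozen").
FROZEN_W = [[0.5, -0.2], [0.1, 0.8], [-0.6, 0.3]]


def features(x: List[float]) -> List[float]:
    """Frozen feature extractor."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def predict(head: List[float], bias: float, x: List[float]) -> float:
    """Trainable linear head on top of the frozen features."""
    return sum(h * f for h, f in zip(head, features(x))) + bias


def mse(head: List[float], bias: float, data: List[Sample]) -> float:
    return sum((predict(head, bias, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune_head(data: List[Sample], epochs: int = 200, lr: float = 0.05):
    """Plain SGD on squared error, updating only the head and bias."""
    head, bias = [0.0] * len(FROZEN_W), 0.0
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            err = predict(head, bias, x) - y  # d(0.5 * err**2) / d(prediction)
            head = [h - lr * err * fi for h, fi in zip(head, f)]
            bias -= lr * err
    return head, bias


# Hypothetical "new task" data whose targets depend on the frozen features.
rng = random.Random(0)
data: List[Sample] = []
for _ in range(50):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    f = features(x)
    data.append((x, 1.5 * f[0] - 0.7 * f[1] + 2.0 * f[2] + 0.3))

frozen_before = [row[:] for row in FROZEN_W]
head, bias = fine_tune_head(data)
print(f"MSE before: {mse([0.0] * 3, 0.0, data):.4f}, after: {mse(head, bias, data):.4f}")
assert FROZEN_W == frozen_before  # the "pretrained" layer was not touched
```

Because only the head's few parameters change, this kind of update is far cheaper than training from scratch, which is what makes it plausible on a small edge device.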

So I'm kind of watching out my window right now. Yeah, I'm going to reveal that before we started the episode, when Daniel and I were talking, he was watching out the window for UPS to show up. Stationed right at my front window.

We might possibly get a package opening here on air, you know. Fingers crossed. Yeah. So what I actually want to try is to start with a small model and see how the training compares to, say, a bigger GPU on Paperspace or something.

And then try it all the way up: how far can I push the training on the NX? How big of a model can I train from scratch on it? And then how big of a model can I do transfer learning on?
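For the kind of comparison described here (the same training step on the NX versus a bigger cloud GPU), a small timing harness keeps the numbers comparable: warm up first, then report the median of several timed steps. This is a generic standard-library sketch; the `benchmark` name and its defaults are assumptions, and the toy step stands in for a real training step.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(step: Callable[[], None], warmup: int = 3, iters: int = 20) -> Dict[str, float]:
    """Time a single training step: warm up first, then report steady-state stats."""
    for _ in range(warmup):
        step()  # early iterations often include one-off setup costs
    durations = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    median = statistics.median(durations)
    return {
        "median_s": median,
        "stdev_s": statistics.stdev(durations) if iters > 1 else 0.0,
        "steps_per_s": 1.0 / median if median > 0 else float("inf"),
    }


def toy_step() -> None:
    """Placeholder for one real training step."""
    sum(i * i for i in range(100_000))


print(benchmark(toy_step))
```

With a GPU framework, kernel launches are asynchronous, so the step function should synchronize the device (for example `torch.cuda.synchronize()` in PyTorch) before returning, or the timings will look unrealistically fast.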

Because yeah, I find that incredibly interesting. The other thing they talk about with the NX is cloud-native things at the edge. And I know both you and I are really big fans of Docker. I find it interesting because before, I didn't see them emphasize using Docker at the edge to run AI-related workflows.

And now they're saying, well, this is how you should do it on this device. I find that really interesting. Yeah, not only Docker, but Kubernetes as well. I know we've talked about this on other episodes when we were hitting slightly different topics, but with this whole AI revolution,

especially if you look at the last three to five years, we really, really benefited from the revolution that had just swept through the software development and software systems deployment world, where Docker and Kubernetes became the systems to build on. We landed on top of that and just took it over. So it's really good to see all of the hardware, from the lower-end GPUs such as the Nano that you talked about all the way up to the latest DGX A100, using that same architecture. If you learn it in one place, you can use it from the most scaled-down to the most scaled-up version, and you can use it in the data center and at the edge.
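As a concrete sketch of "learn it in one place, use it everywhere": on a Kubernetes cluster with NVIDIA's device plugin installed, a containerized training job requests GPUs through the `nvidia.com/gpu` extended resource, and the same manifest shape applies whatever node it lands on. The helper below builds a minimal Pod manifest as a Python dict and prints it as JSON (which `kubectl apply -f` accepts); the pod name, image, and helper name are hypothetical placeholders, and edge devices may need additional runtime configuration.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Kubernetes Pod manifest that requests NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,
                # Extended resource advertised by the NVIDIA device plugin;
                # GPU counts are set under limits.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


print(json.dumps(gpu_pod_manifest("train-job", "example.com/my-training-image:latest"), indent=2))
```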

And that is a wonderful, wonderful thing that we've inherited. Yeah, I totally agree. I've really enjoyed talking about all of these things. I've got a lot to learn on all of these fronts.

And if you're thinking, oh, all of this GPU stuff and accelerated AI is very new, don't be afraid. I didn't come from a computer science background, but there is tooling that's accessible for you to get into some of these topics. And a lot of times in these Fully Connected episodes, we like to mention learning resources.

So actually, NVIDIA themselves have what they call, I think, the Deep Learning Institute, and they have a series of courses that cover everything from getting started with AI on the Jetson Nano, the little single-board guy we were talking about, all the way to more advanced topics like high-performance computing with containers. They talk about various GPU-accelerated frameworks like RAPIDS, AI in the data center, and all sorts of topics. So if you're interested in this sort of accelerated AI topic, you might check that out.

We'll definitely link it in the show notes as well. I know I have a lot to learn there myself. I'm going to go slightly off topic, but it just occurred to me as we were talking about this: for the learning resource, I'm going to throw one out there.

It's one that a friend of mine mentioned just earlier today, that he has used. For those of you who may be familiar with the learning site Udemy, U-D-E-M-Y dot com, there is a course on there called Docker and Kubernetes: The Complete Guide. It's not expensive; there are a lot of coupons, and you can get it at a very low price, like $10, $12, $13.

And this person had gone through that course, was halfway or two-thirds of the way through, and thought it was fantastic for ramping up. So given that recommendation, I'm going to recommend it to everybody, and we will put a link in the show notes, because if you're going to be in the AI world, it really pays to understand Docker and Kubernetes. Well, awesome. Yeah.

Well, check those things out. Reach out to us on our Slack channel or on LinkedIn or Twitter with any questions or thoughts that you have. And hopefully this has been a fun episode. It has for me.

It has been. We will see you next week. See you, Chris. See you later, and I apologize to the NVIDIA people who are going, oh my gosh, those guys.

They need to know more about it. We accept feedback, and the show notes and everything are on GitHub, so you can submit a PR. So yeah, feedback welcome. See you next time, Daniel.

Bye.


Frequently Asked Questions

How long is this episode of Changelog Master Feed?

This episode is 53 minutes long.

When was this Changelog Master Feed episode published?

This episode was published on May 26, 2020.
