Welcome to Practical AI, the podcast that makes artificial intelligence practical, productive, and accessible to all. If you like this show, you will love The Changelog. It's news on Mondays, deep technical interviews on Wednesdays, and on Fridays, an awesome talk show for your weekend enjoyment. Find us by searching for The Changelog wherever you get your podcasts.
Thanks to our partners at Fly.io. Launch your AI apps in 5 minutes or less. Learn how at Fly.io.
Well, welcome to the very first Fully Connected episode of Practical AI in 2025. In these Fully Connected episodes of the Practical AI podcast, Chris and I keep you fully connected with everything that's happening in the AI world and hopefully share some learning resources to help you level up your machine learning and AI game. I'm Daniel Whitenack. I'm CEO at Prediction Guard, and I'm joined as always by my co-host, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?
Doing good today, Daniel. There are a lot of interesting things happening out there in the AI world. And I love these conversations where we do these Fully Connected deep dives into things that are of personal interest to you and me, which is how we choose them. And there are a lot of exciting things coming up in these episodes.
It's a little bit easier for us to kind of freeform talk about a few things. But for those listeners who have been seeing our logo for some time on the podcast feeds, just FYI, there'll be a change to that coming up, but no need to swap out our feed or anything. That should be good. We're still doing great things with the Changelog, and they've made a few changes to their shows and their lineup, publishing them in different ways.
They have a show about that if you want to learn about it, but we'll still be going strong, and we're excited for a probably much-needed refresh. I don't know. I think it's been like six and a half years or something, hasn't it, Chris? It has been, but we can change in six and a half years.
I don't know. Yeah. I mean, six and a half years in GPU time is a long time. Yeah, that's expensive.
Yeah. So, just FYI, longtime listeners, be aware: when you scroll through your podcast app, look for a new logo sometime in the near future. But yeah, obviously that's bigger news than DeepSeek, but I guess we can devote most of the episode to what has been the story of our week, couple of weeks, and who knows how long, which is DeepSeek R1. I was thinking as we were saying that, speaking of GPU time, in this case, maybe a lot less GPU time.
Yeah, a lot less, or maybe an unclear amount. That's true. Good point. They only talked about the final run that was successful in terms of the spend on it.
So, yeah. Well, I guess we're getting ahead of ourselves for those who are not as familiar with it. Yeah, so for those, probably many of those that are listening to this particular episode have come across DeepSeek. But for those that have not seen anything, maybe you've been under a rock somewhere.
Chris, what are we talking about with DeepSeek? So there's a Chinese startup that we're talking about here that has released a large generative model, an LLM, that is, and I'm going to gloss over some stuff right here just because we'll dive into the specifics, very highly performant. It's comparable to the best models that OpenAI has had out there. But the thing that's really rocked everybody's world is the fact that it was trained at much, much less cost, at least the parts we know, and we'll dive into that detail as we said.
There's some things we know and there's some things we don't know, but it appears to have been achieved at a much, much lower cost than all of the competing models from anywhere in the world up to this point. And so in short, the AI world, and I guess everybody outside the AI world that cares about this stuff is in this giant debate and conversation about the implications. Is it a big deal? Not such a big deal?
Why is it a big deal? You know, is it overblown? And of course, Daniel and I are about to dive into all of that right now. It's a target rich environment, as we like to say in defense.
Are they surveilling us while we use the model? Exactly. Could you, should you, might you run the model in all sorts of different ways? Yeah.
There's tons of confusion around this, Chris, which is one interesting thing that hopefully, hopefully after you listen to this podcast, you're not more confused. We don't make that guarantee, but hopefully, hopefully that's, that's the case. Yeah, it's interesting. So I think one of the stories around this, and there's multiple kind of narratives that we can go into here, so much to talk about.
One of the narratives is around how some Chinese startup with a much lower budget or spend on the model building built such a good model and essentially gets parity with models from OpenAI and others. In particular, the comparison has been made to the o1 model, which, if you remember, we talked about on the show. This is OpenAI's sort of quote-unquote thinking model.
So when the model generates output, you put in a prompt and the LLM generates text output. A beginning portion of that text is sort of thinking content, meaning they're training the model to spit out the logic of how to solve maybe a deeper problem, or to reason about the input prompt, before it actually gives its final answer. If you're in the ChatGPT interface, you can see this in a different color or grayed out. If you're using the API, I don't think that they send that back in the API.
You still pay for it, but I don't think they send it back. DeepSeek R1 is a similar type of reasoning model. And on the same kinds of tasks as this flagship reasoning model from OpenAI, DeepSeek is getting what we could call parity. Now, you know, different benchmarks are out there, et cetera.
And, you know, each model has its own biases and different behavior. But yeah, the first kind of narrative around this in the news is, whoa, this came out of nowhere. There's this new company. It's just a startup.
They did this on the cheap. So they've published numbers around $5 million, $5.5 million, $6 million, somewhere in that range, for the final training of this model. And compared to what it took to train OpenAI's o1, that's like a drop in the bucket. So yeah, this first narrative, what are your thoughts on this, Chris?
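As a rough illustration of the "thinking content first, final answer after" behavior described above, here is a minimal sketch that separates the two parts of a reasoning model's raw text output. It assumes the thinking portion is wrapped in `<think>...</think>` tags at the start of the output, which is an assumption for illustration and not something stated in the episode; the names `split_reasoning` and `ReasoningOutput` are hypothetical.

```python
import re
from typing import NamedTuple


class ReasoningOutput(NamedTuple):
    thinking: str  # the model's intermediate reasoning text
    answer: str    # the final answer that follows the reasoning


# Assumption: the reasoning is wrapped in <think>...</think> at the start.
_THINK_BLOCK = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(raw_output: str) -> ReasoningOutput:
    """Separate the leading thinking portion from the final answer."""
    match = _THINK_BLOCK.match(raw_output)
    if match is None:
        # No thinking block found: treat the whole output as the answer.
        return ReasoningOutput(thinking="", answer=raw_output.strip())
    return ReasoningOutput(
        thinking=match.group(1).strip(),
        answer=raw_output[match.end():].strip(),
    )


if __name__ == "__main__":
    sample = "<think>2 + 2 is 4, then double it.</think>\nThe answer is 8."
    parsed = split_reasoning(sample)
    print("thinking:", parsed.thinking)
    print("answer:", parsed.answer)
```

A product interface could gray out the `thinking` field and show only `answer`, which mirrors the difference between what you see in a chat interface and what an API may choose to return.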
Well, I think that there's a lot more information we want, you know, [...] models from various different places, which then leads into natural discussions about where are these models coming from, and can I trust them, and how does it behave differently than what I'm used to using, and can I run it securely? All of those sort of things pop up, and they popped up sort of immediately and captured a lot of attention around that. I agree.
What's up, AI practitioners? Adam here from Changelog. I want to tell you about how much I love Notion. I know Daniel and Chris love Notion as well because we use Notion to organize everything, and behind the scenes here at Changelog.fm and CP.fm, we work with a lot of cool teams externally, and we create dashboards and workflows and operating systems, essentially to work well with others outside of our domain.
And the cool thing is, Notion is so flexible that we can do anything with Notion. And the coolest thing I'm loving about Notion is their Notion AI. I can search across all my notes, all my docs, get context, get summaries. It's all AI powered, all inside my Notion, powered by all the content in my Notion.
So I can work with external teams, internal teams, I can build workflows, and all this AI has really helped my team, my tools, my knowledge base be empowered to do our best work. And unlike other tools out there where you've got to jump from one thing to the next to the next, and it's just not seamlessly integrated, Notion is seamlessly integrated, infinitely flexible, and it's beautiful. It's easy to use, mobile, desktop, web, shareable. I mean, you name it, Notion can do it.
And the fully integrated Notion AI helps us work faster, write better, think bigger, and do tasks more efficiently. Things that would normally take us hours now take us minutes, maybe even seconds in some cases. And yes, we are a small organization compared to Fortune 500 companies, but Notion is used by over half of Fortune 500 companies, and teams that use Notion send less email, they cancel more meetings, they save time searching for their work, and they reduce their spending on tools, which helps everyone stay on the same page. Try Notion today for free when you go to notion.com slash practical AI. That's all lowercase letters, notion.com slash practical AI to try the powerful, easy to use Notion AI today.
And when you use our link, of course, you are supporting the show and we love that. Notion.com slash practical AI.
Well, Chris, I do want to get to some of the technical details that we know about, you know, what the model is and versions of the model, but maybe before that, it would be useful to address the elephant in the room, I guess, which is the security element of this, or the cybersecurity and privacy issues related to this. So there's been the geopolitical elements around, oh, you know, is the US ahead?
What does this say about dominance in the space? That's one thing. There's another thing, which is my company, which previously potentially had problems with pasting things into ChatGPT that they weren't supposed to, like, you know, whatever it is, customer details or IP or whatever. Those kind of, that shadow AI usage was already happening in companies, right?
And people were concerned that their employees were pasting things into ChatGPT. Well, now there's this sort of new player in the space. People are pulling that up because it's the new amazing AI app. And it turns out that is run by a different company and that data is going to a different place and is being housed on, you know, Chinese servers.
So there's that element. So we need to kind of parse through that, but also this has produced a separate confusion, from my perspective, around the quote security of the model. Is DeepSeek secure? I think we have to clarify what we mean when we ask that question, because I don't know what things you've seen, Chris, but there's a lot of stuff out there that is not very helpful in this sense.
Yeah, a lot of fear, uncertainty, and doubt. And some of it may be justified. Some of it may not be, probably a lot of it may not be. For me, the scarier thing is not the model itself.
It's the infrastructure around the model, where it's being housed, and what entities external to the core data scientists at that company have access to it. And I think that's where a lot of the concern is going to be. If you've downloaded it from Hugging Face and you're running it on your server, that's not to say that every facet of security is being accommodated, certainly, but at least you've taken some of the issues potentially out of the security equation. So, you know, I have the app on my personal phone, which is unusual for me, but knowing that we were going to do this and wanting to play with it a little bit.
But I'm very wary of what I don't know about that at this point. Yeah. Yeah. So you draw out something really important, Chris, and I actually wrote a blog post about this that I'll link in the show notes of this episode.
If you're interested, you can take a look, and it might be a good resource if you hear your engineering management or people in your company, you know, hyping the fears around DeepSeek, and maybe you want to use it in a secure environment. Maybe that would be a good tool that you could point them to. But all that to say, I think the main thing that I wanted to highlight in that was this element that you just described. So there's really two ways to access this model.
One way to utilize DeepSeek R1 is via a product offered by the DeepSeek company, which is a software product that you access and they host. This would be parallel to a lot of other software products, like OpenAI hosting ChatGPT. That is their product, which has a model interface embedded in it, but it is a product, similar to you using Airbnb, right?
You go to Airbnb, you put your personal information into Airbnb. They have certain terms of service that they hopefully follow, but you have no view into what's going on under the hood of Airbnb or ChatGPT or this DeepSeek AI product. And so it's really not the model in that case that is not secure, quote unquote, in terms of you putting data into it. It is the product built around that model.
And it is very clear from the terms of service that DeepSeek has posted that they will gather all of your, you know, well, I shouldn't do a blanket statement like that. They say exactly what they will get from you, but they're saving a lot of your personal data and information. They will use that for future model training. And that is housed on, you know, quote, servers in China.
[...] And similar to like when Gemini, you know, was trying to create diversity in their image output and generated some really interesting looking things, or what ChatGPT or anyone does when you send in a prompt, they inject stuff into that prompt, they do post-processing. It's a product, right?
You don't have visibility into any of that. And so they're doing obvious product things there to introduce artificial biases. Now, I do think that it is possible that in the sort of alignment fine-tuning process, DeepSeek had their own vision of how they wanted to align that model, which may not be malicious in any sort of way or kind of biased in weird political ways. It might just be their choice of how they wanted to bias that model.
In other ways, maybe it is motivated by certain things. I don't know. But that model will have its own sort of biased behavior. The other thing that I think has been shown in a number of places, WithSecure did a study of this, is that it is also way more sensitive to prompt injection attacks than many other state-of-the-art models, which produces another type of vulnerability at the application layer.
So you've taken care of the model hosting security issue, but all that to say, that doesn't mean that at the actual use of the model or in the integration layer you shouldn't still be asking relevant questions, and again, I highlight some of those things in the blog post if people are interested.
Well, friends, AI is transforming how we do business, but we need AI solutions that are not only ambitious, but practical and adaptable too. That's where Domo's AI and data products platform comes into play. It's built for the challenges of today's AI landscape.
With Domo, you and your team can channel AI and data into innovative uses that deliver measurable impact. While many companies focus on narrow applications or single model solutions, Domo's all-in-one platform is more robust with trustworthy AI results without having to overhaul your entire data infrastructure, secure AI agents that connect, prepare, and automate your workflows, helping you and your team to gain insights, receive alerts, and act with ease through guided apps tailored to your role. And the flexibility to choose which AI models you want to use. So, Domo goes beyond productivity.
It's designed to transform your processes, helping you make smarter and faster decisions that drive real growth. And it's all powered by Domo's trust, flexibility, and years of expertise in data and AI innovation. And of course, the best companies rely on Domo to make smarter decisions. See how Domo can unlock your data's full potential.
Learn more at AI.Domo.com. That's AI.D-O-M-O.com.
So, Chris, there's also the element around this that we always like to get to when we get into our deeper discussions around any particular model, which is, what are the unique technical or architectural elements of this? What types of versions did they release?
This actually might be very confusing for people when they see something like DeepSeek R1 Distill Qwen 32B, right? There are a lot of words there that might not make sense. Jay Alammar, who we love on the podcast and who has been on the podcast, has posted for years a lot of great blog posts, kind of illustrated transformers and other things. As a learning resource that you might want to take away from this particular episode, he posted an Illustrated DeepSeek R1 article, which goes through some of the details.
What's interesting here, Chris, I don't know if you got through any of that, but the overall picture of how they did this fine-tuning is fairly similar to, I think, how many people have been doing fine-tuning for some time. And I guess one of the things I wanted to bring up on this is, I believe, correct me if I'm wrong, it was based on one of the Llama models, right? Well, the DeepSeek architecture has been around for some time and is a specific architecture. It's similar to the Llama architecture.
It also involves layers upon layers of transformer blocks. In terms of the exact architecture of DeepSeek R1, it does involve mixture-of-experts layers in the model. So there are layers and layers of transformer blocks in the architecture, and then these mixture-of-experts blocks, which is why you might see people refer to activated layers or activated parameters. With these mixture-of-experts layers, you don't process the input through all elements of that layer each time you run the model, which creates some efficiencies both for inference and often for training as well.
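To make the idea of "not processing the input through all elements of the layer" concrete, here is a minimal, toy sketch of top-k routing in a mixture-of-experts layer. It is not DeepSeek's implementation: the router, the experts, and the choice of `top_k=2` are all illustrative assumptions, and real layers operate on tensors with learned weights.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    experts: Sequence[Expert],
    router_weights: Sequence[Vector],
    top_k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Run x through only the top_k highest-scoring experts.

    Returns the gated mix of their outputs and the indices that were used.
    """
    # Router: one linear score per expert (dot product with that expert's row).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k experts; the others are never evaluated for this input.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](x)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


def make_scaling_expert(factor: float) -> Expert:
    """Hypothetical toy expert that just scales its input."""
    def expert(x: Vector) -> Vector:
        return [factor * xi for xi in x]
    return expert


if __name__ == "__main__":
    experts = [make_scaling_expert(f) for f in (1.0, 2.0, 3.0, 4.0)]
    router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    out, used = moe_forward([2.0, 1.0], experts, router_weights, top_k=2)
    print("experts used:", used)
    print("output:", out)
```

Only two of the four toy experts run for a given input here, which is the sense in which a mixture-of-experts model has far fewer "activated" parameters per token than total parameters.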
But yes, it's a similar setup with some slight differences, and those slight differences are the reason why, as we mentioned earlier, at least currently as we're recording this, you might have to import some third-party code to support the model in upstream Transformers, which is likely to change quickly. How does that affect the fine-tuning though? In terms of, you know, how DeepSeek approached it versus maybe how Llama has been approached and stuff. Are there differences?
Are you seeing a lot of similarity there? I know Sam Altman made a comment and I'm not quoting him directly, but it was something to the effect of once, you know, once somebody else has already done something that you're basing on, it's a lot easier to do that. And that was his kind of minimization of what DeepSeek had done. I'm kind of curious, you know, how does that affect this in terms of the fine-tuning?
I mean, the overall process, and when I say overall process, this is often a pre-training step on very raw data that is completely unsupervised, a fine-tuning step, which is supervised, and maybe an additional fine-tuning step, which is like a preference tuning. That sort of overall picture of how the training is done seems to also be true here. Now, there are some unique elements of this in that they created this DeepSeek R1 Zero model, which is kind of an interim reasoning model, and they use it to actually help generate some of the data for that supervised fine-tuning step.
So this is where, going back to the original discussion in our conversation, that $5 million number corresponds to maybe that final or one of the final training steps, but not necessarily the data generation. So they used interim models that their intention wasn't to release. Such a model doesn't perform great as a general-purpose model, but it might perform well at generating long chain-of-thought examples, these reasoning examples, to add into the training data for supervised fine-tuning, which allows you to augment your fine-tuning data and use less human resources to create it. So I forget the exact figures, but Meta did spend a ton of money on data curation with human data labelers to create those datasets for the Llama models, and probably still is.
In this case, at least some of that data was this synthetic data that was, that was generated by this interim model. So that's kind of one interesting step of, of the process that may be, you know, that, that maybe is relevant to some of the budget and efficiency considerations. Sure. And maybe that's, you know, part of the motivation of leaving that out altogether is, you know, that wasn't direct costs necessarily to them, or at least not in the way that it happened when it was originally manufactured.
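The idea of using an interim model to generate supervised fine-tuning data, and then keeping only the good candidates, can be sketched roughly as follows. Everything here is a toy under stated assumptions: `toy_interim_model` is a hypothetical stand-in for an interim reasoning model, and the correctness check in `is_correct` is just one possible automated filter, not the filter DeepSeek actually used.

```python
import random
from typing import Callable, Dict, List, Sequence

Example = Dict[str, str]
Generator = Callable[[str, random.Random], str]
Filter = Callable[[str, str], bool]


def toy_interim_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical interim model: 'solves' prompts like '2+3', sometimes wrongly."""
    a, b = (int(part) for part in prompt.split("+"))
    guess = a + b + rng.choice([0, 0, 0, 1])  # wrong roughly a quarter of the time
    return f"<think>{a} plus {b} gives {guess}.</think> {guess}"


def is_correct(prompt: str, completion: str) -> bool:
    """Hypothetical automated filter: keep completions whose final answer checks out."""
    a, b = (int(part) for part in prompt.split("+"))
    final_answer = completion.rsplit(" ", 1)[-1]
    return final_answer == str(a + b)


def build_sft_dataset(
    prompts: Sequence[str],
    generate: Generator,
    keep: Filter,
    samples_per_prompt: int = 4,
    seed: int = 0,
) -> List[Example]:
    """Sample candidates per prompt and keep the first one that passes the filter."""
    rng = random.Random(seed)
    dataset: List[Example] = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = generate(prompt, rng)
            if keep(prompt, completion):
                dataset.append({"prompt": prompt, "completion": completion})
                break  # one good reasoning example per prompt is enough here
    return dataset


if __name__ == "__main__":
    data = build_sft_dataset(["2+3", "10+7", "4+4"], toy_interim_model, is_correct)
    for row in data:
        print(row)
```

The point of the sketch is the cost structure: the surviving examples come from model sampling plus an automated check rather than from human labelers, which is why this step may not show up in a headline "final training run" number.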
Yeah. Yeah. So there's this DeepSeek R1 model, whose architecture, again, is not fundamentally different from architectures we've seen in the past. There were some creative things done in the training process, including that a significant portion of the data, assuming it was a significant portion, was synthetic or generated data.
They also used some automated processes and model-based processes to filter and curate that generated data, to actually pick out good examples from all of the candidate examples. So there were some very creative things in that data generation piece, but the other stages of this were not fundamentally different from stages of training that we've been familiar with from other model releases. There are a number of model versions that have been released from DeepSeek. So there's DeepSeek R1, the sort of flagship, which is like 700 billion parameters or something.
It's very large. At least at full precision, you're going to need many, many GPUs to run this. I think Philipp from Hugging Face said it was like 16x 80-gigabyte GPUs, like 16x H100s. To give you context, I think an H100, if you have it up all the time at on-demand pricing in a cloud, is going to run you something like 60 to 80 grand a month, something like that.
So you need 16 of those to run the model, the full model in full precision with at least Nvidia GPUs. Then they've released other variants of this model. And that, that full model is that mixture of experts model, which has the element of the kind of external or third-party code added into it. They've also released distilled versions of that model.
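As a back-of-the-envelope check on the hosting numbers quoted above, here is a small sketch of the arithmetic. The 16 GPUs and the 60 to 80 grand a month range come from the conversation; the quote reads as a per-GPU price, but on-demand cloud pricing is often quoted per multi-GPU instance, so the billing unit is left as a parameter. The second reading (8 GPUs per billed instance) is purely an assumption for comparison, and `monthly_cost_range` is a hypothetical helper name.

```python
import math
from typing import Tuple


def monthly_cost_range(
    gpus_needed: int,
    gpus_per_billed_unit: int,
    unit_cost_low: int,
    unit_cost_high: int,
) -> Tuple[int, int]:
    """Return the (low, high) monthly cost in dollars for always-on serving."""
    # Round up: you cannot rent a fraction of a billed unit.
    units = math.ceil(gpus_needed / gpus_per_billed_unit)
    return units * unit_cost_low, units * unit_cost_high


if __name__ == "__main__":
    # Reading 1: the quoted range is the price of a single GPU.
    print("per-GPU reading:", monthly_cost_range(16, 1, 60_000, 80_000))
    # Reading 2 (assumption): the quoted range is the price of an 8-GPU instance.
    print("per-instance reading:", monthly_cost_range(16, 8, 60_000, 80_000))
```

Either way, the order of magnitude makes the practical point: running the full model around the clock is a six- or seven-figure monthly commitment, which is part of why the smaller distilled variants matter.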
So we can get into that here in a second, but just wanted to make clear kind of what the main model looked like. And these distilled versions of the model, if you go to Hugging Face, you can actually look at the collection from DeepSeek for DeepSeek R1. And what you'll see is a whole bunch of different DeepSeek models, which again is often a point of confusion with people. Like what do all these things mean?
So we've got DeepSeek R1. We have Distill Llama 70B, Distill Qwen 32B, Distill Llama 8B, et cetera. These ones are really significant for people to maybe [...] But now, like, you know, apparently, if you've got $5 million sitting around, you could create a best-in-class model. And, you know, what does that mean?
It means all of these are going to proliferate very quickly. This is not the last of these types of models we will see. They will proliferate very quickly. And you having kind of model lock-in, quote unquote, like you built all of your AI functionality around this particular model, whether that be open or closed, that's not going to work out great for you in the long run, just because of the models changing.
And so building in this ability to swap models to have control and configurability, I think that's one of the kind of trends there. The other one I would say is now that you're considering bringing these models into your own infrastructure, like there's parity with OpenAI in many respects, it brings up all of these questions that we were immediately prompted with, right? Like, well, if you bring that into your environment, what are the security concerns related to that? How can you run it robustly and reliably?
What should you be monitoring in production, right? So it brings up all of these additional questions, which I think overall will be really good for people to consider because they're probably things they should have been considering for the past year as, you know, so many things were built on kind of one model family. So, yeah, those are a couple of thoughts. I don't know if you have additional ones.
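The point about avoiding model lock-in and building in the ability to swap models can be sketched as a thin layer between application code and whichever model backend is active. This is a minimal illustration, not a recommended production design: `ModelRouter` and the two stub backends are hypothetical, and real backends would call a hosted API or a self-hosted model server.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A backend is anything that maps a prompt to a completion.
CompletionFn = Callable[[str], str]


@dataclass
class ModelRouter:
    """Keeps application code independent of any single model."""

    backends: Dict[str, CompletionFn]
    active: str

    def complete(self, prompt: str) -> str:
        """Send the prompt to whichever backend is currently active."""
        return self.backends[self.active](prompt)

    def switch(self, name: str) -> None:
        """Swap the active model without touching the calling code."""
        if name not in self.backends:
            raise KeyError(f"unknown model backend: {name}")
        self.active = name


def stub_hosted_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a hosted, closed model."""
    return f"[hosted] {prompt}"


def stub_self_hosted_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to an open model on your own servers."""
    return f"[self-hosted] {prompt}"


if __name__ == "__main__":
    router = ModelRouter(
        backends={"hosted": stub_hosted_model, "self_hosted": stub_self_hosted_model},
        active="hosted",
    )
    print(router.complete("Summarize this ticket."))
    router.switch("self_hosted")
    print(router.complete("Summarize this ticket."))
```

A single choke point like this is also a natural place to hang the monitoring and security checks raised above, since every prompt and response passes through it regardless of which model is behind it.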
I've been speculating on whether, you know, as valuations have been going up and up for all these different AI startups across the entire globe and the budgets have just grown astronomically, is this the moment where investors are looking at it going, why do you need $100 million? Why don't we give you $5 million and see what you can do with it? Look what they did with it.
And so, you know, the question is whether the cost of operations in AI startups is now going to be affected. And if that's the case, and with potentially not everybody being able to be quite so productive with their $5 million, what does that mean? Are we going to go through a little bit of a cleanup round in the AI startup world or whatever?
Any thoughts on that as we finish up? Yeah, it's interesting. I mean, it definitely has an impact if you're a model-building type of startup, for sure. And we've seen other examples of this even last week, you know, hearing from Genmo about what they're doing with video generation models.
They have a very small team, and look at what they're able to accomplish. I think it definitely has implications on that side. I think it also has implications for, hopefully, people starting to think less about the model-building ventures, which will be interesting, but also this is going to make model building and fine-tuning even more accessible to the enterprise, which will fuel both tooling and infrastructure types of investments as well. Now, those I don't think will have the kind of inflated, oh, I need $100 million to train a model sort of scenario, but yeah, we'll see how that shakes out.
I think in the enterprise world out there, maybe as a final thought on this, you're now going to see existing budgets, which for large companies are often in the hundreds of millions of dollars, you know, where they're not AI specialty companies, but it matters enough to have a big budget. The expectation has been, I've always used other models, and we've built lots of infrastructure around that. Maybe there's pressure now to actually go and work on models specifically for your business, because you have, you know, $5 million times X available to do that in the new way of thinking. So it may certainly change what the expectations are in the enterprise world.
What I think will be stressed in that case is the data curation and human involvement in that process. That's not, I mean, there's clearly a big investment on that side here. So even if you spend $5 million on that, you know, final training, there's definitely a process that goes into that. I agree.
Good conversation today. Yeah, definitely Chris. Definitely check out some links that we'll put in the show notes to various articles about this, both on the technical and the kind of more hyped geopolitical stuff. So check it out.
Thanks for joining again. Great to talk, Chris. Yep, absolutely. See you next week.
All right. That is our show for this week. If you haven't checked out our Changelog newsletter, head to changelog.com slash news. There you'll find 29 reasons.
Yes, 29 reasons why you should subscribe. I'll tell you reason number 17. You might actually start looking forward to Mondays. Sounds like somebody's got a case of the Mondays.
28 more reasons are waiting for you at changelog.com slash news. Thanks again to our partners at Fly.io, to Breakmaster Cylinder for the beats, and to you for listening. That is all for now, but we'll talk to you again next time.