Operationalizing ML/AI with MemSQL - Changelog Master Fee...

What this episode covers

A lot of effort is put into the training of AI models, but for those of us who actually want to run AI models in production, performance and scaling quickly become blockers. Nikita from MemSQL joins us to talk about how people are integrating ML/AI inference at scale into existing SQL-based workflows. He also touches on how model features and raw files can be managed and integrated with distributed databases.

Sponsors:
DigitalOcean – DigitalOcean’s developer cloud makes it simple to launch in the cloud and scale up as you grow. They have an intuitive control panel, predictable pricing, team accounts, worldwide availability with a 99.99% uptime SLA, and 24/7/365 world-class support to back that up. Get your $100 credit at do.co/changelog.
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com.
Rollbar – We move fast and fix things because of Rollbar. Resolve errors in minutes. Deploy with confidence. Learn more at rollbar.com/changelog.

Featuring:
Nikita Shamgunov – Website, X
Daniel Whitenack – Website, GitHub, X

Show Notes:
MemSQL
MemSQL’s ML/AI capabilities
MemSQL’s recent AI/ML e-book
Contact tracing case study with MemSQL and True Digital

Upcoming Events:
Register for upcoming webinars here!


TRANSCRIPT · AUTO-GENERATED

Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com. We move fast and fix things here at Changelog because of Rollbar. Check them out at rollbar.com.

And we're hosted on Linode cloud servers. Head to linode.com/changelog. This episode is brought to you by DigitalOcean. DigitalOcean's developer cloud makes it simple to launch in the cloud and scale up as you grow. They have an intuitive control panel, predictable pricing, team accounts, worldwide availability with a 99.99% uptime SLA, and 24/7/365 world-class support to back that up.

DigitalOcean makes it easy to deploy, scale, store, secure, and monitor your cloud environments. Head to do.co/changelog to get started with a $100 credit. Again, do.co/changelog. Welcome to Practical AI, a weekly podcast that makes artificial intelligence practical, productive, and accessible to everyone. This is where conversations around AI, machine learning, and data science happen. Join the community and Slack with us around various topics of the show at changelog.com/community, and follow us on Twitter. We're @PracticalAIFM.

Welcome to Practical AI. This is Daniel Whitenack. I'm a data scientist with SIL International. Normally I would be joined by my co-host Chris Benson, who is a principal AI strategist at Lockheed Martin, but he's in the midst of some family health-related things.

So he's taking the time that he needs. But we're definitely excited to chat about a really interesting topic today. Actually, in our Slack channel, I remember a conversation a couple of weeks ago where we were discussing the issue of: hey, I trained my model, it works great on my data, I evaluate it and it all seems good. But then when I try to integrate this into code, the performance is actually really terrible.

And it's kind of a mismatch between development and production. And I think we're going to be able to get into some of those things today. Today we have as our guest Nikita Shamgunov, who is the CTO of MemSQL. We're really excited to talk to you today.

Nikita, welcome. Happy to be here. Like Daniel said, my name is Nikita. I'm actually co-CEO and founder of MemSQL.

I don't mind the confusion. I started as CTO, and I took over as CEO in 2017. Gotcha.

And recently, about a year ago, we brought on a co-CEO, Raj Verma, with the thinking that we're going to take the company public. Ah, gotcha. On that note, why don't you give a little bit of the background of yourself first? And then we can get into a little bit of the background of MemSQL.

I think that would be great context. Definitely. So I spent my career in data management, and in databases specifically. I came to the United States after finishing grad school in St. Petersburg, Russia, and joined the SQL Server engine team.

So I went from a very research-oriented life and work to basically systems engineering. When you build databases, it's actually a very different cadence versus using databases.

When you use databases, you think about things like performance. You think about SQL as the API of the database. And then you think about reliability and uptime. When you build a database, you think about quality.

You think about the life of somebody who is using the database and how you make that life easier. And obviously, you think about performance and scalability, and how the database user or developer can achieve that performance in SQL. It sounds like an interesting transition from the academic world to the systems engineering world. Was it a hard shift for you?

Or was that focus on the user and reliability something that you were already passionate about going into that work? I was very passionate about engineering in general. What I loved about building databases is that the product, the database engine, is like computer science in a box.

It has algorithms, it has data structures, it has system engineering, you interface with networking, IO, CPU caches. You need to be aware of the computer architecture in order to build world-class software. So that certainly resonated a lot. And that was the core premise of why I wanted to start working on world-class industry products.

And then from there, the passion for the user came in. And over time, that curiosity about building new things, breaking ground, and entrepreneurship came through over the years while working at SQL Server. Mind you, during that time, Microsoft was going through a cloud transition. Everything that we're seeing today at scale was being born at that time.

It was being conceived, and major architectural choices were made; some of them were right, some of them were not. So that's my big-company background. And then I switched and joined Facebook. In fact, one of the premises of me joining Facebook was not to make a lot of money, although it was pre-IPO, 2010, but actually to move to Silicon Valley and meet the kind of people I would later start a company with.

And what happened is that as I walked into Facebook on day one, I met my future co-founder Eric, and relatively shortly after, within six to eight months, we started MemSQL and I left Facebook. Oh, that's a wild ride, I guess. Moving to a totally new place, experiencing Facebook and that culture, especially at that sort of stage, and then founding something. So maybe describe a bit how that happened so fast, the idea for MemSQL, and the motivation that this was something that was really needed.

How did that occur? Yeah, so at SQL Server, we always knew that distributed systems were the future, right? Especially as you go into the cloud transition. At the time, back in 2008 to 2010, Microsoft had a flagship product, SQL Server, which is a single-node database that's very, very powerful. I'm really proud that I worked on it.

The main competitor, Oracle, had distributed systems at its disposal: Oracle Exadata and Oracle RAC. And the way the database market is structured, the top-tier workloads that have high performance and high availability requirements do require distributed systems, and Microsoft didn't have that. So at that top of the market, Microsoft was losing to Oracle then. Actually, I think they've mostly caught up now.

But architecturally, single-node databases are very hard to change and turn into a distributed system. That was one of the moonshot projects that my co-founder and CTO Adam worked on at Microsoft, and that moonshot project didn't succeed. When I walked into Facebook, the need for distributed systems became apparent, because every Facebook workload is that high-end workload, sometimes from the reliability standpoint and sometimes from the scale standpoint, most of the time the scale standpoint, because back in 2010, I think Facebook was on the march to cross a billion active users.

So that was on everyone's mind. That was everyone's goal: how can we cross a billion active users? And obviously, history shows that Facebook blew through that goal quite successfully.

Were they trying to architect something internally to deal with that? Or was it sort of an open problem when you were there? It's more than one system, right?

For something as big as Facebook, it turns out that all the data workloads are split into categories. Some of them are data lakes, and Hadoop basically got a lot of advancements at Facebook. Some of them are operational, you know, powering Facebook.com.

And multiple data management technologies are on the critical path between typing Facebook.com and actually seeing the news feed. There's a separate data management solution for messaging, a separate one for news feed, and the list goes on. And within that, there's also a whole bunch of point solutions for various analytical workloads. One is for time series.

There's a startup called SignalFx that took some of the ideas: folks from Facebook left and started that company, which was recently acquired by Splunk. And then there was a system called Scuba that gives you real-time analytics, and a lot of ideas there influenced the MemSQL roadmap as well. So, long story short, lots and lots of data management systems and data management workloads inside Facebook.

But each and every one is a distributed system. And so that pervasiveness of distributed systems was captivating. It really validated the thinking that the future of database systems is distributed. And that's how we started that system.

Awesome. And we started this as an in-memory database, hence the name. Yeah. Now, in fact, the name is kind of limiting, because MemSQL has evolved way past being memory-only.

Version one was in-memory and single-node. But very quickly, we expanded to a distributed system and built a tiered architecture from memory to disk. And now we've expanded into S3 and other object stores. Yeah.

I'd be interested to hear a little more. You gave a sense of the initial founding and some of the initial ideas. As far as MemSQL right now, could you give a high-level view of the sorts of things that people are turning to MemSQL for, the consistent things where you see people really getting value, and then maybe some of the newer things that are enabling new sorts of workloads that you didn't even anticipate in those early days? Definitely. So first of all, databases are very long-lived.

And the most successful database products on the planet, which are Postgres, MySQL, SQL Server, and Oracle, are all 30-plus years old. Yeah. And we still use them today, which, if you turn to basically any other piece of technology, is not the case. Yeah.

Technology is very transient, right? We build something, and something new comes in and completely disrupts what was there before. But databases seem to stay for a long time.

Yeah. I think in my experience from working at different places, I've encountered Postgres a lot. Right now, I work on a team that's using SQL Server for certain things. Of course, I've encountered things like Mongo or other NoSQL sorts of databases.

But I think, as you're talking about the user experience, SQL always seemed to me like the natural user experience. And you gain a lot of power with that SQL interface to the database. The relational one, yeah, for sure.

And so, yeah. So the vision is a single pane of glass to all your data and all your workloads. But when you start peeling the layers and understanding what would it take to deliver on that vision, you start understanding how you scale storage, how you scale compute, and how you scale both storage and compute for your low latency operational workloads. Think about powering your apps and loading a web page.

Queries there need to come back to you in, ideally, a hundred milliseconds or so. At the other end is running what are called big, expensive analytical queries that scan large volumes of data to give you insights. Those insights could be reports. Those insights could be analytical information, which is also called decision support. You need to make a decision.

So you need to know what works, what doesn't. You need to know how your sales are doing in this state versus the other state, this product versus that product. And so that is a continuous process of evaluating and looking at data and understanding, driving insights out of data. So the interesting piece about what I just described is that for your operational needs, you need a SQL database, right?

Like Postgres, like MySQL, like SQL Server. I mean, you don't strictly need it. You can use MongoDB. You can use a NoSQL database.

But you need an operational database, right? Let's put it this way. And for the majority of workloads today, people are using relational databases that speak SQL.

And for a smaller part of the market, they use NoSQL databases, which is more preference than user experience and whatnot. And then for an analytical system, people use data warehouses, Teradata, Snowflake, BigQuery, and the interface to those databases is also SQL. And what you just started the podcast with is like, oh, I trained my model against the data that sits in the data lake or a data warehouse. And now I need to put it in production.

And I have data quality and data consistency issues; my performance is not the same. A lot of that comes from the underlying data management. And if you really peel the layers, it comes from the fact that you run this very same model on top of data in data management systems that are different. And one can argue that, well, they're different.

There are good reasons for that to be different. But there's a more contrarian viewpoint here, which is that we live in the world of clouds, where things are abstracted away from you. And that gives an opportunity to build, ideally, a serverless interface that speaks SQL, that gives you access to all your data, for ordering capabilities, for low-latency capabilities, for operational workloads.

And that would allow you to never leave your data universe as you move from one workload to another. And that can be huge. That can be huge, because whatever data you used to train the model, in your example, lives in that ocean of data.

And that data is easily accessible to you. And then you train the model. So now you need to take new data that is coming in, or marry new data to old data, and convert it into Excels, which is your app or a website or whatever. And that can be done right there, off of the same data that you've been operating on, which is certainly not the case today.

You have a data lake, a data warehouse, and a number of operational databases that get integrated by a third piece of software, like ETL tools or integration tools, which just generates a lot of complexity. And a lot of that can be simplified if you imagine a world with a serverless, low-latency SQL API to all your data. That's the vision we're driving towards. And this is a multi-year, probably multi-decade, effort.

MemSQL is nine years old today. So it's going to be a multi-decade kind of life's work. But the workloads that we see emerging, the new workloads enabled by a system like this, are real-time analytics and real-time decision support: when you need to go back and look at the history of what was happening to make a real-time decision, and do it at scale as well.

So that is something that we see a lot in financial markets. MemSQL is a, give or take, $40 million run-rate company with 70% growth; we just had an article in TechCrunch where we revealed our numbers. And a good amount of that revenue is coming from financial markets.

And if you think about it, that's what happens there. There's a constant stream of information coming in, modifying the data state that you have. And you need to make decisions about buy and sell; you need to make decisions in wealth management, in portfolio management, and in trading. But you also need to make decisions in various systems that, for example, monitor something that's very, very large. At Morgan Stanley, for example, it's a trading system.

And we monitor this trading system, providing decision support: should we do some sort of maintenance? Should we reroute our trades? All of those things. That's what MemSQL is used for today. And that's new; we didn't have a system on the market that had those capabilities before.

I'm Jerod Santo, JS Party's producer and one of nine regular voices you'll hear on the show. We are a party-themed podcast, so fun is at the heart of every episode. One way we keep things fun is by mixing it up and trying new things. We play games like JS Jeopardy.

This gives you access to an outer function scope from inside an inner function. Oh, I think that's a good one. Global scope? Incorrect, I'm done with it.

I didn't think so. Debate hot topics like, should websites work without JS? I'm going to appeal to authority and read some quotes at this time. OK.

I've lost complete control of this panel. Go ahead, Ross. The first quote, no code is faster than code. Discuss and analyze the news.

Yeah, this reminds me of when you're playing Pokemon and you have, like, you know, an electric Pokemon and a water Pokemon and you try, like, an attack. Share wisdom we've collected over the years. To be honest, a lot of what we rely on is pretty garbage. It's.

And like, I mean, I wrote some of it. So it's OK. Like, I can say this. Interview amazing devs like John Resig and Amelia Wattenberger and a whole lot more.

Oh, and did I mention we record the show live? We do. You can be part of the hijinks each and every Thursday at changelog.com/live. This is JS Party.

Please listen to a recent episode that piques your interest and subscribe today. We'd love to have you with us. I definitely liked where you're going in terms of describing the single window to all of your data via the SQL interface. And I know that we talked a little bit.

So we kind of touched on the AI and machine learning elements of this and how they fit in. You're talking about going on this journey to create the single window, or single interface, to all of your data. How did AI and machine learning workloads start to cross your path at MemSQL and become something that you felt needed to be part of the strategy for how you were building out the system? Yeah.

This is a great question. When we did our analysis, we discovered that about 20% to 30% of all the workloads that MemSQL supports have some sort of machine learning or AI angle to them. Oh, wow. So this is a very large number.

And when we looked at it, we always wanted to have dedicated AI capabilities in the system, and we certainly use AI internally to make certain decisions around workload management or prioritization. But the fact that modern workloads (and obviously people run modern workloads on MemSQL) have a lot of AI and ML capabilities was eye-opening to us. Yeah.

In those cases that you noticed, was it people who were, like you were saying, using MemSQL to do kind of large queries to prepare their training data for an AI model? Actually both. Both, OK. Here are two specific examples.

Well, we have a great integration with Spark. We have the MemSQL Spark connector, which gives you very fast data exchange between MemSQL and Spark. Fast means cluster to cluster, so a multi-channel bus between the two.

We noticed that people put all their data in MemSQL, grab it through Spark, and store models somewhere else; we don't take part in hosting models. So this is the first part. And what people like about MemSQL is that two-way path for data exchange between MemSQL and Spark.

If you have something in Spark, you can persist it in MemSQL. You can pull it into Spark. And MemSQL is a world-class query processing engine, so you can send a SQL query to it to do the first pass and slice and dice data before it gets fed into training algorithms, which MemSQL itself doesn't support.
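The "first pass" described above, slicing and dicing in the database before training, can be sketched as a query that filters and aggregates server-side so that only the summarized rows reach the training code. This is a minimal sketch under assumptions: Python's built-in sqlite3 stands in for MemSQL (which speaks the MySQL wire protocol, so a real client would use a MySQL driver), and the `sales` table and its columns are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # sqlite3 stands in for MemSQL; the schema below is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("east", "a", 10.0), ("east", "a", 5.0), ("east", "b", 7.0),
            ("west", "a", 3.0), ("west", "b", 8.0), ("west", "b", 2.0),
        ],
    )
    return conn


def training_slice(conn: sqlite3.Connection, min_amount: float):
    """First pass in SQL: filter and aggregate inside the database so only
    the summarized rows travel to the training environment."""
    query = """
        SELECT region, product, SUM(amount) AS total, COUNT(*) AS n
        FROM sales
        WHERE amount > ?
        GROUP BY region, product
        ORDER BY region, product
    """
    return conn.execute(query, (min_amount,)).fetchall()


if __name__ == "__main__":
    for row in training_slice(build_demo_db(), min_amount=2.0):
        print(row)
```

The training side then only has to handle one row per group rather than every raw transaction.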

It's just the backbone for that data. And the second use case that we started to see being pronounced is, well, people build apps on top of MemSQL. And those apps evaluate models in real time. Usually there's some sort of an SLA, either for an app displaying this information to the end user, or the app is completely back-office.

And it's just crunching data. For that, they need to pull data from somewhere, run this data against a model, and, based on the results that they see from that model, do something. A typical example: we do in-transaction fraud detection for some of the major banks, where the SLA is 40 milliseconds to decide whether a particular transaction is fraudulent or not.

And in order to make that decision, you need to have a model that's trained and running. Then you need to grab some data for that specific account, go back, and look at the previous 1,000 transactions. Feed those transactions into the model, and then the model will tell you whether it's a fraudulent transaction or not.

So MemSQL is supporting use cases like this. Yeah, that's really interesting. Right? So again, both sides of the spectrum, right?

Both: just providing data lake or data warehouse capabilities, with all your data in one place, letting data scientists play with that data and use whatever data science tools they choose, right? Whether it be Spark? Yeah, yeah, yeah, yeah.

Whether it be Spark, or pandas, or TensorFlow or PyTorch, whatever. We provide very, very fast data exchange to whatever frameworks you use. Then I'm going to register that model somewhere, you know, within Kubernetes or SageMaker; there are tools for that now. And it's a rapidly evolving space.

But it all starts with data anyway. So you need to have a data backbone, and you need a data management system with system-of-record capabilities in order to provide, you know, uptime, low latency, all of those things. And where it's going is that we're thinking to keep building world-class integrations with the systems that data scientists use for training and that engineers use for putting models into production, to enable that exchange at the push of a button. Given you have a model, put that model somewhere, tell MemSQL about that model, and you'll be able to consume that model either from SQL, through user-defined functions, or through an application, where the application provides the glue.
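Consuming a model from SQL through a user-defined function could look roughly like the sketch below. MemSQL has its own UDF mechanism, which this does not use: here sqlite3's `create_function` plays that role so the example runs standalone, and the logistic weights and the `txns` table are hypothetical.

```python
import math
import sqlite3

# Hypothetical weights exported from some training run.
WEIGHTS = {"bias": -4.0, "amount": 0.02, "is_foreign": 2.5}


def fraud_score(amount: float, is_foreign: int) -> float:
    """Logistic-regression scoring function exposed to SQL as a UDF."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["amount"] * amount
         + WEIGHTS["is_foreign"] * is_foreign)
    return 1.0 / (1.0 + math.exp(-z))


def flagged_ids(conn: sqlite3.Connection, threshold: float = 0.5):
    # The model runs inside the query, next to the data, instead of the
    # application pulling every row out first.
    rows = conn.execute(
        "SELECT id FROM txns WHERE fraud_score(amount, is_foreign) > ? ORDER BY id",
        (threshold,),
    )
    return [row[0] for row in rows]


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.create_function("fraud_score", 2, fraud_score)  # register the UDF
    conn.execute(
        "CREATE TABLE txns (id INTEGER PRIMARY KEY, amount REAL, is_foreign INTEGER)"
    )
    conn.executemany(
        "INSERT INTO txns VALUES (?, ?, ?)",
        [(1, 50.0, 0), (2, 100.0, 1), (3, 300.0, 0)],
    )
    return conn
```

Once the function is registered, any SQL client of that connection can filter, join, or aggregate on model output like on an ordinary column.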

Okay, so yeah, I was curious about that piece. So it sounds like right now the workflow is: you have an application, like a Python application or whatever it is, and you load your serialized model into memory. Then, when it's time to fulfill a user request, you make a SQL query against MemSQL, get the data you need, run it through your model, and respond to the user. Is that about right?

Yeah, that's how it works today. And where it's heading is, this will still probably be 50% of the use cases, because for certain things you still want control and want to write very custom logic. But we want to make MemSQL aware of models that are stored in a particular repository, and be able, through SQL, to run data through those models and return results back into MemSQL. Yeah, that's really interesting.
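The application-side workflow described in the question and confirmed as how it works today (deserialize the model once, then query, score, and respond on each request) might be sketched as follows. The JSON model format, the `features` table, and the linear model are assumptions made for illustration; sqlite3 again stands in for MemSQL.

```python
import json
import sqlite3

# Hypothetical serialized model: in practice this might be a file or a
# model-registry artifact; JSON keeps the sketch dependency-free.
MODEL_JSON = '{"bias": 0.5, "weights": [1.0, 2.0, 3.0]}'


class ScoringService:
    def __init__(self, model_json: str, conn: sqlite3.Connection):
        params = json.loads(model_json)  # deserialize once, at startup
        self.bias = params["bias"]
        self.weights = params["weights"]
        self.conn = conn

    def handle_request(self, user_id: int):
        """Per request: query the features, run the model, respond."""
        row = self.conn.execute(
            "SELECT f1, f2, f3 FROM features WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None  # unknown user; the caller decides the fallback
        return self.bias + sum(w * x for w, x in zip(self.weights, row))


def demo() -> ScoringService:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features (user_id INTEGER PRIMARY KEY, f1 REAL, f2 REAL, f3 REAL)"
    )
    conn.execute("INSERT INTO features VALUES (7, 1.0, 1.0, 1.0)")
    return ScoringService(MODEL_JSON, conn)
```

Loading the model in the constructor rather than in `handle_request` is the key design choice: deserialization cost is paid once per process, not once per user request.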

Yeah, the reason that's useful is that sometimes you want to run that model against a very large volume of data. If your application pulls data row by row from a database, runs it against the model, gets the results, and then stores them back in the database, that is an extremely inefficient way to do it. What you can do instead is establish, similarly to the Spark connector, a multi-channel bus with optimized data formats (we're thinking about Chillerro or something like this), where running a model against a billion records should be a one- or two-second proposition. Yeah, that's awesome.

I'm thinking of the facial recognition use case or something like that, where you may want to run, compare the embedded representation of this image against thousands and thousands and thousands, or maybe even millions of records that you have in your database that are reference faces from your facial recognition or something like that. Am I following the right path here? You are. And we have use cases like this where people store feature vectors in the database.

In a way, people run this use case in MemSQL in a do-it-yourself kind of way. MemSQL supports tensor operations as built-ins. And obviously, facial recognition models are often, though not always, represented as a TensorFlow DAG that gets evaluated. And the individual nodes in the DAG are vector math.

They're not something that's spectacularly complex. It's a vector dot product, also known as a scalar product, right? So MemSQL does that, and we have customers that do facial recognition in production over millions of faces, to enable things like, you know, somebody walks into a supermarket and the store wants to custom-tailor the experience for that person, or security systems in airports. And what happens is there's a camera.

The camera looks at the next face. Custom feature-extraction logic extracts a feature vector out of the new face. And then you run a query against MemSQL that says: give me all the records where the dot product of the feature vector stored in the database and the vector you just received is between 0.9 and 1. And that gives you all the similar faces.
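The similarity query just described can be sketched as follows, assuming feature vectors are normalized to unit length so that the dot product equals the cosine similarity and 1.0 means an identical direction. sqlite3 stands in for MemSQL, and `dot_product` is registered as a UDF to emulate a vector built-in (check MemSQL's documentation for its actual vector function names and storage format); the float32 blob encoding and the `faces` table are assumptions. As in the episode, there is no index: the function is evaluated against every stored row.

```python
import math
import sqlite3
import struct


def pack(vec) -> bytes:
    """Store a feature vector as little-endian float32 bytes (assumed format)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob: bytes):
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def normalize(vec):
    # With unit-length vectors, the dot product is the cosine similarity,
    # so the 0.9..1 band means "points in nearly the same direction".
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def dot_product(a: bytes, b: bytes) -> float:
    return sum(x * y for x, y in zip(unpack(a), unpack(b)))


def open_db(faces) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.create_function("dot_product", 2, dot_product)  # emulated built-in
    conn.execute("CREATE TABLE faces (id INTEGER PRIMARY KEY, vec BLOB)")
    conn.executemany("INSERT INTO faces VALUES (?, ?)",
                     [(i, pack(normalize(v))) for i, v in faces])
    return conn


def find_similar(conn, query_vec, lo=0.9, hi=1.0 + 1e-5):
    # hi sits slightly above 1.0 because float32 rounding can push the dot
    # product of two identical unit vectors a hair past 1.0.
    q = pack(normalize(query_vec))
    rows = conn.execute(
        "SELECT id FROM faces WHERE dot_product(vec, ?) BETWEEN ? AND ? ORDER BY id",
        (q, lo, hi),
    )
    return [row[0] for row in rows]
```

The full scan is linear in the number of stored faces; the episode's point is that a distributed, tightly optimized engine can keep even that brute-force scan within an interactive latency budget.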

It's a brute-force way of doing it: there's no index, and you literally run the dot product against millions of faces stored in the database. But because MemSQL is a distributed system and everything is so tightly optimized, you can still run this within 50 to 100 milliseconds. Yeah, that's crazy.

And it's running in production, right? It's running in production. And like I said, both for government security use cases and for things like walking into a grocery store, where the system suggests that the items they usually buy aren't in stock right now, but they might buy this or something else instead.

Gotcha. Yeah. So I was curious as we were talking about, I guess that's a computer vision use case. And I'm thinking about the types of data that are involved in machine learning and AI workloads.

And we've got, of course, imagery and video. And we've got a lot of natural language processing going on these days. And some of these types of data I've dealt with in SQL databases before, of course, numbers and strings and that sort of thing. But I wouldn't typically think of, oh, I'm going to store this image or video or like an audio file or something in a database.

So are you thinking that, in the longer term, the good workflow around this is storing feature vectors or embedded representations of text or audio, or maybe spectrograms of audio, via the tensor built-ins or those sorts of things? Or are there other ways around that? This is a great question. To me, there's what it is now and what it's going to be as we go.

And I will give you a very product-centric answer to this question, like what a product manager would think. And they always start with the user. The user in this particular case is, again, a data scientist from the training standpoint and an engineer from the app-building standpoint. I think today, for data scientists and the tools they use, it's a lot more natural to store that data in the data lake, basically in S3.

It's bottomless, it's files, it's cheap, and all the tool sets work out of the box. And the only reason to put that data into a database is when you get some sort of additional benefit from that. When you put structured data in a database, the benefits are obvious: it enables low-latency access to that data, and it enables very fast aggregations and reporting.

So you can slice and dice that data in the database before pulling it out and using your custom tools to provide reporting. For unstructured data, the only benefit that I see is governance. The database can provide that unified access layer to all your data, but it doesn't give you any compute benefits over that. So that's the way we think about it right now, though we're also exploring.

I think what's going to happen in the future is that databases, just like MemSQL, will give you an option to access data that's stored in the data lake and in the file system through the database API, with the benefit of marrying that data with, and really understanding, its metadata, and potentially building a full-text-like index against that data. So you can marry that data with the rest of your enterprise's data, which is usually relational. But don't yank the direct access to the file system, because that's what data scientists use every day, and they would be confused if you removed that access pattern. Yeah, I guess on that side of things, we've talked a lot about operationalizing models.

On the training side, we're now talking about access to files and all of those things, and you're saying you have the one interface, or the integration with Spark. For me, a lot of times I store everything in S3; like you're saying, it's very natural for me. I just say, I want this file, and I'm going to use it. But there are definitely issues that come up very quickly on that front, too. Even this morning I was trying to deal with 200 gigabytes of audio data, and I was just sitting around for a while making coffee.

It's not very productive or fun to deal with those sorts of things. You say that on the training side of things, people who are used to the Spark interface can do that. Are there other ways with MemSQL? If I want to access my audio files in S3, is there a way to do that with MemSQL outside of Spark? Are there other sorts of interfaces I can use?

Not at the moment. But I will share some of the thinking. Right now there's a lot of technology we're building around relational data, and around providing that single pane of glass into all your relational data. That's where we're the strongest.

When we think about S3, we think about how we can offload all the data that's not currently touched by the system into S3. We call this bottomless, making databases bottomless. If you think about Postgres, Postgres is not bottomless. It's bound by the amount of hard drive that you run Postgres on.
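The "bottomless" idea of offloading untouched data to object storage can be illustrated with a toy two-tier store. This is a hypothetical sketch, not how MemSQL implements it: an in-memory dict stands in for S3 or MinIO, values are serialized as JSON, and the idle-time cutoff is an assumed policy.

```python
import json


class TieredStore:
    """Toy two-tier store: recently touched rows stay local ("hot"); rows idle
    past a cutoff are serialized into an object store (a dict standing in for
    S3 or MinIO) and faulted back in on access."""

    def __init__(self):
        self.hot = {}           # key -> (value, last_access_time)
        self.object_store = {}  # key -> serialized bytes

    def put(self, key, value, now):
        self.hot[key] = (value, now)
        self.object_store.pop(key, None)  # the local copy is now authoritative

    def get(self, key, now):
        if key in self.hot:
            value, _ = self.hot[key]
        else:
            value = json.loads(self.object_store.pop(key))  # KeyError if absent
        self.hot[key] = (value, now)  # record the access
        return value

    def offload_cold(self, now, idle_seconds):
        """Move rows not touched within idle_seconds to the object store."""
        cold = [k for k, (_, ts) in self.hot.items() if now - ts > idle_seconds]
        for k in cold:
            value, _ = self.hot.pop(k)
            self.object_store[k] = json.dumps(value).encode()
        return cold
```

Local capacity then only has to hold the working set, while total capacity is bounded by the (cheap) object store.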

But we want to make it completely bottomless and very, very cheap. S3 is probably one of the cheapest ways to store data in the cloud, and we have things like MinIO, which is one of the cheapest ways to store data on premises. Specifically around the pattern that you described: I have an audio file, and it's a pain to go and transfer that file from one device to another.

And it's a pain to download it from S3 to your local storage and all those things. So the thinking there is, again, through integrations. If MemSQL is aware that here's the file, in that particular format, stored in S3, then you can either bring computation to the data, or get access to a subset of that file and bring only that into your training environment, whether it's running in the cloud or somewhere else.
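Pulling only a subset of a large audio file, rather than the whole thing, can be sketched with Python's `wave` module, which can seek to a frame position in a WAV file and read from there. The in-memory test WAV and the helper names are hypothetical; against S3, the analogous move would be a GetObject request with an HTTP `Range` header, which this sketch does not show.

```python
import io
import struct
import wave


def make_test_wav(seconds: int = 2, rate: int = 8000) -> io.BytesIO:
    """Hypothetical helper: a mono 16-bit WAV whose i-th sample is i % 32768."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"".join(struct.pack("<h", i % 32768)
                               for i in range(seconds * rate)))
    buf.seek(0)
    return buf


def read_window(wav_file, start_s: float, duration_s: float) -> bytes:
    """Return raw PCM frames for [start_s, start_s + duration_s) by seeking
    to the first frame instead of reading everything before it."""
    with wave.open(wav_file, "rb") as w:
        rate = w.getframerate()
        w.setpos(int(start_s * rate))
        return w.readframes(int(duration_s * rate))
```

For example, `read_window(make_test_wav(), 0.5, 0.25)` returns the 2,000 frames (4,000 bytes) starting at the half-second mark.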

So we want to enable those things. That's where I think it stops so far. We're certainly aware of the scenarios, and we're aware of some of the pains that people go through.
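One way to read "access a subset of that file" is an S3 ranged GET, which fetches only the requested bytes instead of the whole object. The sketch below computes the HTTP `Range` header value for a time slice of an audio file. It assumes uncompressed PCM audio behind a canonical 44-byte WAV header (real files can have longer headers), and the function name and defaults are hypothetical, not part of MemSQL.

```python
def pcm_byte_range(start_s: float, duration_s: float, sample_rate: int = 16_000,
                   channels: int = 1, bytes_per_sample: int = 2,
                   header_bytes: int = 44) -> str:
    """Return a Range header value covering [start_s, start_s + duration_s).

    Assumes raw PCM samples follow a fixed-size header (44 bytes for a
    canonical WAV file); compressed formats cannot be sliced this way.
    """
    if start_s < 0:
        raise ValueError("start must be >= 0")
    frame_bytes = channels * bytes_per_sample          # bytes per sample frame
    first = header_bytes + round(start_s * sample_rate) * frame_bytes
    length = round(duration_s * sample_rate) * frame_bytes
    if length <= 0:
        raise ValueError("duration is shorter than one sample")
    last = first + length - 1                          # Range end is inclusive
    return f"bytes={first}-{last}"


# One second of 16 kHz mono 16-bit audio, starting 10 seconds in.
print(pcm_byte_range(10, 1))
```

With boto3, the returned string can be passed as the `Range` argument of an S3 client's `get_object` call, so only that slice crosses the network rather than the full 200 GB.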

The place where we think MinIO can add value is versioning, because you oftentimes need to run and rerun experiments. And it's not just the model; it's the model and the data it's been trained on. That's really the unit that is consistent. And if the data changed, the model might be rendered obsolete, or it might not be.
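A minimal sketch of treating "the model and the data it was trained on" as one versioned unit, assuming the training data fits in a name-to-bytes mapping: fingerprint the data with SHA-256, store that fingerprint next to the model version, and flag the model for re-validation when the data changes. All names here are hypothetical illustrations, not a MemSQL or MinIO API.

```python
import hashlib


def dataset_fingerprint(files: dict[str, bytes]) -> str:
    """Hash file names and contents in a stable order so equal data gives an equal ID."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator between the name and its content hash
        digest.update(hashlib.sha256(files[name]).digest())
    return digest.hexdigest()


def experiment_record(model_name: str, model_version: str,
                      files: dict[str, bytes]) -> dict[str, str]:
    """Version the model and its training data together as one unit."""
    return {
        "model": model_name,
        "model_version": model_version,
        "data_sha256": dataset_fingerprint(files),
    }


def needs_revalidation(record: dict[str, str], current_files: dict[str, bytes]) -> bool:
    """True if the data changed since training; the model may or may not be obsolete."""
    return record["data_sha256"] != dataset_fingerprint(current_files)


record = experiment_record("speech-model", "v1", {"clip_001.wav": b"\x00\x01"})
print(needs_revalidation(record, {"clip_001.wav": b"\x00\x01"}))
print(needs_revalidation(record, {"clip_001.wav": b"\x00\x02"}))
```

The check only says the data differs; whether the model is actually obsolete still has to be decided by rerunning the experiment, which matches the "might be, might not be" point above.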

So versioning makes a ton of sense for running experiments, verifying experiments, and sharing and exchanging models and data across data scientists. I think that's where we can provide a nonlinear amount of value over time. All right, well, turning now a bit from the AI and ML integrations to more analytical workloads: when we were talking before the show, and in conversations leading up to it, it sounded like there are some pretty interesting things going on in terms of MemSQL being used during the COVID-19 pandemic. And of course, there's interesting tracing work going on, and all of those things that I've heard about.

But I haven't really heard about how some of those things are being enabled, so I'd be curious to hear a little bit more about that.

Definitely. So let's step back for a second and think about what different parts of the world, different companies, and governments fundamentally want to accomplish as we go through the pandemic. The first one is, you know, simple.

How do we stop the spread of the virus? Okay, maybe we cannot really stop it, or let's say we put our actions and efforts into that, but it's spreading; that's a matter of fact. What else can we do, and how can we drive our decisions based on data? What kind of decisions?

It could be capacity planning for ventilators. We know there's an outbreak there, we will likely have our healthcare system overrun, and we need to provide extra capacity to the healthcare system, but how much capacity? All of those questions require answers, and the answers are in data. That's where data science comes in.

And that starts with collecting the data, putting it in one place, organizing it, and feeding this information to the people who have the levers of power. The second question is, who owns the data? Obviously, Apple and Google own data, because they have a device with every individual on this planet: not everybody, but most people now own a smartphone. So you can tap into that stream of data, get information about who is at which location at any point in time, and then marry that location data with migration patterns and with individual tracing.

Given that we know this person has COVID-19, who are all the people this person came across in the past two weeks, so we can reach out to them and say, hey, you probably want to get tested? The second entity that has that data may be the government, but I don't know about that; it's certainly the telcos. Telcos have this information, maybe not as accurately, because they don't have GPS. Well, there is GPS on the device, but they may not be able to tap into it.

But they can triangulate location based on cell towers. So we're working with some of the largest telecommunication operators here as well as around the world, and the one that's public is True Digital, one of the largest telcos in Southeast Asia. We do the migration patterns: if you go back to the February and March timeframe, we already knew there was an outbreak in China and an outbreak in Italy.

And we already knew how bad that was. Looking at flights from Italy, tracing individuals who landed, and then seeing the pattern of people getting sick emerge, you can start driving decisions off of this. You can start putting policies in place that can stop the spread. You can start capacity planning.

You can start mass-manufacturing ventilators and distributing them to places based on the patterns you're observing. So that's how data management solutions are helpful to the companies that have the data, and the insights those systems generate are useful for people with the levers of power to drive policy and decisions. Google and Apple especially have the technology, but telcos don't, and that's where we partner with them and give them those abilities.
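The tracing question Nikita describes (everyone a confirmed case came across in the past two weeks, with location coming from cell towers rather than GPS) reduces to a join on location and time. A minimal in-memory sketch, assuming tower-level pings bucketed by day; the data layout and names are hypothetical, and sharing a tower on the same day is only a coarse proxy for actual contact.

```python
from datetime import date, timedelta

# A ping says a subscriber was seen near a cell tower on a given day.
Ping = tuple[str, str, date]  # (person_id, tower_id, day)


def contacts_of(case_id: str, pings: list[Ping], as_of: date,
                window_days: int = 14) -> list[str]:
    """People who shared a (tower, day) with case_id within the look-back window."""
    start = as_of - timedelta(days=window_days)
    recent = [p for p in pings if start <= p[2] <= as_of]
    # Every (tower, day) cell where the confirmed case was seen.
    case_cells = {(tower, day) for person, tower, day in recent if person == case_id}
    return sorted({person for person, tower, day in recent
                   if person != case_id and (tower, day) in case_cells})


pings = [
    ("alice", "T1", date(2020, 3, 10)),
    ("bob",   "T1", date(2020, 3, 10)),   # same tower, same day: contact
    ("carol", "T2", date(2020, 3, 10)),   # different tower: not a contact
    ("alice", "T3", date(2020, 3, 12)),
    ("erin",  "T3", date(2020, 3, 12)),   # contact
    ("alice", "T1", date(2020, 2, 1)),
    ("dave",  "T1", date(2020, 2, 1)),    # outside the 14-day window
]
print(contacts_of("alice", pings, as_of=date(2020, 3, 15)))
```

At telco scale the same logic would run as a SQL self-join over the ping table rather than Python sets, which is the kind of workload a distributed database is meant to absorb.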

Yeah, it strikes me that there's definitely a lot of potential and value in the things you're discussing, and earlier in the episode we talked about facial recognition and a lot of things that are possible at a large scale. As people are now in this pandemic, and layered on top of that is the whole climate in our country and around the world around injustice and policing, a lot of people are asking really good questions about data management, security, and privacy. You're in a position to have so many conversations with different types of entities about how they view data management, and about how that's changing as we think about these powerful applications of large-scale analytics alongside the potential concerns about privacy and tracking. I'm curious to get some of your thoughts on how large organizations are starting to view data management and security, maybe a little differently than they might have in the past, given all the things that are going on in our world.

Definitely, it's a multifaceted question. Given everything that's going on, the big problems we face and the big issues we will face as a nation, how can data management help here? And I think there's one answer to that among many, right?

There are so many things that would go into solving these big issues that you raised. But one place where data management can actually help is data sharing and data consumption, right? Imagine that the government published police data to the whole world in the easiest possible way to consume, and it's completely real time.

So if you have an arrest, and by regulation that arrest has to be part of the public record, it is in the system 10 seconds after the arrest happened. That information is just live, in real time, for everyone's consumption. With our vision of a single pane of glass over all your data and all workloads, we will be able to enable those things: anybody could log into our cloud service and consume that data, assuming the providers publish it. Imagine that climate change data is available to anybody in real time, live and easy to consume.

Where we live today, a lot of data sets are public, and there's regulation that forces them to be public, but they are published in a non-standard, obscure way. Yeah, they're not discoverable. They're not discoverable. So consuming that data set is a project.

It's like going to a library, or going to a court and asking for permission, and they bring the papers and put them on the table. I was inspired by re-watching Spotlight, where they got access to some sensitive data that by law had to be public, and they still had to jump through hoops. But imagine all that data is discoverable: it is at your fingertips, and it's up to date.

So you don't have to think, oh, I downloaded this last month; what changed between last month and today? It's just there. That can make a lot of things easier and more transparent, and we'd be living in a better world.

We need to think about the implications of that, the "what if bad guys had access to this data" question, but that's a policy question, not a data management question. I think data management should enable us to live in a world like this, and the technology is already there. Yeah, and I imagine that if you have this single way to interact with data that's centered around SQL, and people are familiar with it, they're able to use it.

Also, in addition to the sharing of data, there's sort of the sharing of methodologies that can happen. For example, even in our last episode that we recorded, we talked about some tooling that's out there around fairness and bias and other things. It's a little bit like you have to read a good amount of documentation. You have to figure out how to use these things.

I wouldn't say it's seamless and easily integrated into your workflow at this point. But I could imagine, for example, a suite of easily accessible tooling, say SQL workloads that look for bias in your data on certain features or highlight certain things in your data set. And whether you're using TensorFlow or PyTorch or Spark or whatever, you could potentially have access to those things, because people could share their methodologies in one common language, SQL. Do you see that?

I'm also wondering what the MemSQL community is like, in terms of people working on projects built on top of MemSQL. Do they share things like that? Are there things available, maybe open source, that are built on top of MemSQL and that people can work on collaboratively?
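As a rough illustration of the "SQL workloads that look for bias" idea, the sketch below computes the positive-label rate per group with a plain `GROUP BY` and compares the lowest rate to the highest. It uses Python's built-in `sqlite3` so it runs anywhere; the table, columns, and the min/max ratio are hypothetical choices, and nothing about MemSQL's own SQL dialect or tooling is assumed.

```python
import sqlite3


def positive_rate_by_group(conn: sqlite3.Connection, table: str,
                           group_col: str, label_col: str) -> dict[str, float]:
    """Share of positive labels per group, computed in SQL with GROUP BY."""
    # Identifiers cannot be bound as query parameters, so they are assumed trusted here.
    query = (f"SELECT {group_col}, AVG({label_col}) FROM {table} "
             f"GROUP BY {group_col} ORDER BY {group_col}")
    return {group: rate for group, rate in conn.execute(query)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applications (region TEXT, approved INTEGER)")
conn.executemany(
    "INSERT INTO applications VALUES (?, ?)",
    [("north", 1), ("north", 1), ("north", 0),
     ("south", 1), ("south", 0), ("south", 0), ("south", 0)],
)
rates = positive_rate_by_group(conn, "applications", "region", "approved")
disparity = min(rates.values()) / max(rates.values())  # 1.0 means equal rates
print(rates, round(disparity, 3))
```

Because the check is just a query, it can be shared and rerun by anyone with SQL access, independent of whether their model code is TensorFlow, PyTorch, or Spark.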

The community is on forum.memsql.com. And then there's a community of mostly enterprise developers, because that has been our focus so far, who share through MemSQL events and conferences. Where we're going now that we've gotten here, and as we open up the platform more and more to the community, is that we're thinking a lot in terms of free: how can we take some of the things that got us here, to a 40 million run rate with 70% growth, and open them up?

By opening up, I mean providing a certain set of features and capabilities to the world for free, on our dime: you go to the cloud, you log in, and there's a free tier of stuff that you can do. That's our current thinking so far. I'm actually going to be personally overseeing that offering here at MemSQL.

Yeah, that's really exciting. I'll be excited to dig in and play around with those things. One other thing that's COVID related, and also related to our changing world, is people's workflow and productivity during this time. I'm curious, with MemSQL growing so fast and obviously a lot changing, a lot happening.

How has that been for MemSQL, and how do you see tech, work from home, and productivity moving forward from your perspective as a CEO? Yeah, so first of all, we are in uncharted territory, right? MemSQL wasn't born as a remote-first company. Even though we're global, with offices in San Francisco, Seattle, Lisbon, Kyiv in Ukraine, and Bangalore in India, plus sales offices all over the place, there's still a concentration of people in each location, and usually a particular component that people work on within an individual location.

We weren't impacted from a performance standpoint. We're one quarter into COVID; we basically just finished our COVID quarter, and we demonstrated tremendous results.

We're very happy and excited about the future. And we obviously shifted all our workflows to work-from-home workflows. Now, the worry that I have, being paid to be paranoid, is that it works fine so far because we're tapping into the social capital that we've built over the years, right? A quarter into COVID, we're spending that social capital: all these social links established between people, built while working at a particular location and looking friends and colleagues in the eye.

So that's gone, right? Every meeting is a formal meeting, if you think about it. There are no, you know, hallway conversations. Yeah, I guess I hadn't thought about it that way, but it's true.

Yeah. We're missing out on the hallway conversations, on, you know, grabbing coffee together and having these nice, positive experiences, brainstorming while walking to a nice coffee shop and grabbing a latte. So I want those things to be back. Hopefully that will happen relatively soon: we'll have a dent in the social capital that we've built, and then we'll fill that dent by getting back together.

So that's my hope. Obviously we can't control that; the situation controls it. Yeah, it's interesting.

I mean, even before COVID, I'd been working remotely for, I think, about three or four years. And I definitely get what you're saying. Over time I've had to intentionally develop relationships with local data scientists or technical people who aren't working at the same organization I am, but it's a chance for me to get together with those people and just talk about things.

Because sitting at my computer, I brainstorm a lot of things, and sometimes I wonder if I'm crazy, because I'm not talking about those things with anyone except when I'm presenting them to my supervisor or to a group. And then I'm supposed to, you know, sound like I know what I'm talking about, hopefully a little bit.

So yeah, it's not that sort of informal environment, and that's a very interesting observation. I hope that some of that can come back. Definitely. Yeah.

As we wrap up here, I'd love to give you a chance to let people know where to go. Obviously there's memsql.com, and we'll have the links in the show notes, but as a data scientist or AI sort of person, are there ways people can play around with MemSQL, get a little hands-on, and see what it feels like and how to do certain things? Where would you recommend they start getting onboarded? Definitely.

So if you want free forever, we give our software away for up to four servers. Like I said, it's cluster software that you can install wherever you want and run forever. We call this our software free tier, and it grew three times over the past year from an active-user standpoint. It's basically one of the best column stores on the planet: data is highly compressed and stored on disk, with very fast reporting.

And it's also a durable, transactional system of record, right? So where other companies that run on premises, like, I don't know, Greenplum, want to charge you for that, you get it free, and you can put billions and billions of data points in the system and get very, very fast SQL responses from it.

In the cloud, our free tier is time-based. So I encourage people to log in and play around with the system. That lets you skip installing any software and consume everything as a service. But because we run it on our infrastructure, we're limiting free access to a period of time.

We'll be announcing more changes there: we'll give you the system for free forever for limited usage, but that hasn't come out yet. So that's something we're working on. Those would probably be the best places to start.

And of course, go to forum.memsql.com to learn about the system. Awesome. Yeah, we'll have those links in the show notes. Really appreciate you chatting about everything today.

I think our listeners will really enjoy the content and hopefully check some of these things out. Thank you so much for joining us, and I hope to have one of those hallway chats with you at some point when things actually open up. Well, if you're in Silicon Valley or I'm in your area, I'll make sure to reach out, and we'll hopefully make that happen. Yeah, definitely. Yeah, thank you so much.

Bye Daniel. Bye. Bye. Thank you for listening to this episode of Practical AI.

People ask us all the time and they say, Hey, how can I support your work? One easy way is to leave a five star review on Apple Podcasts. Tell folks why you listen and why they should too. It only takes about 30 seconds.

And believe it or not, those ratings and reviews really do help us rank higher in AI-related search results. Practical AI is hosted by Daniel Whitenack and Chris Benson and is produced by Jerod Santo. That's me. And our music is brought to you by the one and only Breakmaster Cylinder.

We are sponsored by amazing people at companies who get it. Thanks again to Fastly, Linode, and Rollbar. Did you know we have a Master Feed of all Changelog podcasts? We do.

It's your one-stop shop for everything we produce. If you like this show, you'll love The Changelog, Brain Science, and Go Time. Check it out at Changelog.com or search for Changelog Master in your favorite podcast app. You'll find us.

That's it for now. We'll talk to you again next week.

Frequently Asked Questions

How long is this episode of Changelog Master Feed?

This episode is 54 minutes long.

When was this Changelog Master Feed episode published?

This episode was published on June 29, 2020.
