Bandwidth for Changelog is provided by Fastly. Learn more at Fastly.com. We move fast and fix things here at Changelog because of Rollbar. Check them out at Rollbar.com.
And we're hosted on Linode cloud servers. Head to linode.com/changelog. This episode is brought to you by DigitalOcean, the simplest cloud platform out there. And we're excited to share that
they now offer dedicated virtual droplets. Unlike standard droplets, which use shared virtual CPU threads, their two performance plans, general purpose and CPU optimized, have dedicated virtual CPU threads. This translates to high performance and increased consistency during CPU-intensive processes. So if you have build boxes, CI/CD, video encoding, machine learning, ad serving, game servers, databases, batch processing, data mining, application servers, or active running web servers that need to be full-duty CPU all day every day, then check out DigitalOcean's dedicated virtual CPU droplets.
Pricing is very competitive, starting at $40 a month. Learn more and get started for free with a $100 credit at do.co/changelog. Again, do.co/changelog. Welcome to Practical AI, a weekly podcast about making artificial intelligence practical, productive, and accessible to everyone.
This is where conversations around AI, machine learning, and data science happen. Join the community and Slack with us around various topics of the show at changelog.com/community. Follow us on Twitter, we're @PracticalAIFM. And now onto the show.
Welcome to another episode of the Practical AI Podcast, where we make artificial intelligence practical, productive, and accessible to everyone. I am one of your co-hosts, Chris Benson. I am Principal AI Strategist at Lockheed Martin. And with me today, as usual, is my co-host, Daniel Whitenack, who is a data scientist at SIL International.
How's it going today, Daniel? It's going great. It seems like the past week or so has been the week of messy data for me, so I've been dealing with a bunch of missing rows and weird data issues, it seems like, for the past week, which maybe that's, like, typical for every person in AI, and everyone's like, oh, that's my week every week. But it seems particularly to have hit me this last week.
But what about you? You're at GTC, right? I am. I'm at NVIDIA GTC, which is their GPU technology conference in Washington, D.C.
It's going on now, although right now I'm hanging out in the hotel room so we can do this. But a lot of fun. I came to Washington at the beginning of this weekend for the AlphaPilot race, and, you know, we've had a recent episode on AlphaPilot, and that was the second of four. Super cool doing that.
And had a lot of fun. Did some various things on stage. And then today at GTC, I've got a session coming up that I'm leading. It's kind of a fireside chat where I'm kind of both moderator and panelist together with a couple of other really, really smart people.
Yes, that sounds great. I hope that maybe some of that will be available at some point where people can access it. Yep, I think they put it all online afterwards. Awesome.
If you want to follow up on that or are interested in other things related to NVIDIA, you can definitely connect with us on our Slack channel. If you go to changelog.com/community, you can join us on a public Slack and/or on LinkedIn and ask some of those questions and follow up on guests and all of those different things. Well, today we've got a treat. We have a guest by the name of James Fletcher, who is principal scientist at Grakn Labs.
And I think we're going to talk all about intelligent systems and knowledge graphs in the minutes ahead. Welcome to the show, James. Hi, guys. Thanks very much for having me along.
So I noticed on your LinkedIn as we were prepping for the show, it said a couple of things. The first one says that you're presently leading research on machine intelligence and cognition at grakn.ai. But also, and anyone that listens to the show much knows I'm an animal nut, I just own that moniker, it says that you are an entrepreneur with a background in computer vision for automated veterinary diagnostics.
And just before we got into the main topic, I just wanted to ask you about that. If you could take just a second as a tangent and tell us what that means. Yeah, absolutely. So that was quite a fun project.
That was my first foray into machine vision, which actually started when I was studying. I was studying general engineering at university and ended up in this specialization in machine vision. And I really didn't see that coming. I always thought I was going to head towards mechanical engineering or something like that.
And then when I saw the capabilities that were coming out in machine learning at the time, I was like, okay, wow, this is really good stuff. This is disruptive, right? You can really do something new with this. And no one's using this, clearly, in industry.
I was studying under Professor Andrew Zisserman at the time, who's quite a big name in computer vision. And we learned well. And coming out of that course, I said to him, you know, is it okay if I look at actually commercializing some of these algorithms?
This stuff is clearly good enough to warrant a whole company around it. And so off I went and started doing that. That was actually a family business. My dad is also an engineer.
And so the two of us decided, you know what, actually, let's give this thing a shot. How was it, because I know like the transition of research out of university into the commercial world can be kind of an interesting journey. Was that awkward and trying to convince the right people? That's a good summary of the journey.
Awkward, you mean? Well, no, I wouldn't say it was awkward, but we weren't knowledgeable on IP and all that kind of thing. But I mean, at the end of the day, it was really open source by the university. That was actually really pretty trivial.
No, so that actually formed. That was an interesting conversation also, because it had been implemented and released open source in MATLAB. But, you know, that wasn't actually commercially useful to us. So that was a rewrite job from the start to put it into Python so that we could actually, you know, productionize that.
And then it was really happenstance and things that put a lot of things together for us. We had these generic algorithms. We wanted to find a place to use them. And as a family, actually, there's a hobby farm involved here, which my parents have.
And we happened to have connections with the veterinary college nearby. So we went to them and we said, you know, we need a vertical. We need a specific task that we can hone in on to actually, you know, prove the usefulness of these algorithms and what they can do. And so we were looking at veterinary science and they said, yeah, that's exactly what we need.
We don't have anyone who's actually been able to help us at university do this stuff at the moment. So we launched this whole research effort with them. What was interesting actually as that developed, and this is a lesson in being an entrepreneur, I guess, is that the core value of the business actually moved sideways from the AI algorithms that we were working with, from the machine vision, into the actual hardware and robotics that we needed to fully automate the process. Because it's all very well having a machine vision algorithm that automates, you know, the skill of looking through a microscope.
But if you don't have a machine that puts the microscope slide on the microscope, essentially, right? I mean, that's somewhat oversimplifying it, but I'm sure you've got the idea. Then, you know, how many samples can you actually run? Like, what's the actual improvement you get to that whole system?
And so actually that was the area that was much harder. Once you have an image on a computer, you're kind of laughing. But getting to that point was actually a little bit more tricky. But yeah, the end goal was actually trying to control parasite burdens in animals, particularly grazing livestock.
But that translates sideways actually into human health, because one statistic is that two billion of the world's population actually has this parasitic worm infection. There's a number of different reasons why you might want to work on this particular problem. And there's a lot of samples to run. There's a lot of samples to run.
Exactly, exactly. You hit it in a nutshell. Well, that's pretty fascinating. And just as a way to close that off, I run an American non-profit charity called the Animal Institute, which brings technology like AI and computer vision and such to solve problems in animal welfare.
So if you ever have any interest in discussing these topics further, I definitely have a playground to play in. Well, absolutely. Sounds like we should definitely go there. I was just thinking while you were talking about it, I mean, the application is definitely interesting and valuable, but I also think it illustrates, I get asked all the time, and maybe you do as well, like, what should I start working on to get into machine learning or get into AI?
What kind of problems should I start looking at? And I think, like, the best thing that you can do is start working in an area where you have some connection or where you're passionate about. So for you, this is kind of a connection between what you studied at university and worked on in research, along with your family and engineering, along with, like, this hobby farm and the connections you had with the veterinary school. So it made a lot of sense to go into that vertical.
So, yeah, that's what I think, you know, people should consider is trying, just try something out that you're passionate about, because those are usually the things that you would stick with long enough to learn and to experiment and to level up. I totally agree with that. I think that's a really good point. Because what you're really saying there is that you will teach yourself better in things where you are motivated, right?
Yeah, yeah, definitely. Not just learning machine learning, but everything. So if you've got that motivation, the more motivation you can summon and put in the one place, then, like, absolutely, you'll double down on it, right? The passion will get you through the hard times, right?
When you're missing all those rows in your data set, right? Yeah, yeah, for sure. Thanks for the extra motivation this week. I was going to say, this has turned into completely a motivational show.
I totally expected it in this area. And we haven't even hit the main stuff we were expecting to talk about. No, there you go. Well, speaking about that, I mean, like, how do you get from robotics and microscope slides to knowledge graphs?
What's that kind of journey like? Yeah, well, unfortunately, I don't have some twisting rollercoaster to tell you. Only that when I wanted to move out of doing the technical work on that project, I was looking around for the next challenge. I suppose one of the things I really like to be is sort of, like, impact-driven in terms of the choice of where I wanted to work.
I wanted to see something where, you know, you get that value actually delivered. And you could see that in that project just the same. Like, you could see where you were going to actually make some impact. And I looked around at all the roles and had this really great conversation with Haikal Pribadi, the CEO here at Grakn.
And we had a really overexcited conversation when we first met, where he was explaining to me all of the ethos about Grakn and the vision that the company has. And I was pretty sold to work here straight off the bat from that conversation. So really just a pivot. His ethos is to take on people that have demonstrated themselves within the scope of what they do, not necessarily that they have to be people who've worked on, you know, knowledge graphs or graphs at all in the past, right?
So he's very open-minded about which field we're coming from. Coming from robotics himself, actually. So there's a bit of a resonance there. Cool.
Well, maybe you could just define... So if I go to, like, the Grakn website, which is grakn.ai, we'll put it in the show notes, you talk about a couple of things, which you've already mentioned, and I think it'd be great to kind of dig into those terms a little bit more. So one of the things you mention on the website is intelligent systems, and then you just mentioned knowledge graphs. So maybe you could start out by just kind of sharing what Grakn means by intelligent systems and what sorts of intelligent systems people are developing out there.
Yeah, absolutely. So the terminology that's being used at the moment is an interesting and kind of hot topic of its own. Naturally, you're going to get a Grakn-biased spin while you're talking to me, but in general, I think it's better to start with knowledge graph. Okay.
It's good if we also start with how we describe Grakn and what that does for people, right? So Grakn itself is a database, right? And typically, when you're talking about knowledge graphs, that's what you're talking about. You're talking about some sort of large store of knowledge.
Now, knowledge graph itself is essentially totally synonymous with knowledge base, which would be like the mathematically correct terminology, but that's been abused on the web a lot for other things. So we tend to go with knowledge graph. It's a little bit sexier, and it also immediately gives someone without experience in knowledge bases an idea of the shape of the data, which is a graph in the computer science sense. So what do we actually mean by knowledge graph as opposed to just graph? There are all sorts of different graph formats all over the place.
But what we're trying to build here is a system where you make that leap from a graph full of data to a graph full of knowledge. Yeah, I was just going to jump in and say, I think that's maybe the part where I struggle. I think a lot of people have dealt with databases, and maybe some people are familiar with graph-structured data, like, oh, I've got this node, which is a person, and another node, which is another person, and they're connected by, I think the terminology is, some edge that is, like, this person is friends with this person, or, you know, something like that. When does a database or graph data go from being just a database to being a knowledge graph?
What's the idea around that? Yeah, so the idea is in the way we build the system up: how do we capture all these different kinds of knowledge, right? And so what we have built is a knowledge representation system, right? Everything that's in Grakn is actually built on top of a graph database.
That's actually the start of the innovation. I think that helps people understand what we're doing. So if you start with a clean slate, you know, building a project, we started with a graph database, and then we built other things on top of that, right?
Can you talk a little bit about the difference? When most people think database, they're probably thinking of a relational database, kind of the more classical Postgres and those kinds of databases. As you explain here, can you differentiate between what a graph database and a relational database are, so that people, if they're not already familiar, can kind of make that jump? Yeah, exactly. So as we were already talking about, right, we've got a graph in the computer science sense, as opposed to in, like, the X-Y plot sense, in that we've got nodes and edges interconnected, right?
So in a typical graph, a node would represent anything. For instance, I like your example: from one node, which is a person, to another node, which is a person, with, like, "has friend" as the label of the edge in between those two nodes, right? A relational database, by contrast, forces you to store everything in tables, right? That's what you've got. And, well, as soon as we're dealing with data that's more representative of a network, then dealing with it in those kinds of tables gets really messy really, really fast.
Because as soon as you've got, like, one thing which is connected to eight other things in eight different file cabinets, and all of those are also connected to eight different things, you know, you get into a big mess with that starting structure. It doesn't scale well there, across, laterally. Exactly. And, well, the idea is that when you're actually trying to build some kind of application with those things, the complexity that you, as the user of the database, have to deal with is enormous, right?
Suddenly, you have to try and control this structure that wasn't really designed for the data that you have. So then you go a layer up, and you say, okay, now I need a graph structure to actually more naturally represent my data, right? And so that's why the graph database was kind of born. And when you say kind of more naturally, other than that it reflects the data, the relationships between the data, very accurately, are there any other advantages to going with graph?
If somebody's trying to make that decision today, and they're looking at that, maybe they're looking at Grakn, what are the benefits of going graph database versus relational database? I mean, I think you kind of said it in a nutshell, in that the idea is to be able to naturally represent network data as it is. Is it easier to get to the data, though, in that way, and not having to do giant, classical SQL? Exactly, right.
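[Editor's sketch] The "person has friend person" example from this exchange can be written down as a small graph to show why network-shaped questions are easy to ask of it. This is plain Python, not any graph database's API; the names (`friends`, `within_hops`) and the people in it are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical toy data: each key is a person node, and each value is the set
# of person nodes reached from it by a "has friend" edge.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob", "erin"},
    "erin": {"dave"},
}


def within_hops(graph, start, max_hops):
    """Return every node reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)
    return seen


# Friends and friends-of-friends of alice.
print(sorted(within_hops(friends, "alice", 2)))
```

In a table-based layout the same question would typically need one self-join on a friendship table per hop, which is the kind of growing query the speakers are alluding to.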
And we go a level more natural again when we actually come to the knowledge graph element that Grakn builds on top, right? So once you've got your data in like a graph form, now you want to be able to concisely refer to and search your data and reference what you're looking for, right? So I would say there are two major parts that you need to understand to figure out what Grakn is and why it helps you. The first thing is you've got this knowledge representation system, and we have this flexible model.
I don't think we want to talk in technical depth on all of the intricacies of that. You can basically make entities, relations, and attributes. We make these three things, these three kinds of characters, right, that you have in the story of building a Grakn schema. And the entities are things like people, things like companies, even things like abstract concepts in the world, right?
But then when someone references an entity, you immediately know roughly what they're talking about. Relations are the kind of glue that's sitting between these things, right? So that's what you would use as edges in the graph we were talking about before, right? But relations are probably the most standout concept in terms of what we do, because these relations allow you a huge, huge volume of flexibility.
They say that not only can I have a friendship between two people, right, and say that person A is friends with person B, but I can say that they're also friends with person C, person D, person E. I can do that with one relationship. We used to know that as an edge. So in this case, what we're saying is these relations are hyperedges, right?
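[Editor's sketch] To make the hyperedge idea concrete, here is a minimal illustration of one relation holding many participants, compared with the pairwise edges an ordinary graph would need. It is an assumption-laden toy in plain Python, not Grakn's data model or API; the `Relation` class and the role name `friend` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Relation:
    """A hyperedge: one labelled relation connecting any number of things."""
    label: str
    role_players: dict = field(default_factory=dict)  # role name -> list of things

    def add(self, role, thing):
        self.role_players.setdefault(role, []).append(thing)
        return self

    def players(self):
        return [t for things in self.role_players.values() for t in things]


# One friendship relation connecting persons A to E.
friendship = Relation("friendship")
for person in ["A", "B", "C", "D", "E"]:
    friendship.add("friend", person)

# With ordinary two-ended edges, the same group needs one edge per pair.
pairwise_edges = list(combinations(friendship.players(), 2))

print(len(friendship.players()), "people in 1 relation")
print(len(pairwise_edges), "binary edges for the same group")
```

The design point is that the group is stored once, as a single fact, instead of being reconstructed from many separate pairs.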
And you can see there, so immediately we're starting to introduce like big concepts at a low level of the structure that we define, right? We say basically, we want to upgrade how you can represent your domain. We want to give you this toolbox, which we're calling the schema in Grakn, that lets you model your domain in all of the complexity that it has, right?
And that then means that you've now got this format, this structure that can govern your data, that can look after your data for you. It can make sure that you haven't done anything that's logically invalid. It can make sure that everything is cohesive within your database. So when you start adding facts, right, you now know also what the context of those facts is, because we heavily label all of the elements that go into the graph.
For instance, you can insert a company, a charity, and a university. All of those types that we can describe have inherited from organization, right? What that now means is that when I want to search my data, I can search for companies, for charities, or for universities, and I can search for those individually, or I can just ask more generic questions and say, just tell me about organizations in my data, right? And so what we're trying to do there is to get this really natural way to actually interact with your data, so you're using your own domain terminology to access what you're looking for, rather than having to sort of imagine, what are my nodes and what are my edges and how do they fit together, right? Instead, we try to bring that to the user and reduce the burden on them when it comes to assessing what's going on in their knowledge graph.
What is up, Practical AI listeners? We're working with Infinite Red to promote their free AI mini course. Learn more and enroll at learnai.infinite.red. This free 5-day mini course is a great introduction to the most important concepts, types, and business applications for AI and machine learning. Each day of the course includes a lesson, a quiz, and an assignment to submit your learning. And after you've completed the course, you'll also get a certificate of completion for your LinkedIn profile or for your portfolio. If you've been feeling lost in the world of AI and hearing lots of buzzwords, then by the end of this mini course, you'll be able to speak intelligently about AI and machine learning and their practical business applications. Again, this course is completely free. Learn more and enroll at learnai.infinite.red. Again, learnai.infinite.red.
So James, I appreciate kind of where the conversation has landed, in that there are natural ways of representing your data, and that can be modeled well on top of a graph. I've tried graph databases in certain scenarios with more
or less success, and some have been really useful. But something I always find is that it seems really hard to build a, quote unquote, knowledge graph, in the sense that developing your schema can be hard, because you may know what entities you have, but there might be multiple ways to represent them, or you may just have a bunch of unstructured data and you're not totally sure what entities to choose. So how do you recommend, if people are interested in creating this sort of representation of knowledge, where should they maybe start thinking about the data that they have and how to develop a schema?
So that's a really great question. I don't have a short answer, but essentially that has been a huge part of what I've been doing here at Grakn and what we do overall with members of the Grakn community, where we try and help people to actually understand the principles of what is an entity, what is a relation, and how do they best fit together. And what's super interesting about that is that it's a really great meeting of philosophy and technology, which I found incredibly interesting. Essentially, my thoughts on this are that we now see knowledge engineering and knowledge representation as entire careers that are coming around now, that you actually have someone who's a specialist, an ontologist, I've also heard them called. The body of knowledge on the best way to do this is not yet settled upon, and we have our own ways of doing that here at Grakn, and those ways, and how we think that things should be done, inform the design decisions that we make in the language that we provide for the knowledge graph. So at the moment, it's actually been on my to-do list a long time to write some best practice for knowledge representation and building your schema in Grakn. We have snippets here and there, we have examples here and there, and it's very difficult to give really generic guidance, but we do have some that we would give out. That's a little bit
long-winded for here, but maybe we can link to that in the future. I actually want you to extend that just a little bit. I'm kind of curious, what can you do with a knowledge graph that you would not be able to do if you didn't have one? As you're talking about design and thinking about what best practices are, what comes to mind?
So the main thing that anyone who's interacted with me in a professional context will know is that what I harp on about is trying to get to the point of true-to-domain modeling, right? What I really want is to see people building a knowledge graph where they start with a schema, where one person who builds a schema could show it to their colleague, and their colleague will immediately understand what elements of data are where in the knowledge graph, right? That makes sense.
Yeah, and just to clarify, to make it super clear for listeners: when you're talking about the schema, we gave the example before of person is friends with person, so there's a person-type entity in this knowledge graph, but there could also be country-type entities, or organizations, or different metrics, websites, resources, all sorts of things. That's the sort of schema or ontology that you're talking about, right? The definition of what things we are going to put in our knowledge graph and how we are going to label them. Is that the best way to think about the schema?
That is absolutely correct. And what I think is also really nice is to make some analogies to OOP, so object-oriented programming, right? So for anyone who's familiar with OOP, and there's a lot of people out there, I imagine you have quite a lot of listeners who are familiar with OOP, what we're saying here is we define a class, right, and those are our schema elements. And then when we actually insert data, we're inserting instances, or essentially objects, of that class.
And just a quick interjection for those who don't know what OOP is: he's talking about object
oriented programming. It's a technique for representing real-world concepts in code. Just keep going. I just wanted to let anyone know who didn't know that.
Yeah, yeah, absolutely. So the idea is that, as you say, we have this schema, and you can update that over time, but that is the map for your data, right? That tells you what things are present in our knowledge graph and how they can be connected to one another. So for instance, in that example where you have a person entity and also an organization entity, we can then also define the friendship relation that you talked about, right? We can say, okay, a person can be in a friendship with other people. That makes sense. Can a person be in a friendship with an organization? Now, maybe that's philosophically debatable, but I would probably say the answer is no, in which case that should not be permitted by your schema. You should write a schema.
So I think maybe there's a bit of a misconception, maybe part of the time that I've been thinking about knowledge graphs, and maybe for other people too, where there's this sense that when you hear about, oh, Google's Knowledge Graph or something, it's just like, information is all over the internet, and if you create a knowledge graph, then you just suck in all that information and then you automatically know a bunch of stuff. But there is actually a lot of work in terms of developing a schema that represents the types of things that you're interested in, the types of knowledge that you're interested in. It's not just an automated thing where you crawl a bunch of websites and then you have a knowledge graph on a certain subject. Would that be accurate?
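[Editor's sketch] The schema ideas in this part of the conversation (types that inherit from organization, querying by the parent type, and a friendship relation that only people may take part in) can be sketched in a few lines. This is a toy in plain Python under stated assumptions, not Grakn's schema language or query API; every name here (`SUBTYPE_OF`, `ALLOWED_PLAYERS`, `KnowledgeGraph`, `SchemaError`, and the sample data) is hypothetical.

```python
# Hypothetical schema: each type maps to its parent type (None at the root).
SUBTYPE_OF = {
    "entity": None,
    "person": "entity",
    "organization": "entity",
    "company": "organization",
    "charity": "organization",
    "university": "organization",
}

# Hypothetical schema: which types may take part in each relation.
ALLOWED_PLAYERS = {"friendship": {"person"}}


class SchemaError(ValueError):
    """Raised when data does not conform to the schema."""


def is_a(type_name, ancestor):
    """True if type_name equals ancestor or inherits from it."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUBTYPE_OF.get(type_name)
    return False


class KnowledgeGraph:
    def __init__(self):
        self.things = {}      # instance name -> type name
        self.relations = []   # (relation label, tuple of instance names)

    def insert_thing(self, name, type_name):
        if type_name not in SUBTYPE_OF:
            raise SchemaError(f"unknown type: {type_name}")
        self.things[name] = type_name

    def insert_relation(self, label, *names):
        if label not in ALLOWED_PLAYERS:
            raise SchemaError(f"unknown relation: {label}")
        for name in names:
            thing_type = self.things[name]
            if not any(is_a(thing_type, t) for t in ALLOWED_PLAYERS[label]):
                raise SchemaError(f"a {thing_type} cannot be in a {label}")
        self.relations.append((label, names))

    def match(self, type_name):
        """Return all instances of type_name, including its subtypes."""
        return sorted(n for n, t in self.things.items() if is_a(t, type_name))


kg = KnowledgeGraph()
kg.insert_thing("alice", "person")
kg.insert_thing("bob", "person")
kg.insert_thing("acme", "company")
kg.insert_thing("helping_paws", "charity")
kg.insert_thing("state_u", "university")

kg.insert_relation("friendship", "alice", "bob")   # permitted by the schema
print(kg.match("organization"))                    # generic question
print(kg.match("charity"))                         # specific question

try:
    kg.insert_relation("friendship", "alice", "acme")
except SchemaError as err:
    print("rejected:", err)
```

In the OOP analogy from the conversation, the entries of `SUBTYPE_OF` play the part of class definitions and the inserted things play the part of objects; the schema check is what stops a person from being recorded as friends with an organization.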
So I wanted to kind of delve into a different area, given that we're an AI podcast. I wanted to ask, how is artificial intelligence related to knowledge graphs? Are knowledge graphs a source of data that might be available for AI models, or is there some other connection there?
Yeah, I mean, where to start? The way we see it is that knowledge graphs are going to be central to the effort towards, well, intelligent systems, as we put it earlier. That's our nice way of trying to avoid using "AI": to make systems more intelligent than they are today. We want to empower them with as much as we can. And so the idea here is that much of the world is still using relational databases, and as we talked about before, structurally they present some challenges where that format isn't natural. So instead, what we want to do is to be able to capture the full complexity of the world, capture all of our knowledge in one place, and then be able to present that to, for instance, learning models for them to learn over it. But what we also provide is actually the artificial intelligence of the 80s, that is, automated reasoning. So what we have at Grakn, built into the open source core product, is an automated reasoner that allows you to infer new data based on the data that you already have and sets of logical rules that you know must be true. This is super interesting, because in the day to day we all use our deductive logical skills any number of times, and we essentially just don't notice because it's so second nature to us. But if you actually try to point to any tools that anyone technical is using right now, about the only thing that people have heard of, and they did like a week on it at uni or something, is Prolog. That's about the only tool out there for logical programming. And it sounds like something computers should be able to do easily. Taking a small set of facts and figuring out a new fact based on a rule just sounds like if-else blocks. But when you
actually try and scale that and make that work, and be able to have any number of possible rules that you might want to write, and bring that into the database level, that's when things start to get a bit interesting. Because now we can say, when A and B and C are true, then D is true. And what's nice about this is that whenever you ask your database for something that fits the bill for D, it's going to give you that regardless of whether or not you ever even stored that in the database.
So I just had, it's almost a tangent of a question. You were talking about Prolog and using automated reasoning, which was kind of before the days of machine learning as we know it today, and I just want to ask, is there any tie-in today? I know you were saying that you're including that in your approach, but today, I guess if we were going to tackle that with the current set of technologies, we'd probably use things like generative adversarial networks, along with natural language processing, to try to create new things from what you already have. Is there any tie-in to that? And just as a random side question, is there any similarity, maybe, in the two?
Well, great question. So I think our ethos is, when you have facts, if you can write a rule that definitively tells you that a new fact must be true based on what you have, that's actually fundamental. When you can use that, then you should use that. Why is that true? Well, firstly, it generalizes perfectly, right? For any new set of A, B, and C, you know that D will be true. And secondly, it's explainable: when you see D, you can ask, well, why did I see D, and the database can tell you, well, because of A, B, and C. Now, what's really interesting, and this is a crossover space that's happening right now, is, as you said, how do we see that complementing the other tools that we want to use? How do we see that complementing, you know, any other machine learning approach? And so essentially the border, for me, is to
describe it as, well, if you were a human approached with a particular problem, would you use either your logic or your intuition? And so essentially what we need is to start figuring out, okay, when do we need to deduce things logically versus when do we need to use a machine learning approach, which gives us some kind of intuition based on experience, right? And so that's actually the center of my work here at Grakn: how do we build learners on top of a logical reasoner on top of a knowledge graph in order to get to the next level of intelligence of our machines, right? How do we make an intuitive process between those two that ingests new facts that have been learned and then reasons over them, or how do we reason over facts and then learn from them, right? So this is very much an unsolved region, and it's super invigorating at the moment to be in that space.
And what do you think are the sorts of tasks that are kind of low-hanging fruit for learning on top of a knowledge graph? For example, one thing that comes to mind is question answering sorts of tasks, or something like that. Are there other tasks that have been explored in AI, maybe in a non-knowledge-graph way, that you think are particularly relevant to explore on top of a knowledge graph?
Absolutely. I mean, as I said, that's actually kind of the whole remit of the research division here at Grakn, to try and fulfill those end-user problems. What are they? We've actually got a whole blog post on all the kinds of problems that we see there. So you're absolutely right, question answering systems, I mean, that's what those 80s logical reasoning AI systems were all about, building expert systems. But it didn't really work, because you had to hand-code everything. Well, now we can maybe use machine learning to derive some of it automatically. And we do see question answering systems, and you see that with Google's Knowledge Graph, in this sidebar that they have: when you type in a search, it may just directly find the thing that you're interested in,
not just links. But then besides that, we see a lot of applications in, for instance, well, we can talk about knowledge graph completion. So that's where maybe I want to find new links in between elements of my graph that I'm interested in. So for instance, if I ingest a lot of biomedical data, then maybe I want to try and predict new links between a drug and a disease, or infer new treatments. Or maybe I want to enrich my whole graph before I try and make those predictions as well, so I can find other relations and interactions between genes, proteins, etc.
But then there are other tasks on a totally different spectrum. So what about NLP systems and computer vision systems when you apply background knowledge to them? As humans, when we approach understanding a person who says a sentence, we have behind us however many years we've been on the planet of experience of hearing people say sentences. We often don't really bring that up, but we also have more than that: we also have our knowledge of the world. We often hear someone say something, and we mishear what they say, and what they say sounds ridiculous given our knowledge of the world. And so we correct ourselves, or we nudge them and say, did you really just say that? Because that doesn't align with my understanding of the world. That's what we hope that the knowledge graph can do.
And we've had a number of conversations with people who want to improve, for instance, their company's customer service platforms, where they have the body of knowledge: they know quite a lot about a customer, they know a lot about their products and the kinds of things that they offer. And if a customer says "my connection's broken", can we immediately infer what they're talking about, because we actually know the products that the customer has? Okay, they have a home broadband connection with us, so they're probably talking about that.
In machine vision, as we've already talked about a little bit from my past, often we just present a learner with a flat image, and we try and get it to guess what's in the
image based on the pixels. But, you know, again, if the learner starts to see things that are not sensible in the image, or it knows which things are often seen together, that would be a big help for it to understand and identify when it might be wildly wrong, based on the other context of the problem that it's trying to solve.
So you started to get into a little bit of the details of where you think certain tasks like computer vision or other things could be augmented by a knowledge graph. And it seemed like in some of those cases it was a matter of: okay, you have the image, and you have this other information that goes along with the image that helps you reason about the image or predict something. Is that where you see the near term of knowledge-graph-augmented AI? I don't know what the proper term for that is. I know that there are also people exploring or doing AI with graph-structured data itself, rather than just extracting features from the graph as new features in a model, but actually using graph-structured features or subgraphs or other things in a model. Are you familiar with that at all? Or, for a person who says, okay, well, this sounds cool, I'd love to try to augment some of my AI systems with knowledge from a graph, where might they start looking in terms of methods and next steps?
Right. I mean, great question. So I totally agree. What we don't want to do is just stick with the status quo of taking essentially square-shaped data as inputs to machine learning pipelines, right? That's the status quo at the moment. We have our data stored in these filing cabinets, and so what we put into our machine learning model is data that looks like filing cabinets, right? And what do we get out? Surprise, surprise, right?
Yeah, and I think it's probably confusing to people sometimes, it has been for me, where, like, TensorFlow talks about a graph, right? So it's not a graph
of the data; it's more of a graph of the computation and how it's executed on a certain architecture, or the logic of that computation. Whereas what we're talking about here is actually data that is structured like a graph being processed through one of these systems as a graph, which would be different than just putting a tensor in, right?
That's absolutely true, yeah. So that's one of the fundamentals that makes learning over, well, anything except a matrix or vector representation difficult: all the frameworks are set up to take those things in. And as you say, in the case of these pipelines, the shape of the processing is a graph, but we don't really need to worry about that compared to the input and output. What we're saying here is: how do we move from these square inputs to something else? So that's actually a big body of work that I've been doing over the last year: looking at the approaches that have been taken around that. One of the first approaches, which is still quite common, is to do, for instance, walks through the graph. Like, I'm interested in some particular entity in my graph, so why don't I start there within my graph and just walk randomly and see what I encounter, record what I encounter, and then maybe use that as a row or a vector or something, and feed that into my model. That's one way of doing it, right? But you're kind of hoping for some serendipity there. You're hoping that you're going to encounter things in the graph that are important, because you're just walking around; random walking is what the approach through the graph is.
So the next thing: from this there was a really nice piece of research that came out of Stanford. They called their paper GraphSAGE, or at least their approach was called GraphSAGE. And the idea of that was to essentially not just take these single walks, but to actually look at all of your neighbors, take a random subset of all of your
neighbors, but then also look at their neighbors, and their neighbors, and their neighbors, and have this more spider-web-like shape of the graph that you would analyze, right? And in some way, without going into all the technical detail, basically roll that information inwards towards the entity that you're interested in. So you gain some information as you move from the outer radius, the outer circumference of the circle, inwards. And that's also really nice, right? But what that's doing is still kind of putting your data into a box shape, because you're still dealing with a tree now. So we've gone from a line, which was a walk, to a tree. And we did find what was really difficult about this. We tried using it, but here is what it doesn't manage to capture. Say we are trying to do something really difficult: we're trying to find a new drug to treat a disease. Now, if we try and do this by just looking at, generally, what does a drug look like and what's nearby to it, and also, generally, what does a disease look like and what's near to it, then when we try and match those two things, we haven't actually looked at any of the common connections that exist between that drug and that disease specifically. We haven't actually figured out, logically, what are the paths that actually connect these things. We should probably be interested in those; those are probably the most important features in this graph. Instead, we've just looked at roughly what they look like, and then you end up with just some generic answer, like paracetamol treats lots of diseases, because lots of diseases involve significant pain, right? So what we need is a more targeted approach. And that leads us to: no, we have to do the hard thing. We actually have to learn over a graph shape. We actually have to take in graph data.
Yeah, so I'm kind of thinking about natural language processing, because that's the world I live in, and, you know, some of
what we've learned recently is that it's very useful to have your algorithm learn the proper representation of text, taking into account the context around, say, a single token, in order to actually learn a good representation of text for a certain task. It sounds like what you're saying is it would be useful to do similar things for graphs, in that we need to learn how to represent graph-structured data in a neural network. Because if we just take all the nearest neighbors and put them in a kind of standard row structure and use that as a representation, then we might miss that actually the predictive thing is beyond the nearest neighbors, right? It's a bunch of links away, and even though it's not a nearest neighbor, that's the thing that's indicative of what we're trying to predict. Is that kind of along the right track?
Absolutely. What I see you describing there in NLP is definitely what we're aiming for here, right? And not just in graphs; I think the industry in general is now looking at what's called "beyond curve fitting", right? How do we move beyond where we are right now, to a point where the machine actually learns to understand what's going on? So we already talked about that with NLP based on a knowledge graph: it understands the context, as you were just talking about there. And in a machine vision problem, it's also about understanding the context of what's actually in the image. All of these things mean that the learner can not just learn by rote or by exact examples, but can actually understand what's going on. What's really interesting in graphs is that you have exactly that. You might have one particular feature that you find: if I see some particular thing that's related in some particular way to what I'm interested in, that's a huge indicator. But you might also just see a general structure that occurs, where, you know, I have
these five elements, these five entities, all connected together in a particular way, and they all have particular types, and that is a very typical structure for a really effective drug, right? Those combinations come up again and again, but in a generic sense. Maybe we want to learn that; we want to learn some kind of structure.
So then we were faced with the problem of: okay, no, we actually need to learn over graphs. And to our luck, you know, we don't have the budget and the manpower to do these huge research efforts ourselves, but our neighbors over here in London, DeepMind, released a paper last year, and they also released a library to support what they were doing, where they generified a lot of the concepts of graph learning and how to do learning over graphs in this really neat way. Given they were acquired by Google, it makes sense that they also figured out how to do this in TensorFlow. So what they've got there is a pipeline that now actually lets you input a graph into TensorFlow as the data, and get that same graph back out as an output, but with updates made to every element of that graph. So that means that essentially we can use that as a little toolbox that allows us to perform any number of different tasks over a graph structure. Obviously we've tailored that here at Grakn to work over the knowledge graph, but what we can do is just carefully frame the kind of problem that we have so that this toolbox can help us to solve it.
And is that the Graph Nets library? That's exactly the one, yeah. Okay, yeah, we'll definitely link that in the show notes as well, because it seems like they have good usage examples and notebooks and such that people can play with.
So you've totally won me over, and I'm looking forward to jumping in and playing with this, and I know Daniel is too. Can you start walking us through what it is like to actually build a knowledge graph with Grakn? And, you know, what do
you need, what languages do you need to know? And also, I noticed on the website you talk about, is it Graql? Am I pronouncing that right? So that's pronounced "grackle". "Grackle", I'm sorry. No, no, no worries.
So yeah, I can give you the whole overview of what you would do. Fantastic. Actually, to close down what we were talking about just there: the whole learning approach that we've been building, and all the research that we do on top of knowledge graphs, I'll emphasize that we release all of that as code, available on our GitHub. Specifically, we have a library called kglib, so that's our knowledge graph library for machine learning. kglib is the center of those projects, and the main one that we're running right now is Knowledge Graph Convolutional Networks. So that's how we apply these learners on top of both the reasoner and the knowledge-graph-shaped data.
The starting point is: how do I actually get a knowledge graph together? Now, the components that you have there, as you pointed out, are something we should start with. So you have Grakn itself, right? Grakn Core is released open source; you can download that from GitHub or install it with a package manager. And that's a database which is going to run; you can install that on your local machine and get up and running, or put it in the cloud. So you need that back-end service running. Now, when it comes to actually accessing that, we have three officially supported drivers at the moment: Python, Node.js and Java. We make sure that all of those are up to date and working with the latest Grakn. What's really interesting there, actually, is that the communication protocol between those clients and Grakn is gRPC. So that's something from Google, Google's remote procedure call framework, that has replaced using REST services. What's really nice about this, the actual end goal that it gets you to, is that when I'm accessing the database with Python, I
get to actually use native Python functions. All I have to do is import the package that talks to Grakn, import the Grakn client in Python at the top of my script, right? And then I can just instantiate a communicator that will talk to Grakn and make queries to the database, just out of my native Python. So I can launch that straight from my application. It doesn't feel like you're talking to a database anymore; it just feels like you're making function calls, which come back with information that's pertinent to your knowledge graph.
That's great. And would you use that client tool to help you build your knowledge base? Like, let's say that I have a bunch of text data, and I'm pulling entities out of it, or classifying it in a certain way to store it as a certain type of entity. Would I be doing that in Python and then pushing that to Grakn via the Python library? Or are there bulk upload techniques, or ways to get data, let's say, from relational to graph? What's the range of what people do?
Yeah, absolutely, great question. So basically you're absolutely on the money. The idea is that we give the users these clients in their native language, because that's their strength, right? We already know that they know how to speak that, and they get all of the freedom that that language offers. And then the way you're actually interacting with Grakn is through Grakn's query language, Graql. You can probably see where that name comes from, right? So they've got this query language, Graql, and the idea is that it's this really concise, really expressive language, and it is your one-stop shop for how you actually talk to the knowledge graph in terms of what your intentions are. So if I want to retrieve something, then I make what we call a match query. If I want to insert something, I use an insert query. And if I want to insert something wherever I see a particular pattern, that's a match
insert. I'm sure you get the idea, right? So you have all of these different ways you can read and write in the database, and you do all of them in the same way from your application: you just call the client, you say .query with this query, and the response you get back will be the answer, whether you inserted something or read. Now, we've got a repository of examples, so people can have a look on there. Very typically people are, as you say, migrating from either SQL data or CSV data, in which case it's a matter of just writing what we call an ETL pipeline: something that will traverse over all of that data that you have and make the appropriate queries in Graql to get that data shifted over into Grakn itself.
Now, one of the questions that people ask me really often, and it definitely comes up in our community Slack quite often, is: can I automatically build my knowledge graph? We kind of talked about that a bit earlier in the call. It's possible to automatically ingest a relational database into a knowledge graph, but the problem is you just end up with the same structure that you had in your relational database, but in the knowledge graph. You still end up with something broken, because you need to apply the human understanding that you have of the data in these table formats. You need to say: what does that actually mean? What does my domain look like? So what you do, well, it's an iterative process, of course, like a lot of engineering, but you're going to start out by saying: here's my schema, here's what I think my domain looks like. Okay, now, as I go over this file, what parts of that schema can I infer from the particular row I'm dealing with right now?
So I guess if somebody wants to get into this, and I know we're both very excited about it, and I've learned a lot that I didn't know before the conversation, where can
they go and learn more and actually start digging into using Grakn and Graql themselves? Any specific links that you want to recommend?
Well, we have the docs available on our website; people seem to think those are quite fun. And there are also some examples there, for instance on how to do data migration into Grakn, so you can get that knowledge graph up and started and have something to play with. We then have an examples repository on our GitHub. And also, for those who really like to jump into the deep end, the kglib repo is quite a good place to go if you want to see immediately, from the top, how you're going to do the machine learning over it. And then I suppose the other thing to majorly encourage is to check out our blog, so that's blog.grakn.ai. We have a lot of stuff there that will give people a flavor of what you can achieve with the knowledge graph, and how succinct it can be, to get you motivated to actually move your data over and give it a try.
Well, James, thank you very, very much for coming on the show and just kind of schooling us in all this. It has been really fascinating, and we appreciate it. So thank you, and we'll talk to you soon. Thank you very much for having me.
All right, thank you for tuning into this episode of Practical AI. If you enjoyed the show, do us a favor: go on iTunes and give us a rating, or go in your podcast app and favorite it. If you are on Twitter or a social network, share a link with a friend. Whatever you've got to do, share the show with a friend if you enjoyed it. Bandwidth for ChangeLog is provided by Fastly. Learn more at Fastly.com. And we catch our errors before our users do here at ChangeLog because of Rollbar. Check them out at Rollbar.com slash ChangeLog. And we're hosted on Linode cloud servers. Head to linode.com slash ChangeLog. Check them out, support the show. This episode is hosted by Daniel Whitenack and Chris Benson. The music is by Breakmaster Cylinder. And you can find more shows just like this
at ChangeLog.com. When you go there, pop in your email address and get our weekly email, keeping you up to date with the news and podcasts for developers, in your inbox every single week. Thanks for tuning in. We'll see you next week.