PODCAST · technology
Knowledge Graph Insights
by Larry Swanson
Interviews with experts on semantic technology, ontology design and engineering, linked data, and the semantic web.
-
45
Daniel Davis: Grounding Generative AI with Context Graphs – Episode 49
Daniel Davis
Long before Foundation Capital published their "trillion dollar opportunity" article about them, Daniel Davis had been building a platform for context graphs. Daniel's work in complex domains like aircraft safety and autonomous vehicles, as well as his study of quantum mechanics, gave him insights that led him to explore ways to ground probabilistic AI systems in the logic and knowledge they'd need to deliver trustworthy information. He settled on context graphs as the best way to accomplish this. Daniel was introduced to knowledge graphs by his co-founder Mark Adams, and he immediately became an RDF evangelist, aiming to not only proselytize the tech but to also make Mark's cat Fred famous in the process.
We talked about:
his role as co-founder at TrustGraph
his work to make his co-founder Mark Adams's cat Fred famous
his diverse background in defense, autonomous vehicles, and cybersecurity
how the complexity and vast scope of compliance requirements around autonomous vehicles led to his interest in context graphs
how the arrival of ChatGPT and GPT-3, and his knowledge that probabilistic systems wouldn't be up to the task of delivering legally compliant information, served as a catalyst for his current work
how a friend's article about the Foundation Capital "trillion dollar opportunity" post led to his Context Graph Manifesto
his hypothesis, based on conversations with several friends at big consultancies, that the sudden interest in context graphs arose from executives reviewing their many failed 2025 AI proofs of concept
his definition of a context graph: "a graph structure that is optimized for AI usage"
the influence of his friend Vicky Froyen's 2019 presentation on context graphs at the first Knowledge Graph Conference
the three elements he sees in a context graph - decision traces, provenance and explainability, and feedback - and the power of combining them in a single graph system
their use of ontologies like PROV-O
the importance of a context capability in complex domains like military airworthiness
how his background in quantum mechanics and mathematics led to his awareness of the limitations of LLMs from their introduction
how he balances the probabilistic nature of the universe with the needs of practical applications that entail legal obligations
his surprise at the lack of attention that a lawsuit between Amazon and Perplexity is getting, given its huge implications for AI agent systems
their goal at TrustGraph of making graph technology and ontology design easier and more accessible
a cliffhanger about the implications of LLMs not understanding time
Daniel's bio
From military aerospace, space-to-air-to-sea mesh networks, autonomous vehicles, and enterprise infrastructure, Daniel has made a career of making the most complex systems work together. Whether it's cyberphysical systems or data, interoperability and guaranteed performance have always been top priorities with a mission-first mindset. Co-Founding TrustGraph represents a multi-decade quest to improve decision making through access to better knowledge.
Connect with Daniel online
LinkedIn
X
Resources mentioned in the interview
TrustGraph.ai
TrustGraphAI YouTube channel
Context Graph Manifesto
Collibra's Context Graph, Vicky Froyen's 2019 Knowledge Graph Conference presentation
Video
Here’s the video version of our conversation: https://youtu.be/npjErvR7oXY
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 49. When Foundation Capital published their article about the trillion-dollar opportunity presented by context graphs, many people were hearing about the concept for the first time. Not Daniel Davis. He's been developing an open-source context graph platform since 2023. His work in complex domains like aircraft safety and autonomous vehicles, as well as his study of quantum mechanics, have led him to explore ways to ground probabilistic AI systems in logic and knowledge.
Interview transcript Larry: Here we go. Hi everyone. Welcome to episode number 49 of The Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Daniel Davis. Daniel's the co-founder and co-creator at TrustGraph, which is an open-source software project that builds graph stuff that we'll talk about today, based in San Francisco. And welcome to the show, Daniel. Tell the folks a little bit more about what you're up to these days. Daniel: Oh, wow, Larry. That's a lot to unpack there. I mean, how much time do you have? Yes, I am the co-creator of TrustGraph with Mark Adams, who is a bit more well-known in the graph community than me, but he likes building graphs. He doesn't like talking about them so much. And I'm confident that he would agree with me on that. Although I am trying to make his cat Fred famous, because I'm actually working on a new video on our guide to understanding RDF, which is something that a lot of people have asked us about, and how Mark taught me RDF so many years ago with three simple sentences about his cat Fred. But TrustGraph is what we've been working on for the past few years now. And we've had a couple of different ways of trying to explain it to people, whether it's a context operating system, context development platform. Daniel: Some might even think of it like a context science platform, which I think is kind of an interesting analogy as well. But I myself have quite a diverse background, spent a lot of time in DOD aerospace, came out to Silicon Valley almost 10 years ago to work on the autonomous vehicle industry, focusing on cybersecurity and safety. And that's why I write articles about things like determinism and information risk and trying to attribute value to information. 
But in that world, I also was doing complex knowledge work where you read one document that's 800 pages long, and then you have to read a statement that references another document, or maybe it references 12 other documents, and you just keep tracing down this chain of references, and then you have to understand which one of these documents actually takes precedence. Why did these statements conflict with each other? Daniel: Do they conflict with each other? How do I try to come to some sort of opinion about this? And in the safety critical world, opinions aren't allowed. It's not like auditing for enterprises that you can have opinions. They take a much grimmer view on that. And that's where that word determinism comes in and whether determinism means what people think it means. And how is that for an introduction? Larry: Well, it's perfect, because it sets up all the things we want to talk about. The first thing I want to talk about, I think, well, it's so hard to choose, but the reason you came to my attention is, I forget, somebody ... Oh, my friend Jochen in Munich brought you to my attention. And I was like, "Whoa, this guy's been talking about context well before December of 2025," which is when apparently the rest of the world started thinking about context and context graphs. Tell me a little bit about maybe the story of your connecting with Vicky or however. I mean, that combination, we were talking before we went on the air about your experience with autonomous vehicles, discovering Vicky and his interest in context graphs. And then a lot of what you just said is a reason to need not the context graphs to do the stuff you want to do. So, maybe talk a little bit about your journey into the context realm. Daniel: Well, so much of this comes from the problem I was trying to solve in the autonomous vehicle world. This is work that I've been doing for years in DOD aerospace with risk management and cybersecurity and safety, and just running complex programs. 
It's so much about the paperwork and how you make decisions, how you justify those decisions, how you comply with regulations, understanding the regulations. And for autonomous vehicles the scope was just unprecedented when you look at the number of things that could go wrong. And we could literally talk for the next few days, just me rattling off scenarios, and you'll go like, "Wow, I never thought of that. I never thought of that. Wow, wow." And you just start going like, "How do you manage this?" And well, that was what I was sought out. That's what I was having to solve. And looking at all the different ways of doing this and trying to combine a Bayesian approach with risk management and realizing the data sets were going to be huge and how do you manage that. Daniel: And it kind of turned out to be an unsolvable problem at the time. And around that time, because I was working at Lyft, I got brought up to manage a lot of the issues that were going on with the Lyft actual IPO, which again, more regulatory stuff with the SEC and how processes are applied across the entire enterprise, how these comply with SEC regulations and expectations and how this was audited. And just even how we were measuring our cybersecurity performance as a company, how that was getting reported to the board. Again, very similar problem, just slightly different problem space, slightly smaller scope. And around that time Mark's company, Trust Networks, was actually acquired by Lyft, and I met him and I got introduced more to graphs and knowledge graphs. I actually hadn't even worked with knowledge graphs prior to that. I was much more in deterministic structures and DOD aerospace. Daniel: I was the one always saying, "Why are we writing in this Python? We should write it all in Ada." And all the people would just look at me and go, "What is Ada?" And I would do that just as a joke, but also partially believing it. I still advocate Ada. I like Ada, even if it makes developers cry. 
It was designed to make developers cry, because it always works, but that's another story. And that was back in what, 2018, 2019?...
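The show notes for this episode list provenance as one of the elements of a context graph and mention ontologies like PROV-O. As a rough illustration only, the sketch below stores a decision trace as triples that use real PROV-O property names and then walks back from a decision to everything it depended on. The `EX` namespace, the resource names, and the `provenance` helper are all hypothetical; this is not TrustGraph's implementation.

```python
# A minimal provenance-trace sketch, assuming a graph held as plain tuples.
# The PROV-O property names are real; every ex: resource is made up.
PROV = "http://www.w3.org/ns/prov#"
EX = "https://example.org/"  # placeholder namespace (assumption)

triples = {
    (EX + "decision42", PROV + "wasGeneratedBy", EX + "review7"),
    (EX + "review7", PROV + "used", EX + "regulationDoc"),
    (EX + "review7", PROV + "used", EX + "testReport"),
    (EX + "review7", PROV + "wasAssociatedWith", EX + "agentA"),
    (EX + "testReport", PROV + "wasDerivedFrom", EX + "sensorLog"),
}


def provenance(node, triples):
    """Return every resource reachable from node by following triples forward."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for subject, _predicate, obj in triples:
            if subject == current and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


trace = provenance(EX + "decision42", triples)
```

Here `trace` holds the activity, the agent, and the source documents behind the hypothetical decision, which is the kind of chain of references Daniel describes following by hand.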
-
44
Veronika Heimsbakk: Connecting Data Engineering and Knowledge Architecture – Episode 48
Veronika Heimsbakk
With interest in knowledge graphs growing by the day, Veronika Heimsbakk is busier than ever with her efforts to connect the data engineering, information architecture, and ontology practices that drive modern knowledge engineering. Best known as an advanced knowledge graph practitioner and a leading expert on the SHACL standard, Veronika also regularly shares her knowledge through her writing, university courses, and professional workshops.
We talked about:
her work at Data Treehouse, creating tooling for data people to get on board the knowledge graph journey
how she helps data engineers find their overlap with knowledge engineering
her work to build bridges between data engineers, information architects, and ontologists
how she meets data engineers on their own turf by using simple Python scripts to put their data frames into a knowledge graph
how public sector compliance requirements drive demand for RDF solutions
the powerful tool that helps her communicate with a variety of stakeholders and collaborators: coloring pencils
how she works with information architects and enterprise architects
her take on graph visualizations, that they're rarely very useful in helping her communicate with engineers and business people
her approach to balancing top-down ontological approaches and bottom-up data engineering approaches in knowledge graph construction
her early work with SHACL and her appreciation for its applicability to a wide range of use cases beyond simple data validation
her take on the ongoing OWL versus SHACL discussion
her preferred tool for turning modeling sketches into RDF code: WebProtégé
how her work with the Norwegian maritime authorities reduced caseworker time on regulatory tasks from several weeks to a few seconds
her upcoming masterclass at the Knowledge Graph Conference on transitioning from data engineering to knowledge engineering
Veronika's bio
Veronika Heimsbakk is a knowledge graph specialist at Data Treehouse with over a 
decade of experience in semantic knowledge graph technologies. Throughout her career as a consultant, she has served as a developer, architect, advisor, and team lead, working with public and private sector clients across Europe, with a strong focus on the public sector in recent years. Veronika is the author of SHACL for the Practitioner (2025). She is a regular guest lecturer on SHACL at the University of Oslo and has delivered the SHACL Masterclass at various venues for several years. In 2024, she was recognised as one of Norway's Top 50 Women in Tech. On Substack, Veronika writes From Data Engineering to Knowledge Engineering, a practical article series that shows data engineers how to build knowledge graphs using familiar tools like Python, Polars, and maplib, covering everything from ontologies and SPARQL to SHACL validation and reasoning. An eager advocate for logic and linked data, she champions knowledge graphs in a landscape increasingly dominated by predictive approaches. Connect with Veronika online LinkedIn Substack SHACL for the Practitioner book e-mail: sh at veronahe dot no Video Here’s the video version of our conversation: https://youtu.be/cY8rhPoXepE Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 48. Ontology design and knowledge graph building are truly team sports, requiring collaboration across a variety of business and engineering disciplines. Few practitioners are as experienced at bringing these teams together as Veronika Heimsbakk. As both a consultant and as an author and educator, she helps business and public sector stakeholders, data engineers, and knowledge architects understand each other's languages and appreciate each other's practices. Interview transcript Larry: Hi everyone. Welcome to episode number 48 of the Knowledge Graph Insights podcast. I am extremely delighted today to welcome to the show Veronika Heimsbakk. 
If you've ever been to the Knowledge Graph Conference, Veronika's just, you know her already. She's just the most engaging presence there. She's always got her Norwegian KitKat bars and her Polaroid camera and doing awesome workshops on SHACL and other things. But welcome to the show, Veronika. Tell folks a little bit more about what you're up to these days. Veronika: Thank you, Larry, and thank you for having me. Yes, these days I'm up to in using familiar tooling to get started with knowledge graphs and harvesting all the knowledge graph capabilities and graph traversals as opposed to JOINs and tabular things. Yeah. Larry: Well, this feels like a year in which a lot of that might be happening. A lot of data engineers, there just seems to be so much excitement and interest in knowledge graphs and ontologies. And it's so important to meet people where they are on their journey into that. And you know, you're involved with, I know the data folks in Helsinki and we didn't talk about your background. You're currently a knowledge graph specialist at the Data Treehouse. And previously, you've done consulting like at Capgemini. So you've done a lot of this work hands-on. You wrote a book about SHACL, and you do workshops and a lot of teaching. And part of that whole mindset of yours is currently, maybe not... I guess it's focused on helping data engineers become knowledge engineers. Is that an accurate way of putting it? Veronika: Or at least not fully transitioning maybe from data engineering to knowledge engineering, but finding that intersection of a skillset that's truly powerful in working with ontologies because we have seen the rapid interest and popularity of ontologies lately when large language models took the world by storm. 
But I've also experienced during my years as a consultant that the ontology things and the knowledge graph aspects, they are usually a concern of the information architects and those who work with concepts and terms and setting them into context and everything. But the information architecture departments usually don't talk to the people working on the data and making applications. So why should we create ontologies that are machine-readable in semantic models? They are a database schema in itself. They are fully usable by data people, but there is something in between there that's hard to grasp. Veronika: So I want to build this bridge because when I was finished at the uni, I started as a Java developer on a semantic tech project. So I've been doing a little bit of data engineering myself in the early days going from tabular data to RDF and knowledge graphs. But I see that this isn't something that should be separated, of course, if you want to be data-driven, ontology-driven in your applications, you need the data people on board if you're going... Successful project. Larry: Yeah, that's really interesting too, because it seems like there's at least a couple of things there. Just the common language between information architects, data engineers, and knowledge engineers, but then also, in any communication project, meeting them on their own ground. And that probably applies both in the human natural language that you're talking to people about, but also in the technology to implement stuff. And I know that's what you're doing in your day job now, but can you talk a little bit about how you're making knowledge graphs and knowledge engineering more accessible to data engineers? Veronika: Yes, of course. The company that I work for, we create a framework for doing exactly that, like working with knowledge graphs using data frames. 
So I've been working a lot with that lately and writing a lot of articles on the topic and how you can transition from a tabular data format to queryable knowledge graph, doing graph traversals and answering questions you even didn't know you had, right? But the way that I work is usually together with clients, is applying simple tooling on their tabular data. And these days, most people work in data frames, right. So going from a Polars data frame to queryable knowledge graphs only require three, four lines of Python code by using, for example, maplib, which is a Python framework for handling knowledge graphs as data frames. And you can even get your SPARQL query answers back as a data frame to push further in your data pipeline. Veronika: So you have all these capabilities of graph traversal in answering questions, but also, in inference and enrichment and automating enrichment of completing metadata, for example, and doing validation with SHACL, for example. You have all these knowledge graph capabilities that you can put on top of your existing data infrastructure. Larry: Are there classic use cases where... Is there higher demand in some industry verticals for this kind of thing? Veronika: Recently, in Norway at least, I've seen a rapid demand for like, "Hey, I have all my data in this data lake," like Databricks or Snowflake or whatever. But the information architecture folks, they're building ontologies or they want to reuse the national standards. Like in Norway, we have a set of national standards that are expressed in RDF. It's SKOS for concepts and terms. It's DCAT for data catalogs and it's CPSV for core public services and to be able to describe them. And it's a demand for the public sector to comply to those. And when they have data in Databricks, for example, how can we connect to these national standards or to our internal ontologies with the data in Databricks to make the ontologies operational? 
Veronika: So that's a use case that I stumble across a lot lately. And I've actually written about this recently because I did a teeny tiny project on that at the Culture Heritage Directorate in Norway. And that again, it's like four lines of Python inside Databricks and you have your ontology operational on your data. Larry: Interesting....
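Veronika describes going from tabular data to a queryable graph and then answering questions by traversal rather than JOINs. The sketch below illustrates that idea using only the Python standard library. It deliberately does not use maplib or Polars, whose APIs are not shown in the conversation; the rows, the `EX` namespace, and both helper functions are hypothetical stand-ins.

```python
# A minimal table-to-triples sketch, assuming rows arrive as dictionaries.
# This is an illustration of the idea, not the maplib workflow itself.
EX = "https://example.org/"  # placeholder namespace (assumption)

# Hypothetical rows standing in for a data frame.
rows = [
    {"id": "site1", "name": "Site A", "municipality": "Bergen"},
    {"id": "site2", "name": "Site B", "municipality": "Oslo"},
    {"id": "site3", "name": "Site C", "municipality": "Bergen"},
]


def rows_to_triples(rows):
    """Turn each non-id column of each row into a (subject, predicate, object) triple."""
    triples = set()
    for row in rows:
        subject = EX + row["id"]
        for column, value in row.items():
            if column != "id":
                triples.add((subject, EX + column, value))
    return triples


def sharing_value(triples, subject, predicate):
    """Two-hop traversal: other subjects with the same value for predicate."""
    values = {o for s, p, o in triples if s == subject and p == predicate}
    return sorted(
        s for s, p, o in triples
        if p == predicate and o in values and s != subject
    )


triples = rows_to_triples(rows)
neighbours = sharing_value(triples, EX + "site1", EX + "municipality")
```

In this toy data, `neighbours` contains the one other site in the same municipality as `site1`. A real pipeline would map columns to terms from an ontology or a standard such as DCAT instead of minting predicates from column names.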
-
43
Joe Reis: Fighting “Context” and Other Tech-Industry Hype – Episode 47
Joe Reis
When Gartner declared 2026 "The Year of Context," Joe Reis leapt into action, immediately writing a good-natured satirical article about "context products," "context lakes," and the "analyst singularity." It's a fun article that exemplifies Joe's no-nonsense approach to industry education and concludes with a serious point: "context does matter, and most organizations are terrible at it."
We talked about:
his forthcoming data modeling book, "Mixed Model Arts"
the origins of his satirical post "Gartner declares 2026 the year of context"
our speculation on how the word "context" came to the fore
how his decades of experience help him fine-tune his hype detectors
"the one equals 10 dilemma" via which leaders extrapolate AI benefits that senior programmers gain onto less-skilled engineers
the challenges of building a semantic layer that executives miss
the endless quest for "silver bullets" over solving fundamental business problems
the relevance of Einstein's definition of stupidity in the AI hype cycle
how the big AI providers are like the ISPs of the 1990s
how generative AI has accelerated and improved his workflows
the trepidation around AI that he feels when he visits Silicon Valley and San Francisco
the unprecedented pace and scale and context of the current AI hype cycle
the role of the knowledge community in the current tech environment
Joe's bio
Joe Reis, a "recovering data scientist" with 20 years in the data industry, is the co-author of the best-selling O'Reilly book, "Fundamentals of Data Engineering." He’s also the instructor for the wildly popular Data Engineering Professional Certificate on Coursera, in partnership with DeepLearning.ai and AWS. Joe’s extensive experience encompasses data engineering, data architecture, machine learning, and more. 
He regularly keynotes major data conferences globally, advises and invests in innovative data product companies, writes at Practical Data Modeling and his personal blog and hosts the popular data podcast "The Joe Reis Show." In his free time, Joe is dedicated to writing new books and articles and thinking of ways to advance the data industry. Connect with Joe online JoeReis.xyz Joe's writing and podcast Gartner Declares 2026 The Year of Context™: Everything You Know Is Now a Context Product Fundamentals of Data Engineering (O'Reilly), Joe's bestselling book Practical Data Modeling Personal Blog The Joe Reis Show Video Here’s the video version of our conversation: https://www.youtube.com/watch?v=6A_FWL0hbKM Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 47. When Gartner recently declared 2026 "The Year of Context," the gauges on Joe Reis' industry hype dashboard maxed out. Joe's a respected veteran of the data profession, known for his best-selling book, Fundamentals of Data Engineering, and for his courses, newsletters, conference keynotes — and especially for his no-nonsense takes on industry trends. He's also a good friend of the knowledge graph community. "Context" is just his latest tech-industry hype take-down. Interview transcript Larry: Hi, everyone. Welcome to episode number 47 of the Knowledge Graph Insights Podcast. I am really delighted today to welcome to the show Joe Reis. Joe is a well-known figure in the data engineering and data world. He's the co-author of the book, Fundamentals of Data Engineering, which is kind of a category-setting book. He's working on a new book called Mixed Model Arts, on data modeling, and does a lot of other interesting stuff. He's really well known in the conference community. And anyhow, welcome to the show, Joe. Tell the folks a little bit more about what you're up to these days. Joe: Hey, what's up, Larry? What have I been up to lately? Just been editing Mixed Model Arts. 
I just actually finished, I guess, the main edits and just down to the very minor tweaks as of today. So that's awesome. So literally just working on that before we hopped on and I'll be working on that after we're done. Larry: Okay, Great. Well, sorry to interrupt your book. I'm a former book editor, so I always feel bad when I interrupt progress like that. Congrats. Joe: It's okay. Thank you. Larry: Do you have a publisher for the book? Joe: That would be yours truly, yes. Larry: All right. Okay. Well, anyhow, we'll keep the webpage- Joe: We'll talk about that later. Yep. Larry: Yeah, with info about where to get it. Well, hey, the reason this conversation came together, there was this great little convergence of meeting of ideas a couple of weeks ago. I had just done a presentation where I was talking about how hyped the AI cycle is. And then in quick succession, I saw a post from Juan Sequeda where he talked about some folks have mixed feelings about Gartner. And then I came across this post you had done, "Gartner declares 2026, the year of context." It was this brilliant satirical piece. Can you talk a little bit about that and what motivated it and just maybe a quick outline for folks? Joe: Yeah, I mean, I think spawned from... I guess my social media circles were like Gartner, and all of a sudden I started seeing my LinkedIn feed bombarded with the word context and how Gartner declares this the year of context and... I can swear in your show, right? Larry: Yeah. Joe: Okay, shit. Larry: It's fairly family friendly, but yeah. Joe: Yeah, it's all good. So I've seen them and similar research firms in the past declare this, that, or the other thing. And I just felt like this in particular seemed... And no offense to the knowledge graph folks there, whatever, you're all great. And I think it serves knowledge graph community really well, but the year of context I think is jumping in the gun a bit too fast. 
Where last year was a year of agents, year before that was year of AI or whatever, and it just seems like... It's what I described as the buzzword industrial complex where we jump... Not we, but certain groups in the industry need something new to push onto people in order to keep, I guess, discussions going, in order to keep people attending conferences, in order to keep selling consulting services and all this other stuff. Joe: And so I felt like this was really just another instance of it, but I decided that I had had a few spare cycles in between editing my book. So I was like, "Oh, let's just write a satirical piece on this," maybe somewhat satirical, maybe just kind of poking fun at just, I guess, the nonsense of the industry that we keep finding ourselves in over and over again. So that was all there was to it, Larry. Larry: Okay. Well, one of the ways you contextualize that was this, I forget what you call it, the conference content capital cycle, this self-reinforcing loop, which appeared to me to mirror this kind of whatever that bizarre financial loop that's keeping the AI companies up. Was that intentional or was I just reading into that? Joe: I mean, I don't know if it was intentional, but it's just an observation that I've noticed in that article, and I think a few others, where it was very much... It's a self-sustaining thing where you need the news story, you need this. And it's the same as the AI hype cycle right now where it's just a very circular system. And so just that the money just sort of rotates around and that's just kind of how it is amongst strangely a lot of the same players, which I think is kind of funny. Larry: Interesting. Yeah, so maybe we've just stumbled upon some universal dynamic that drives various kinds of hype cycles. But one thing that occurred to me is there's always some fundamental underlying, it's business anxiety or truth or something like that that's driving these things. 
The context thing, do you have any hunch where that came from? I remember it just hit my LinkedIn feed, what, three or four months ago and it's been constant ever since. Joe: I'll ask you this actually. I mean, let me reverse the roles of a host and guest here. I mean, you've been in the knowledge space for a while and I imagine that some manifestation of the word context has come up in your discussions with your peers. So I guess if I'm in your shoes and those of your peers, what's it like to see a word like context or semantics or ontology or graphs becoming these sort of terms du jour? Larry: Well, in one sense, it's really gratifying, of course, because we're on the radar screen. You can actually say ontology in public now, which has not been the case for the last 10 years. Joe: Yeah, you get jailed for doing that. Yeah. Larry: Exactly, yeah. Put you in the stocks in the middle of the courtyard. But no, so it's really interesting. And that's one of the reasons I'm curious about your take on it, because it's like there's these real things that drive it. But in terms specifically of context, I was just reminded just of... Somebody on LinkedIn today just shared a post I did recently about Dave McComb's... I don't want to get too nerdy, but this is a Knowledge Graph Insights podcast, so I'll set a little context. There's this thing in knowledge graph construction. You have the A box, the assertion box, which is like all the things, all the data instances that are in there. Then you have above that, you have the T box, which is the concepts that describe it, the ontology basically, typically. Dave McComb, who I think you must know, because the data centric enterprise and all that. Joe: Mm-hmm. Larry: He articulated this notion, I don't know, a couple of years ago of the CBox. And what was really interesting in this post I saw today is that he used it as the categorization box. 
That's where you put all the taxonomic terms, vocabularies, all that sort of what I think of as the metadata about the data is sort of in there. And I didn't realize at the time,...
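To make Larry's description of the A box, T box, and C box a little more concrete, here is a small sketch that sorts triples into the three groups. The sorting rule (schema predicates go to the TBox, SKOS predicates to the CBox, everything else to the ABox) is a simplifying assumption made for this illustration, not Dave McComb's definition, and every `ex:` name is hypothetical.

```python
# A minimal ABox / TBox / CBox sketch, assuming triples are plain tuples
# with prefixed names. The classification heuristic is an assumption.
TBOX_PREDICATES = {"rdfs:subClassOf", "rdfs:domain", "rdfs:range"}


def box_of(triple):
    """Classify a triple by its predicate (illustrative heuristic only)."""
    _subject, predicate, _obj = triple
    if predicate in TBOX_PREDICATES:
        return "TBox"  # concepts that describe the data
    if predicate.startswith("skos:"):
        return "CBox"  # taxonomic terms and vocabularies
    return "ABox"      # assertions about individual things


# Made-up example triples.
graph = [
    ("ex:Painting", "rdfs:subClassOf", "ex:Artwork"),
    ("ex:oilPaint", "skos:broader", "ex:paintMedium"),
    ("ex:item1", "rdf:type", "ex:Painting"),
    ("ex:item1", "ex:medium", "ex:oilPaint"),
]

boxes = {}
for triple in graph:
    boxes.setdefault(box_of(triple), []).append(triple)
```

With this toy graph, the class hierarchy lands in the TBox, the vocabulary term in the CBox, and the two statements about `ex:item1` in the ABox.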
-
42
Robert Sanderson: Building Yale’s Cultural Heritage Knowledge Graph – Episode 46
Robert Sanderson
Yale University manages huge collections of precious cultural heritage artifacts housed in multiple museums, libraries, and other collections. Using knowledge graph and ontology engineering design patterns that he has developed over his career, Robert Sanderson helps scholars, researchers, and the general public access information about, and make connections across, millions of unique items in Yale's collections.
We talked about:
his work as Senior Director for Digital Cultural Heritage at Yale University
the knowledge graph and ontology engineering design patterns that guide his work
the scope of his work: improving discoverability of Yale's extensive collections of artifacts, facilitating the management of collection information, and even collecting data on physical artifact storage facilities
how their linked data approach lets researchers easily connect information about artifacts and information housed in multiple museums, libraries, and collections
how the growth of LLMs has affected their KG user interfaces
how AI is accelerating their ability to add to their knowledge graph
the millions of artifacts in their collections that aren't yet accounted for
the compact nature of the ontology behind their three-billion-triple KG, just 10 classes and 50 relationships
the extensive vocabularies and taxonomies they use
how they handle the need to reconcile the identity of lesser-known people who don't have a Wikipedia page or other authoritative references available
how they balance the competing needs of comprehensiveness and usability as they build their knowledge graph
how knowledge graphs facilitate discoveries that other search tools can't
current opportunities for post-docs to join his team to work on leading-edge AI projects
Robert's bio
Dr. 
Robert Sanderson is the Senior Director for Digital Cultural Heritage at Yale University, where he works with the libraries, archives, and museums to ensure that data and other digital efforts are coherent and connected. He is the principal architect for Yale’s cross-collection discovery system, LUX, which is built on the Linked Art specifications, for which he is an editor. He is also an editor for the IIIF specifications, was the co-chair and editor for JSON-LD and the Web Annotation data model in the W3C. He has previously worked at the Getty in Los Angeles, Stanford University, Los Alamos National Laboratory, and the University of Liverpool. His current areas of work and research are at the intersections of cultural heritage, knowledge graphs, data usability, and generative AI. Connect with Rob online LinkedIn email: robert dot sanderson at yale dot edu Rob's LinkedIn post series on KG and ontology design patterns The 10 Design Principles to Live By Ontology Design Patterns Naming Things Avoiding Reification Foundational Ontologies Multiple Inheritance, Not Multiple Instantiation Predicate Reuse... Meh Document your ABCs Separate Query and Description Semantics Usable vs Complete acknowledgements Video Here’s the video version of our conversation: https://youtu.be/SMAVyrL3aSU Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 46. When your job is to help scholars and the public discover information about millions of cultural heritage artifacts that are housed in multiple museums, libraries, and other collections, you need a powerful — but also manageable — knowledge graph. That's Rob Sanderson's role at Yale University. He and his team apply time-tested ontology and knowledge engineering design patterns to help people discover — and see the connections between — these precious human artifacts. Interview transcript Larry: Hi everyone. Welcome to episode number 46 of the Knowledge Graph Insights Podcast. 
I am really delighted today to welcome to the show Robert Sanderson. Rob is a professor and the senior director of Digital Cultural Heritage at Yale University, the Ivy League school in Connecticut. Welcome to the show, Rob. Tell the folks a little bit more about what you're up to these days. Rob: Hi, Larry. Thank you so much for inviting me to be part of the illustrious lineup of guests on your podcast. So yeah, I'm Rob Sanderson, as you said, Senior Director for Digital Cultural Heritage at Yale. So I work with the libraries, the archives, and the museums and other collecting organizations at Yale to help them to be more connected with linked data organizationally and more coherent in the way that we do things digitally. So our projects really focus on discovery and access to the collections in service of the university mission, which of course is teaching and learning, research, and preparing our students to be the next generation of leaders in the world. Rob: So for that, the university invests very heavily in the collections, which is fantastic. We are super proud of the 300 years of collecting that we've done. But we want to make sure that if you can't come to New Haven, you still have as good access to those collections as possible. And the ability to find amongst the many millions of objects that we steward exactly what it is that you need. So a lot of our projects focus on describing the collections in a more computationally tractable way so that that discovery can be better. And also how to manage the information that's associated with the collection, but isn't a museum object or an archival object itself. For example, I have two postdoc positions that are openly available. So if you are a few years out of your PhD or just about to graduate, do get in touch to work on how to use AI to extract the ownership history or the provenance of particular museum objects from the archival content that we also manage. 
Equally, how can we align research data sets with the collections? So we also have a natural history museum as well as two art museums. How can we align the environmental datasets that are out there on the web with the natural history specimens that could have been impacted by those environments? Rob: Yeah. And then equally, we look at the environment of Yale. So we have a large project at the moment to set up environmental monitoring with sensors for light, for humidity, temperature, and so on, to be able to generate a large data warehouse aligned with linked data with the collections so that we can have evidence of what the effects of the environment are on the collection items themselves. Larry: Interesting. That is so fascinating. What a fascinating remit. One quick thing about what you just said. Is that about humidity and temperature and all the things that might affect the endurance of these physical artifacts? Rob: Yep. Yes. That's right. Larry: Yeah. Rob: We have about 200 sensors around the place monitoring every five minutes a new data point, which if you think about it, it's actually not that much data. Larry: Yeah. I have to say, I just love that you're doing data stuff along with it. That you're not just sitting in a dusty old room collecting things. You're doing cool modern stuff too. But hey, I want to quickly interject how we met, and I just want to put this in because we won't have time to talk about it today, but I want people to know about this fantastic series you did. That's how we met was somebody drew to my attention the series you've done on ontology design and on knowledge engineering design patterns. And I'll point to that in the show notes, but I just wanted to mention. And the more I think about what you just said, because I didn't know all of this background before we started recording, I'm like, "Oh, this is even better than I thought." So I'll point to that in the show notes. 
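Rob's remark that roughly 200 sensors reporting every five minutes is "actually not that much data" can be checked with quick arithmetic. The sketch below is only a back-of-the-envelope estimate: the sensor count and interval come from the conversation, while the bytes-per-reading figure is an assumption made purely for illustration.

```python
# Back-of-the-envelope estimate of environmental sensor data volume.
# SENSORS and INTERVAL_MINUTES come from the conversation; BYTES_PER_READING
# is an assumed figure used only to illustrate the order of magnitude.

SENSORS = 200            # "about 200 sensors"
INTERVAL_MINUTES = 5     # "every five minutes a new data point"
BYTES_PER_READING = 64   # assumption: timestamp, sensor id, value, unit


def readings_per_day(sensors: int, interval_minutes: int) -> int:
    """Number of data points produced per day across all sensors."""
    per_sensor = (24 * 60) // interval_minutes
    return sensors * per_sensor


def readings_per_year(sensors: int, interval_minutes: int) -> int:
    """Number of data points produced in a 365-day year across all sensors."""
    return readings_per_day(sensors, interval_minutes) * 365


if __name__ == "__main__":
    daily = readings_per_day(SENSORS, INTERVAL_MINUTES)
    yearly = readings_per_year(SENSORS, INTERVAL_MINUTES)
    print(f"{daily:,} readings per day")
    print(f"{yearly:,} readings per year")
    print(f"~{yearly * BYTES_PER_READING / 1e9:.2f} GB per year (assumed size)")
```

Under these assumptions the sensors produce tens of thousands of readings a day and a few tens of millions a year, which is small next to a three-billion-triple knowledge graph.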
Larry: But the main thing I wanted to talk about today is what you were just talking about. This amazing cultural heritage operation that you're running there, especially the knowledge graph component of it and the AI, of course, because we're in the 21st century, and that's all anybody talks about. One of the things we talked about before we went on the air was how AI is accelerating the ability for you to build your knowledge graphs of these cultural heritage artifacts and data. Can you talk a little bit about that, how AI is helping in that? Rob: Yeah. Of course. Absolutely. So just a little bit of a background about the knowledge graph itself first before I get to the AI part. So over the past five years, we've built, without AI, a very large-scale knowledge graph, well, very large-scale in cultural heritage terms, which has about three billion triples in it. And it follows the principles and the design patterns that you mentioned in those posts on Linked Art. It then aligns the people, places, concepts, events, objects, works, collections that we manage here at Yale across the two art museums, Natural History Museum, the dozen or so libraries. There's also a collection of musical instruments, the Institute for the Preservation of Cultural Heritage, and we even have a little outpost in London, in England for art history research that we include. So that work uses the Linked Art ontology, which is based on the foundational CIDOC CRM ontology and is publicly available both in terms of the data, you can just download it. But also in terms of the graph queries, we don't force you to learn SPARQL. We have a user interface on top of it, which allows you to generate queries and find the objects that you are looking for. Rob: So one of the things that we noticed first about the user interface is that only about 5% of searches are actually using the graph affordances. Mostly, 95% of the time, people just put in keywords because that's what they're used to. 
You go to Google, you type in your five favorite keywords that you think might match and you scroll through the results. However, now in 2026, people are more used to typing in full sentences and then having AI take t
-
41
Max Gärber: Agentic AI Built on a Knowledge Graph Foundation – Episode 45
Max Gärber The promise of agentic AI is being realized in systems like the Service Copilot that Zeiss microscopes provides for its field service engineers. The system integrates technical documentation, subject matter expertise, and user-generated insights which are orchestrated and shared with a suite of AI agents. While it relies heavily on modern LLM technology, it's the system's solid knowledge graph and metadata foundation that make it a success. We talked about: Max's work "turning information into value" at PANTOPIX, a technical documentation and information processes consultancy based in Germany a recent client project working with Zeiss to help their field service engineers operate more efficiently how their prior knowledge management and machine learning work helped them not only cope, but thrive, at the arrival of ChatGPT and LLMs the immediate positive stakeholder feedback they received as they incorporated LLMs into their knowledge architecture how they extended the iiRDS standard with a custom ontology and taxonomies and integrated topic mappings into their system and workflows an overview of the system architecture and tooling, which includes both a graph database and a vector store, an ontology and taxonomy management tool, and documentation of best practices their evolution from simple prompt engineering and RAG approach to an agentic orchestration architecture a few of the agents in their architecture: a planning agent that organizes and orchestrates a content agent that replaces the original RAG system a troubleshooting agent which surfaces past solutions the good problem they experienced of managing enthusiastic user adoption of the new system the unexpected benefits to the Zeiss sales team of the system how subject matter expertise, user generated content, and other insights are captured and used the crucial role of knowledge management practices, structured content, and semantic technology in building the foundation for an organization's AI 
capabilities Max's bio Maximilian Gärber is Partner and Principal Technical Consultant at PANTOPIX. Max has been working in the field of technical communication for over 15 years. As a Partner and Technical Consultant at PANTOPIX, he is responsible for the technical consultation and implementation of projects. In addition to project management, Max is responsible for data modelling and process optimization in relation to product information (migration, publication, translation) and product catalogues. He is also responsible for product development and ensures that innovative solutions for our customers are continuously developed and optimized. Connect with Max online LinkedIn PANTOPIX Resources mentioned in this episode Industrial Knowledge Graph meets Agentic AI: Service Copilot at ZEISS RMS slide deck Service Copilot from ZEISS article Video Here’s the video version of our conversation: https://www.youtube.com/embed/ttQOHvvxPyw Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 45. When you're a field service engineer dealing with both the typical challenges of information overload and the need to maintain complex machinery like a high-end Zeiss microscope, you'd really benefit from an intelligent knowledge management system, one that integrates technical documentation, subject matter expertise, and user-generated insights. That's exactly what Max Gärber has built - an agentic AI system grounded in a solid knowledge graph foundation. Interview transcript Larry: Hi everyone. Welcome to episode number 45 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Max Gärber. Max did a really interesting presentation at the SEMANTiCS conference in Vienna last fall, and I've been trying to get him on the show ever since. So here he is. I'm excited to have him here. Max, he's a partner and a technical consultant at PANTOPIX, a consultancy based here in Germany. Welcome, Max. 
Tell the folks a little bit more about what you're doing these days. Max: Yeah, thanks Larry. Thanks for having me. Yeah, great show. And yeah, we are mostly concerned with helping mainly our industrial customers structure their content and integrate it from various sources into their systems, delivery systems, wherever it is needed. So yeah, it's mainly consultancy on data modeling, on how to do information processes and how to get the best out of your data, so to say. So our mission here is literally turning information into value. Larry: Oh, I love that. That's a great tagline for a consultancy. Well, you did the use case, the case study you talked about in Vienna was really interesting to me. This issue of Zeiss microscopes, in particular their research microscopy solutions arm, which is these big, expensive, complex machines that require a lot of service. Can you talk a little bit about how you got involved with Zeiss and what you do to help them? In particular, the thing you talked about in Vienna was about the system to help their field service engineers. Can you talk a little bit about that project? Max: Yeah, exactly. The main objective there was helping the field service engineer to get the information in that situation when they need it and in the format they need it. That is essentially the bottom line of it. And it started essentially as a knowledge management project. Zeiss, RMS, they have been really into structuring, getting structured content, adding proper metadata to it so it can be used in various cases. The idea has been to integrate from various sources, spare part system, for example, or the manuals from the technical documentation or ticket information and get them into one system so there's a single point of access for the service technicians. 
So they don't need to spend a lot of time in all of the different systems that there are to get the information about that case they are currently working on because there's a lot they need to consider when servicing or troubleshooting a microscope. Max: And yeah, that project evolved into what is now the Service Copilot because I think it was in early '22 when we started the project. And one part of it was to not only integrate all of that information in one place, but also recommend content to the service technician. So, if you were working on a specific case, so the ticket was known, the product was known, you should get a recommendation of articles, "Hey, this is how you install this and that component," for example. So we actually worked a lot on labeling tickets. We actually had a custom labeling interface and used, let's say, classical machine learning approaches to get that recommendations done. Max: And it worked not so good, but that was also the same time when GPT, I think it was 3.0 or 3.5 came out. And yeah, we were faced with that situation that there was a new technology available that looked like it could do everything and much more what we were currently doing without much effort. So we really faced the situation there to either stop the project or reinvent ourselves, I would say. Larry: I love that juncture. We were talking a little bit before we went on the air about you were really concerned at that point as this arose, but then it turns out that the prior work you had done, the knowledge management work you had done and the machine learning skills and workflows and things you developed, it turns out you ended up being, to my mind, it looks like from that demo I saw in Vienna, at the leading edge of hybrid AI architectures and agentic AI. Max: Yeah, I mean, totally. It evolved really quickly. 
At the point where we looked into GPT and what language models could do, we asked for, "Hey, can we do some quick prototyping research on this and see if we can replace, let's say, the machine learning pipeline that we had with language models?" And it worked really well from the start. So in the beginning, we had 15 service technicians as pilot users that were constantly evaluating the system and giving us feedback, "Hey, that's good, that's not good." And they said immediately, "Well, this is working really well." I mean, they tried, of course, at the very beginning to trick the system and ask the hard questions. And if you look at the content that they are provided, a service manual, it has hundreds of pages and the products that they are servicing, they look quite similar, but they are quite different. Max: So there's a lot of variants in what components you can use, how you configure the system, how you buy it. So it's really important that if you have a certain product variant, you don't mix that up. And if you look at how the content is, it is very similar. So of course they have the same structure or a very similar structure and certain, let's say, chapters or topics, they are always very similar. So how you install electron microscope A is very similar to how you install electron microscope B, but it's the little differences that are really important if you are doing that installation procedure. If you forget one of those steps, of course, you will fail or you could even do some harm to the system. So it's really important that you not only have similar content or similarity in, let's say, the retrieval of the content, but you can actually know, "This is content for product A and this is content for product B." 
Max: So all of the work that went into structuring the content, adding metadata to each of the topics and connecting the metadata based on what entities are linkable, the RAG system that we implemented then, it could actually filter out all of the content that was not relevant to the specific question or use case. So the answers were quite good from the beginning. Larry: Yeah. I want to elaborate a bit on the evolution of your RAG architecture, and for folks who don't......
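Max's point that topic metadata lets the RAG system filter out content for the wrong product variant before similarity is considered can be sketched as follows. This is a minimal illustration, not the Service Copilot implementation: the `Topic` fields, the product names, and the toy word-overlap score (standing in for a vector-store similarity search) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """A documentation topic with product metadata (all fields hypothetical)."""
    topic_id: str
    product: str
    text: str


def score(query: str, text: str) -> float:
    """Toy similarity: fraction of query words that also appear in the text.
    A real system would use embeddings from a vector store instead."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def retrieve(query: str, product: str, topics: list[Topic], k: int = 3) -> list[Topic]:
    """Filter by product metadata first, then rank the survivors by similarity."""
    candidates = [t for t in topics if t.product == product]
    ranked = sorted(candidates, key=lambda t: score(query, t.text), reverse=True)
    return ranked[:k]


# Two near-identical installation topics for different products, plus one other.
TOPICS = [
    Topic("a-install", "microscope-A", "install the detector then align the column"),
    Topic("b-install", "microscope-B", "install the detector then vent the chamber and align the column"),
    Topic("a-clean", "microscope-A", "clean the chamber window"),
]

if __name__ == "__main__":
    for topic in retrieve("install the detector", "microscope-A", TOPICS):
        print(topic.topic_id)
```

Filtering on metadata before ranking means the near-identical topic for product B can never outrank the correct one for product A, however similar the two texts are.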
-
40
Quentin Reul: Solving Business Problems with Neuro-Symbolic AI – Episode 44
Quentin Reul The complementary nature of knowledge graphs and LLMs has become clear, and long-time knowledge engineering professionals like Quentin Reul now routinely combine them in hybrid neuro-symbolic AI systems. While it's tempting to get caught up in the details of rapidly advancing AI technology, Quentin emphasizes the importance of always staying focused on the business problems your systems are solving. We talked about: his extensive background in semantic technologies, dating back to the early 2000s his contribution to the SKOS standard an overview of the strengths and weaknesses of LLMs the importance of entity resolution, especially when working with the general information that LLMs are trained on how LLMs accelerate knowledge graph creation and population his take on the scope of symbolic AI, in which he includes expert systems and rule-based systems his approach to architecting neuro-symbolic systems, which always starts with, and stays focused on, the business problem he's trying to solve his advice to avoid the temptation to start projects with technology, and instead always focus on the problems you're solving the importance of staying abreast of technology developments so that you're always able to craft the most efficient solutions Quentin's bio Dr. Quentin Reul is an AI Strategy & Innovation Executive who bridges the gap between high-level business goals and deep technical implementation. As a Director of AI Strategy & Solutions at expert.ai, he specializes in the convergence of Generative AI, Knowledge Graphs, and Agentic Workflows. His focus is moving companies beyond "PoC Purgatory" into production-grade systems that deliver measurable ROI. Unlike traditional strategists, he remains deeply hands-on, continuously prototyping with emerging AI research to stress-test its real-world impact. 
He doesn't just advocate for AI; he builds the technical roadmaps that translate the latest lab breakthroughs into safe, scalable, and high-value enterprise solutions. Connect with Quentin online LinkedIn BlueSky YouTube Medium Video Here’s the video version of our conversation: https://youtu.be/J8fgIezoNxE Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 44. We're far enough along now in the development of both generative AI learning models and symbolic AI technology like knowledge graphs to see the strengths and weaknesses of each. Quentin Reul has worked with both technologies, and the technologies that preceded them, for many years. He now builds systems that combine the best of both types of AI to deliver solutions that make it easier for people to discover and explore the knowledge and information that they need. Interview transcript Larry: Hi, everyone. Welcome to episode number 44 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Quentin Reul. Quentin is the director of AI Strategy and solutions at expert.ai in the US in Chicago. So welcome, Quentin. Tell the folks a little bit more about what you're up to these days. Quentin: Hi, thank you, Larry, for accepting me and getting me on your podcast. So my name is Quentin Reul. I actually have been around the RDF and the knowledge graph since before it was cool in the early 2000. And today, what I'm helping people in news, media, and entertainment is to see how they can leverage all of the unstructured data that they have and make it in a way that can be structured and they can make their content more findable and discoverable as part of what they are offering to their customers. Larry: Nice. And I love that you've been doing this forever. And one of the things we talked about before we went on the air was your early involvement in the SKOS standard. Can you talk a little bit about your little contribution to that project? 
Quentin: Yeah. So for this, as you know, SKOS stands for Simple Knowledge Organization System. It's a standard that was created by the W3C around 2005. And being at the University of Aberdeen in Scotland, we had a lot of involvement with the W3C, both in the Web Ontology Language and SKOS. Quentin: For SKOS, I was actually working on my PhD, and the idea of my PhD was to look at two ontologies and try to map entities from one ontology to the entities in the other one. And a lot of the approaches that were taken at the time were leveraging a philosophical kind of representation. And there was not really a lot of things that were looking at linguistics. So the approach that we were taking was looking at WordNet and using the structure of WordNet and mapping that to the linguistic information, so the labels that were associated with nodes in the taxonomy. Quentin: But to do that, we needed to have a structure that was transitive. And at the time, SKOS only had broader and narrower, and they didn't have the transitive property. So my contribution was to push for the W3C SKOS standard to include skos:broaderTransitive and skos:narrowerTransitive, so that I could now have that if A broader B and B broader C, then A broader C was also correct, and having that description logic structure that would enable that. Larry: Well, that's so cool. I love that your ideas are ensconced in this 20-year-old standard now. But hey, what I wanted to talk about today and really focus on, I know I was excited to get you on the show because you're doing a lot of work in the area of neuro-symbolic AI, the idea of integrating LLMs and other machine learning technologies with knowledge graphs and other symbolic AI stuff. Larry: It's one of those things that everybody's talking about, but I haven't had the chance to talk on the podcast with many people who are actually doing it. 
So I'm hoping that you can help the listeners take the leap from this conceptual understanding of the naturally complementary nature of them to actually putting them together in an enterprise architecture. I guess maybe start with the strengths and weaknesses of each of the kinds of AI that we're talking about here. Quentin: Yeah. So if we look at the history of AI, symbolic AI was a thing that came up in the '70s and led to the first AI winter and the second AI winter for that matter. But where they were very good was in the structure and the explainability. So if you had a very well-defined set of rules or predictive kind of aspect, it would do it consistently, repeatably, and all of that type of things. Quentin: Now, when you were trying to adopt a rule-based system for new data, it would die off because you had never seen that or a new set of rules or a new set of business requirements, it would just not handle that. And that's where machine learning really helped in making that transition to where we are today. Quentin: And the LLM, contributing further to that, in as much as the machine learning was pretty good at dealing with new patterns, as long as it was similar to the data that you were training with. I think one place where the LLMs really shine is in the way that they're able to surface things that you were not predicting from the data. Quentin: One thing that I think that we could have predicted or seen from the data if we had LLMs back in 2020 is we could probably have seen the topic of COVID emerging a bit earlier than what it did. And the reason is, it's because it's very good at surfacing things that it's never seen before. It's able to interpret the language and analyze the language in its structure. And by the sentence structure, understanding that things are very similar, and you may use different words for them, but now you're able to interpret them. 
Quentin: So if we think about information retrieval in the '90s, 2000s, and even in the 2010s, the way that we were doing a lot of these things was using controlled vocabularies, thesauri, or other dictionaries, and they were used to do query expansion. So you had a keyword, you were looking in the dictionaries, the dictionary was doing an expansion, and then you had something else. Quentin: Well, now with the LLM, that kind of expansion is intuitive to the actual LLM because it has seen so many different aspects and so many occurrences of text that it can actually predict and see that these different terms are associated with a holistic concept. Quentin: Now, that's a good thing. On the bad thing, the LLMs don't have ... Well, they have a cutoff point or knowledge cutoff point, which means that when they are trained, they are trained on information that is in the past. So they're not always that great at predicting, especially current events or information about things that are happening today, they're not very good at that. Quentin: I think if I look at the data, generally between the release of a new model and the nature of the data or the cutoff point, it's about six months to a year. This is like going a bit slower now or shorter in terms, but you have to remember that the time that it takes to train these models, we're speaking about days, weeks, and sometimes months as opposed to hours with machine learning models. So they're expensive as well from that perspective. Quentin: Another aspect that they don't have, it's a knowledge base to just take a higher level from a knowledge graph, like the knowledge base. So it's not able to disambiguate information in a large corpus. It's very good to do entity linking within the context of one document. 
Quentin: So if you pass it one document, let's say a financial document, and it refers to Acme as an enterprise, if Acme is mentioned several times during the document, it will infer that there is only one entity and that entity is Acme. Quentin: But now, imagine that you have a group of financial reports, and these financial reports refer to Acme, a bakery in Illinois, and Acme, a construction company in Maryland....
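Quentin's earlier points about skos:broaderTransitive and about thesaurus-driven query expansion can be tied together in a small sketch. It assumes a toy vocabulary held as plain Python tuples rather than RDF; the concepts and the helper names (`broader_transitive`, `narrower_transitive`, `expand_query`) are invented for illustration and are not part of SKOS or any library.

```python
# Direct "broader" assertions as (narrower concept, broader concept) pairs.
# The concepts are invented; a real vocabulary would be loaded from SKOS data.
BROADER = {
    ("espresso", "coffee"),
    ("coffee", "hot drink"),
    ("hot drink", "beverage"),
    ("tea", "hot drink"),
}


def broader_transitive(concept: str, broader: set[tuple[str, str]]) -> set[str]:
    """All ancestors of `concept`, following broader links any number of steps."""
    ancestors: set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in broader:
            if child == current and parent not in ancestors:
                ancestors.add(parent)
                frontier.append(parent)
    return ancestors


def narrower_transitive(concept: str, broader: set[tuple[str, str]]) -> set[str]:
    """All descendants of `concept`: the inverse of broader_transitive."""
    children = {child for child, _ in broader}
    return {c for c in children if concept in broader_transitive(c, broader)}


def expand_query(keyword: str, broader: set[tuple[str, str]]) -> set[str]:
    """Classic query expansion: the keyword plus every narrower term."""
    return {keyword} | narrower_transitive(keyword, broader)


if __name__ == "__main__":
    print(sorted(broader_transitive("espresso", BROADER)))
    print(sorted(expand_query("hot drink", BROADER)))
```

Only the direct links are stored; the transitive relation is derived on demand, which mirrors how SKOS keeps skos:broader for asserted links and leaves skos:broaderTransitive to be inferred.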
-
39
Jim Hendler: Scaling AI and Knowledge with the Semantic Web – Episode 43
Jim Hendler As the World Wide Web emerged in the late 1990s, AI experts like Jim Hendler spotted an opportunity to imbue in the new medium, in a scalable way, knowledge about the information on the web along with its simple representation as content. With his colleagues Tim Berners-Lee, the inventor of the web, and Ora Lassila, an early expert on AI agents, Jim set out their vision in the famous "Semantic Web" article for the May 2001 issue of Scientific American magazine. Since then, semantic web implementations have blossomed, deployed in virtually every large enterprise on the planet and adding meaning to the web by appearing in the majority of pages on the internet. We talked about: his academic and administrative history at the University of Maryland, Rensselaer Polytechnic Institute, and DARPA the origins of his assertion that "a little semantics goes a long way" his early thinking on the role of memory in AI and its connections to knowledge representation and to SHOE, the first semantic web language his goal to scale up knowledge representation in his work as a grant administrator at DARPA how different departments in the US Air Force used different language to describe airplanes the origins and development of his relationship with Tim Berners-Lee and how his use of URLs in SHOE caused it to click how he and Berners-Lee brought Ora Lassila into the semantic web article how his and Berners-Lee's shared interest in scale contributed to the "a little semantics goes a long way" idea why he lives in awe of Tim Berners-Lee Berners-Lee's insight that a scalable web needed the 404 error code how including an inverse functionality property like in a relational database would have ruined the semantic web how they came to open the Scientific American paper with an anecdote about agents his early involvement in the AI agent community along with Ora Lassila their shared conviction of the foundational importance of interoperability in their conception of the semantic 
web how the lack of interoperability between big internet players now is part of the reason for the inability to fully execute on the agent version they set out in the SciAm article the impact of LLMs on the semantic web early examples of semantic web linked data interoperability Google's reclamation of the term "knowledge graph" the reason that the shape of the semantic web was always in their mind a graph how the growth of enterprise data led to their adoption of semantic web technology how the answer to so many modern AI questions is, "knowledge" Jim's bio James Hendler is the Tetherless World Professor of Computer, Web and Cognitive Sciences at RPI where he also serves as a special academic advisor to the Provost and the Head of the Cognitive Science Department. He also serves as a member of the Board, and former chair of the UK’s charitable Web Science Trust. Hendler is a long-time researcher in the widespread use of experimental AI techniques including semantics on the Web, scientific data integration, and data policy in government. One of the originators of the Semantic Web, he has authored over 500 books, technical papers, and articles in the areas of Open Data, the Semantic Web, AI, and data policy and governance. He is the former Chief Scientist of the Information Systems Office at the US Defense Advanced Research Projects Agency (DARPA) and was awarded a US Air Force Exceptional Civilian Service Medal in 2002. In 2010, Hendler was selected as an “Internet Web Expert” by the US government, helping in the development and launch of the US data.gov open data website and from 2015 to 2024 served as an advisor to DHS and DoE board. From 2021-2024 he served as chair of the ACM’s global Technology Policy Council. Hendler is a Fellow of the AAAI, AAIA, AAAS, ACM, BCS, IEEE and the US National Academy of Public Administration. 
In 2025, Hendler was awarded the Feigenbaum Prize by the Association for the Advancement of Artificial Intelligence, recognizing a “sustained record of high-impact seminal contributions to experimental AI research.” Connect with Jim online RPI faculty page People and resources mentioned in this interview Tim Berners-Lee Ora Lassila Deb McGuinness The Semantic Web, Scientific American, May 2001 Introducing the Knowledge Graph: things, not strings Massively Parallel Artificial Intelligence paper Attention Is All You Need paper Vision conference Is There An Agent in Your Future? article "And then a miracle occurs" cartoon Jim's SHOE (simple HTML ontology extensions) t-shirt Video Here’s the video version of our conversation: https://youtu.be/DpQki6Y0zx0 Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 43. Twenty-five years ago, as AI experts like Jim Hendler navigated the new World Wide Web, they saw an opportunity to imbue in the medium, in a scalable way, more knowledge than was included in the text on web pages. Jim combined forces with the web's inventor, Tim Berners-Lee, and their mutual friend Ora Lassila, an expert on AI agents, to set out their vision in the now-famous "Semantic Web" article for Scientific American magazine. The rest, as they say, is history. Interview transcript Larry: Hi everyone. Welcome to episode number 43 of the Knowledge Graph Insights Podcast. I am super extra delighted today to welcome to the show, Jim Hendler. Jim, I think it's fair to say he literally needs no introduction. He was one of the co-authors of the original Semantic Web article in Scientific American. He's been a longtime well-known professor at Rensselaer Polytechnic Institute. So welcome, Jim. Tell the folks a little bit more about what you're up to these days. Jim: Sure. 
Just to go back a little further in history, I've been doing AI a long time and my first paper was about '77, but a lot of the work we're going to be talking about today happened when I was a professor at the University of Maryland, which was from '86 to 2007. And then from 2007 on, I've been at RPI where I was really hired to create a lab that really would be a visionary lab on semantic web and related technologies. I think the president of the university saw the data science revolution coming and saw that that was a key part of it. Jim: So who am I? What am I? Really, what happened was very early in the days of AI, I was working in a lot of different things. I started under Roger Schank at Yale, took a few years off to work professionally at Texas Instruments, which had the first industrial AI lab outside of the well-known ones at Xerox PARC and stuff. Then decided no, I really was an academic at heart. So I came back, went to grad school with Gene Charniak at Brown and went from there to the University of Maryland. So you know my job life history. I've bumped around during that time. Living in Maryland, you tend to bump into the Defense Department and things like that and funding and things like that. I was on a few committees and things like that. Eventually asked to come to DARPA for a few years, which is really where a lot of our conversation today probably starts. Jim: And then again, just because it was successful and we had a visionary president here at RPI, she asked me to come and said, "Not only do I want to hire you, but I want you to hire a couple other people you'll work with who'll help put us on the map in this stuff." And I hired Deb McGuinness and I'm sure that'll come up later. And then the past 15 years have been a combination of research and administration. So I've done both, doing my own work, working with my students, and also trying to really set up some significant presence of AI on our campus, AI and beyond. Larry: Nice. 
Yeah, and we'll talk definitely more about your research work and everything. But hey, I want to set a little bit of context about how we met, because I know Dean Allemang from the Knowledge Graph Conference community, and we'll talk a little bit more about the book that you wrote with him later on. But one of the things that he famously says, and always attributes it to you, is that phrase "A little semantics goes a long way." I'd love to open up by talking a little bit about that. Jim: So early on in AI, it was becoming very, very clear to me, and now I'm talking 70s, early 80s, so a long time before we were where scaling means what it does today. But it's very clear to me that a lot of the problem with AI is it didn't scale. And meanwhile, I was seeing these other technologies coming along, the ones that really led to the web, that were looking at a much, much broader thing than the typical AI system. So one of the things I started asking is, how do we scale up AI? And we were looking at traditional knowledge representation languages. I actually have a paper from the 80s. I actually did a book with Hiroaki Kitano, who's now the... I believe he's still the vice president for research at Sony, if not something higher. And Kitano-san and I actually had a book called Massively Parallel Artificial Intelligence in the 80s, but it became clear to me that the machines were part of the story, but the lots and lots of people doing lots and lots of different things was the much more interesting part of the story. Jim: And then also, I've always been intrigued by human memory. You asked me a question and I not only answered that question, but I'm doing right now. It's associating a million things in my mind. And what I'm really doing is winnowing rather than trying to come up with the precise answer. And so I started thinking about how does AI memory start to look like human memory more? 
In those days, a thousand and then 10,000 and then a million "axioms" were very, very large things, and that's what I wanted to do. And then the web was coming along and I saw that, well, if I'm going to get a million facts about something,...
-
38
Brad Bolliger: Pragmatic Semantic Modeling for Government Data – Episode 42
Brad Bolliger Brad Bolliger entered the knowledge graph space via enterprise software system design and data analytics. That background informs their pragmatic and strategic approach to the use of semantic technology in systems that facilitate information exchange across government agencies. We talked about: their work at EY (Ernst & Young) on data and analytics strategy assessments and enterprise software design and as a co-chair of the NIEMOpen Technical Architecture Committee how their work on EY's Unified Justice Platform introduced them to the knowledge graph world a quick overview of entity resolution the NIEM standard, its origin in the wake of 9/11, its scope, how it's built and managed, and how governments use it their pragmatic approach to ontology and vocabulary management the benefits of the extensibility of the RDF format and knowledge graph technology how entity-centric data modeling accelerates and facilitates systems evolution their take on "analytics enablement engineering" their approach to crafting AI-ready data and building AI-aware enterprise solutions some of the neuro-symbolic AI architectures they have seen and implemented their call for more systems thinking and systems analysis to create more effective services that work together in a more ethical and effective way Brad's bio Bradley Bolliger (they/them) works in the AI & Data practice of Ernst & Young and serves as co-chair of the NIEMOpen Technical Architecture Committee, an OASIS open standards project for data interoperability. Brad assists clients across various industries with optimizing data platform ecosystems, enhancing customer relationships, and leveraging advanced analytics tools and techniques in their digital transformation efforts. 
In addition to designing data platforms and AI/NLP systems, Brad has served in lead analyst roles for public sector information system modernization efforts, including major contact center data ecosystems and integrated criminal justice system environments, the latter of which would lead to the development of the Unified Justice Platform. Connect with Brad online LinkedIn Unified Justice Platform Video Here’s the video version of our conversation: https://youtu.be/8XCmF3qXv1E Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 42. When you have to account for the people and other entities involved in high-stakes situations, you need a system that delivers accurate, unambiguous information. Brad Bolliger does this in their work on EY's Unified Justice Platform. Brad is relatively new to the graph world and has adopted a pragmatic approach to semantic modeling and knowledge graphs, focusing on applying lessons learned in their extensive experience in enterprise systems design and data analytics. Interview transcript Larry: Hi, everyone. Welcome to episode number 42 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Brad Bolliger. Brad works in the AI and data practice at EY, the big consultancy in Chicago, and also helps co-chair the NIEM Information Exchange, the Info Exchange Network and standard. Welcome, Brad. Tell the folks a little bit more about what you're up to these days. Brad: Thanks for having me, Larry. I'm thrilled to be talking to you today. Yeah, I'm non-binary. I use they/them pronouns, and I work in the AI and data practice at Ernst & Young, as you said, where I do data and analytics strategy assessments and enterprise software design, things like that. I'm also co-chair of the NIEMOpen Technical Architecture Committee, which is an OASIS Open standard for sharing data in public services primarily, but for specification for developing information exchanges. 
And I'm working on semantics and software design more generally. Larry: Yeah. And you kind of not stumbled, but you had semantics thrust upon you in this new role, I understand, 'cause one of the projects you work on, I don't know if you're still working on it, was the Unified Justice Platform at EY. Can you talk a little bit about that and how it brought you into the semantics world? Brad: Yeah, that's right. It spun out of an assessment from a county government wanting to overhaul their integrated justice system, which was the collection of actors who collaborate or have this adversarial relationship to administer the process of justice in their jurisdiction. And because very often they're their own elected officials with their own budgets, they have their own software to fulfill their own functions. And that means that they are kind of inherently operating a distributed system, sending messages back and forth to say, "Hey, we booked this person into the jail. Hey, we've got this court date coming up. Hey, we're filing these charges." And they need to orchestrate complex operational processes across multiple software systems and multiple groups of people, again, kind of across jurisdictions or enclaves. And that was, of course, a really interesting systems analysis process that led to the development of a solution to this problem we were trying to assess, which we later called the Unified Justice Platform and is an event-driven architecture for building an entity-resolved knowledge graph as an operational data store programmatically as messages are exchanged between the stakeholders in the Enclave. Larry: Yeah. And you used a couple of words in there. I want to clarify for folks who might be new to them. The notion of entity resolution, the entity-resolved knowledge graph, I'll just point out that we met through our mutual friend, Paco Nathan, who works for Senzing, a company that just does entity resolution. 
And can you talk a little bit about entity resolution, how that fits into the needs of this distributed system and how you implement it in the platform? Brad: Yeah. Actually, I'll plug almost two years ago, we did a webinar with someone from Senzing and talked about the fundamental utility of entity resolution and relevance, I suppose, as a problem more generally. Entity resolution is essentially about creating, for me, is essentially about creating a high quality master index of whatever kind of data that it is that you're looking at. So in this case, we were talking about a master person index so that you have a more reliable picture of the same natural person, no matter which software system is representing the data that describes the person subject to judicial proceedings in particular. But thinking about entity-centric data modeling more generally, you got a different type of entity, you still need to disambiguate which location you're talking about, which person you're talking about, which entity that really is. And if there are different representations, different records that relate to the same underlying entity, that process of entity resolution therefore has this really broad systemic benefit to data management and data engineering in particular, because ultimately it's about the master index at the end of the day. Larry: Yeah. And as you talked about that, you mentioned that it's like this a canonical record of entities. And how does NIEM fit into that? Because that's a vocabulary as I understand it. Brad: That's right. Larry: Yeah. Can you talk a little bit about NIEM and how that works with entity resolution? Brad: Yeah, very briefly on NIEM, NIEM spun out of the post September 11th realization that public services needed to share data to collaborate more effectively to actually solve emergencies, but just problems in general. And what they realized was that they need to have a common language to collaborate more effectively. 
Again, because systems, machines, software systems, have this really concrete definition of we use these particular terms and they mean something in our enclave, but you could have a person's full name and a person's first name and a person's last name in two different records, but actually they're the same real person. So NIEM came out of an attempt to at least address some of that ambiguity. And what is most interesting to me about NIEM, honestly, is that it is a collaboratively defined list of vocabulary. So we actually get domain participants involved and they decide we use these terms and they mean these things. Brad: And so it's an attempt to reduce the amount of complexity that you could use to describe a different person, but communicate the same meaning without losing the information that's entailed in some data record. But I'm digressing a little bit probably. What NIEM is is a framework for building message specifications, APIs, if you like, or other types of structures, data structures in general that is a community agreed-upon set of terms that have some kind of core relevance, person, entity, organization, or have some domain specific function, like, subject or something in human services and so on. Larry: Interesting. Yeah. And as you talk about that, that attempt to align people on vocabulary is such a notoriously difficult problem. And I don't know how many jurisdictions we're talking about here, but every little town in America has a police department and other social services that they do. What is the scope or the scale of that? And is it facilitated in any way by existing standards or vocabularies? Brad: Oh, very much so. In fact, the problem is even worse than you've described it very charitably, I think. Just in the United States alone, I'm told that there are over 18,000 law enforcement agencies, just law enforcement agencies. Nevermind how ... Anyway, so NIEM is a voluntary open standard. 
So it is something that is available, but is usually not mandated. There are some places where it is mandated for specific types of services. So the scale of the problem that we're talking about really depends on who's included in the conversation....
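Editor's sketch: the "master person index" idea Brad describes in the excerpt above, where records from different systems are grouped when they describe the same natural person, can be illustrated with a very small Python example. This is only a toy under stated assumptions: the `PersonRecord` fields, the source-system names, and the naive "normalized name plus birth date" matching rule are all hypothetical, and it does not represent how the Unified Justice Platform, Senzing, or NIEM actually work. Production entity resolution relies on far richer matching than an exact key.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonRecord:
    source: str      # hypothetical source system, e.g. "jail" or "court"
    record_id: str   # identifier inside that source system
    full_name: str   # name as that system happens to store it
    birth_date: str  # ISO date string


def match_key(record):
    """Build a naive matching key: order-insensitive lowercase name plus birth date."""
    tokens = record.full_name.lower().replace(",", " ").split()
    return (" ".join(sorted(tokens)), record.birth_date)


def build_master_index(records):
    """Group records that share a matching key into one resolved entity."""
    index = defaultdict(list)
    for record in records:
        index[match_key(record)].append(record)
    return dict(index)


records = [
    PersonRecord("jail", "J-1", "Doe, Jane", "1990-04-02"),
    PersonRecord("court", "C-7", "Jane Doe", "1990-04-02"),
    PersonRecord("court", "C-9", "John Roe", "1985-11-30"),
]
master_index = build_master_index(records)

for key, members in master_index.items():
    print(key, [f"{m.source}:{m.record_id}" for m in members])
```

In this toy data the jail and court records for Jane Doe fall under a single index entry even though the two systems format the name differently, which is the disambiguation benefit discussed in the interview. A shared vocabulary would play the role of agreeing on what fields like `full_name` and `birth_date` mean before any matching happens.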
-
37
Tara Raafat: Human-Centered Knowledge Graph and Metadata Leadership – Episode 41
Tara Raafat At Bloomberg, Tara Raafat applies her extensive ontology, knowledge graph, and management expertise to create a solid semantic and technical foundation for the enterprise's mission-critical data, information, and knowledge. One of the keys to the success of her knowledge graph projects is her focus on people. She of course employs the best semantic practices and embraces the latest technology, but her knack for engaging the right stakeholders and building the right kinds of teams is arguably what distinguishes her work. We talked about: her history as a knowledge practitioner and metadata strategist the serendipitous intersection of her knowledge work with the needs of new AI systems her view of a knowledge graph as the DNA of enterprise information, a blueprint for systems that manage the growth and evolution of your enterprise's knowledge the importance of human contributions to LLM-augmented ontology and knowledge graph building the people you need to engage to get a knowledge graph project off the ground: executive sponsors, skeptics, enthusiasts, and change-tolerant pioneers the five stars you need on your team to build a successful knowledge graph: ontologists, business people, subject matter experts, engineers, and a KG product owner the importance of balancing the desire for perfect solutions with the pragmatic and practical concerns that ensure business success a productive approach to integrating AI and other tech into your professional work the importance of viewing your knowledge graph as not just another database, but as the very foundation of your enterprise knowledge Tara's bio Dr. Tara Raafat is Head of Metadata and Knowledge Graph Strategy in Bloomberg’s CTO Office, where she leads the development of Bloomberg’s enterprise Knowledge Graph and semantic metadata strategy, aligning it with AI and data integration initiatives to advance next-generation financial intelligence. 
With over 15 years of expertise in semantic technologies, she has designed knowledge-driven solutions across multiple domains including but not limited to finance, healthcare, industrial symbiosis, and insurance. Before Bloomberg, Tara was Chief Ontologist at Mphasis and co-founded NextAngles™, an AI/semantic platform for regulatory compliance. Tara holds a PhD in Information System Engineering from the UK. She is a strong advocate for humanitarian tech and women in STEM and a frequent speaker at international conferences, where she delivers keynotes, workshops, and tutorials. Connect with Tara online LinkedIn email: traafat at bloomberg dot net Video Here’s the video version of our conversation: https://youtu.be/yw4yWjeixZw Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 41. As groundbreaking new AI capabilities appear on an almost daily basis, it's tempting to focus on the technology. But advanced AI leaders like Tara Raafat focus as much, if not more, on the human side of the knowledge graph equation. As she guides metadata and knowledge graph strategy at Bloomberg, Tara continues her career-long focus on building the star-shaped teams of humans who design and construct a solid foundation for your enterprise knowledge. Interview transcript Larry: Hi everyone. Welcome to episode number 41 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Tara Raafat. She's the head of metadata and knowledge graph strategy at Bloomberg, and a very accomplished ontologist, knowledge graph practitioner. And welcome to the show, Tara. Tell the folks a little bit more about what you're doing these days. Tara: Hi, thank you so much, Larry. I'm super-excited to be here and chatting with you. We always have amazing chats, so I'm looking forward to this one as well. 
Well, as Larry mentioned, I'm currently working for Bloomberg and I've been in the space of knowledge graphs and ontology and creation for a pretty long time. So I've been in this community, I've seen a lot. And my interest has always been in the application of ontologies and knowledge graphs in industries, and have worked in so many different industries from banking and financial to insurance to medical. So I touched upon a lot of different domains with the application of knowledge graphs. And currently at Bloomberg, I am also leading their metadata strategy and the knowledge graph strategy, so basically semantic metadata. And we're looking over how we are basically connecting all the different data sources and data silos that we have within Bloomberg to make our data ready for all the AI interesting, exciting AI stuff that we're doing. And making sure that we have a great representation of our data. Larry: That's something that comes up all the time in my conversations lately is that people have done this work for years for very good reasons, all those things you just talked about, the importance of this kind of work in finance and insurance and medical fields and things like that. But it turns out that it makes you AI-ready as well. So is that just a happy coincidence or are you doing even more to make your metadata more AI-ready these days? Tara: Yeah. In a sense, you could say happy coincidence, but I think from the very beginning of when you think about ontologies and knowledge graphs, the goal was always to make your data machine-understandable. So whenever people ask me, "You're an ontologist, what does that even mean?" My explanation was always, I take all the information in your head and put it in a way that is machine understandable. So now encoded in that way. So now when we're thinking about the AI era, it's basically we're thinking if AI is operating on our information, on our data, it needs to have the right context and the right knowledge. 
So it becomes a perfect fit here. So if data is available and ready in your knowledge graph format, it means that it's machine understandable. It has the right context. It has the extra information that an AI system, specifically in the LLM era and generative AI needs in order to make sure that the answering that it's done is more grounded and based in facts, or have a better provenance. And it's more accurate in quality. Larry: Yeah, that's right. You just reminded me, it's not so much serendipity or a happy coincidence. It's like, no, it's just what we do. Because we make things accessible. The whole beauty of this is the- Tara: We knew what's coming, right? The word AI has changed so much. It's the same thing. It just keeps popping up in different contexts, but yeah. Larry: So you're actually a visionary futurist as all of us are in the product. Yeah. In your long experience, one of the things I love most, there's a lot of things I love about your work. I even wrote about it after KGC. I summarized one of your talks, and I think it's on your LinkedIn profile now, you have this great definition of a knowledge graph. And you liken it to a biological concept that I like. So can you talk a little bit about that? Tara: Sure. I see knowledge graph as the DNA of data or DNA of our information. And the reason I started thinking about it that way is when you think about the human DNA, you're literally thinking of the structure and relationship of the organisms and how they operate and how they evolve. So there's a blueprint of their operation and how they would grow and evolve. And for me, that's very similar to when we start creating a knowledge graph representation of our data, because we're again, capturing the structure and relationships between our data. And we're actually encoding the context and the rules that are needed to allow our data to grow and evolve as our business grows and evolves. So there's a very similarity for me there. 
And it also brings that human touch to this whole concept of knowledge graphs because when I think about knowledge graphs and talking about ontologies, it comes from a philosophical background. And it's a lot more social and human. Tara: And at the end of the day, the foundation of it is how we as humans interpret the world and interpret information. And how then by the use of technology, we encode it, but the interpretation is still very human. So that's why this link for me is actually very interesting. And I think one more thing I would add, which is I do this comparison to also emphasize on the fact that knowledge graphs are not just another database or another data store. So I don't like companies to look at it from that perspective. They really should look at it as the foundation on which their data grows and evolves as their business grows. Larry: Yeah. And that foundational role, it just keeps coming up, again, related to AI a lot, the LLM stuff that I've heard a lot of people talk about the factual foundation for your AI infrastructure and that kind of thing. And again, another one of those things like, yeah, it just happens to be really good at that. And it was purpose built for that from the start. Larry: You mentioned a lot in there, the human element. And that's what I was so enamored of with your talk at KGC and other talks you've done and we've talked about this. And one of the things that, just a quick personal aside, one of the things that drives me nuts about the current AI hype cycle is this idea like, "Oh, we can just get rid of humans. It's great. We'll just have machines instead." I'm like, "Have you not heard..." Every conversation, I've done about 300 different interviews over the years. Every single one of them talks about how it's not technical, it's not procedural or management wisdom. It's always people stuff. It's like change management and working with people. 
Can you talk about how the people stuff manifests in your work in metadata strategy and knowledge graph construction? I know that's a lot. Tara: Sure. I think there are different aspects to it and we can choose to talk abo
-
36
Alexandre Bertails: The Netflix Unified Data Architecture – Episode 40
Alexandre Bertails At Netflix, Alexandre Bertails and his team have adopted the RDF standard to capture the meaning in their content in a consistent way and generate consistent representations of it for a variety of internal customers. The keys to their system are a Unified Data Architecture (UDA) and a domain modeling language, Upper, that let them quickly and efficiently share complex data projections in the formats that their internal engineering customers need. We talked about: his work at Netflix on the content engineering team, the internal operation that keeps the rest of the business running how their search for "one schema to rule them all" and the need for semantic interoperability led to the creation of the Unified Data Architecture (UDA) the components of Netflix's knowledge graph Upper, their domain modeling language their focus on conceptual RDF, resulting in a system that works more like a virtual knowledge graph his team's decision to "buy RDF" and its standards the challenges of aligning multiple internal teams on ontology-writing standards and how they led to the creation of UDA their two main goals in creating their Upper domain modeling language - to keep it as compact as possible and to support federation the unique nature of Upper and its three essential characteristics - it has to be self-describing, self-referencing, and self-governing their use of SHACL and its role in Upper how his background in computer science and formal logic and his discovery of information science brought him to the RDF world and ultimately to his current role the importance of marketing your work internally and using accessible language to describe it to your stakeholders - for example describing your work as a "domain model" rather than an ontology UDA's ability to permit the automatic distribution of semantically precise data across their business with one click how reading the introduction to the original 1999 RDF specification can help prepare you for the LLM/gen 
AI era Alexandre's bio Alexandre Bertails is an engineer in Content Engineering at Netflix, where he leads the design of the Upper metamodel and the semantic foundations for UDA (Unified Data Architecture). Connect with Alex online LinkedIn bertails.org Resources mentioned in this interview Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix Resource Description Framework (RDF) Schema Specification (1999) Video Here’s the video version of our conversation: https://youtu.be/DCoEo3rt91M Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 40. When you're orchestrating data operations for an enormous enterprise like Netflix, you need all of the automation help you can get. Alex Bertails and his content engineering team have adopted the RDF standard to build a domain modeling and data distribution platform that lets them automatically share semantically precise data across their business, in the variety of formats that their internal engineering customers need, often with just one click. Interview transcript Larry: Hi, everyone. Welcome to episode number 40 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show, Alex Bertails. Alex is a software engineer at Netflix, where he's done some really interesting work. We'll talk more about that later today. But welcome, Alex, tell the folks a little bit more about what you're up to these days. Alex: Hi, everyone. I'm Alex. I'm part of the content engineering side of Netflix. Just to make it more concrete, most people will think about the streaming products, that's not us. We are more on the enterprise side, so essentially the people helping the business being run, so more internal operations. I'm a software engineer. I've been part of the initiative called UDA for a few years now, and we published that blog post a few months ago, and that's what most people want to talk about. 
Larry: Yeah, it's amazing that the excitement about that post and so many people talking about it. But one thing, I think I inferred it from the article, but I don't recall a real explicit statement of the problem you were trying to solve in that. Can you talk a little bit about the business prerogatives that drove you to create UDA? Alex: Yeah, totally. There was no UDA, there's no clear problem that we had to solve and really, people won't realize that, but we've been thinking about that point for a very long time. Essentially, on the enterprise side, you have to think about lots of teams having to represent the same business concepts, think about movie, actor, region, but really hundreds of them really, across different systems. It's not necessarily people not agreeing on what a movie is, although it happens, but it's really what is the movie across a GraphQL service, a data mesh source, an Iceberg table, resulting in duplicating efforts and definitions at the end not aligning. A few years ago, we were in search for this one schema kind of concept that would actually rule them all, and that's how we got into domain modeling, and how can we do that kind of domain modeling across all representations? Alex: So there was one part of it. The other part is we needed to enable what's called semantic interoperability. Once we have the ability to talk about concepts and domain models across all of the representations, then the next question is how can we actually move and help our users move in between all of those data representations? There is one thing to remember from the article that's actually in the title, that's that concept of model once, represent everywhere. The core idea with all of that is to say once we've been able to capture a domain model in one place, then we have the ability to project and generate consistent representations. In our case, we are focused on GraphQL, Avro, Java, and SQL. 
That's what we have today, but we are looking into adding more support for other representations. Larry: Interesting. And I think every enterprise will have its own mix of data structures like that that they're mapping things to. I love the way you use the word, project. I think different people talk about what they do with the end results of such systems. You have two concepts you talk about as you talk about this, the notion of mappings, which we're just talking about with the data stuff, but also that notion of projection. That's sort of like once you've instantiated something out this system, you project it out to the end user. Is that kind of how it works? Alex: Yes, so we do use the term, projection, in the more mathematical sense, and more people would call that denotations. So essentially, once you have a domain model, and you can reason about it, and we have actually, a formal representation of the domain models, maybe we'll talk about that a little bit later. But then you can actually define how it's supposed to look like, the exact same thing with the same data semantics, but as an API, for example, in GraphQL, or as a data product in Iceberg, in the data warehouse, or as a log-compacted Kafka topic in our data mesh infrastructure as Avro. So for us, we have to make sure that it's quote, unquote, "the same thing," regardless of the data representation that the user is actually interested in. Alex: To put everything together, you talked about the mappings, what's really interesting for us is that the mappings are just one of the three main components that we have in our knowledge graph, because at the end of the day, UDA at its core is really a knowledge graph which is made out of the domain models. We've talked about that. 
Then the mappings, the mappings are themselves objects in that knowledge graph, and they are here actually to connect the world of concepts from the domain models to the world of data containers, which in our case could represent things like an Iceberg table, so we would want to know the coordinates on the Iceberg table and we would want to know the schema. But that applies as well to the data mesh source abstraction and the Avro schema that goes with it. Alex: That would apply as well, and that's a tricky part that very few people actually try to solve, but that would apply to the GraphQL APIs. We want to be able to say and know, oh, there is a type resolver for that GraphQL type that exists in that domain graph service and it's located exactly over there. So that's the kind of granularity that we actually capture in the knowledge graph. Larry: Very cool. And this is the Knowledge Graph Insights podcast, which is how we ended up talking about this. But that notion of the models, and then the mappings, and then the data containers that actually have everything, I'm just trying to get my head around the scale of this knowledge graph. You said this is not just, but you tease it out, it doesn't have to do with the streaming services or the customer facing part of the business, it's just about your kind of content and data media assets that you need to manage on the back end. Are you sort of an internal service? Is that how it's conceived or? Alex: That's a good question. So we are not so much into the binary data. That's not at all what UDA is about. Again, it's a knowledge graph podcast, for sure, but even more precisely, when we say knowledge graph, we really mean conceptual RDF and we are very, very clear about that. That means for us, quite a few things. The knowledge graph, in our case, needs to be able to capture the data wherever it lives. We do not want necessarily to be RDF all the way through, but at the very core of it, there is a lot of RDF. 
I'm trying to remember how we talk about it. But yeah, so think about a graph representation of connected data. And again, it has to work across all of the data representations, but we want to make sure that we have enough information about t
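Editor's sketch: the "model once, represent everywhere" idea Alex describes in the excerpt above, where a domain model is captured in one place and then projected into consistent representations, can be illustrated with a small Python example. Everything here is an assumption made for illustration: the `Concept` and `Attribute` classes, the type tables, and the two projection functions are hypothetical and are not Upper, UDA, or any Netflix API. The sketch covers only two of the targets mentioned in the interview (GraphQL and SQL) and ignores mappings and data containers entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    datatype: str          # "string" or "integer" in this toy model
    required: bool = True


@dataclass(frozen=True)
class Concept:
    name: str
    attributes: tuple      # tuple of Attribute


# Hypothetical type tables: one conceptual datatype, many target spellings.
GRAPHQL_TYPES = {"string": "String", "integer": "Int"}
SQL_TYPES = {"string": "VARCHAR", "integer": "INTEGER"}


def to_graphql(concept):
    """Project the concept to a GraphQL SDL type definition."""
    lines = [f"type {concept.name} {{"]
    for attr in concept.attributes:
        bang = "!" if attr.required else ""
        lines.append(f"  {attr.name}: {GRAPHQL_TYPES[attr.datatype]}{bang}")
    lines.append("}")
    return "\n".join(lines)


def to_sql(concept):
    """Project the same concept to a SQL CREATE TABLE statement."""
    columns = []
    for attr in concept.attributes:
        null = " NOT NULL" if attr.required else ""
        columns.append(f"  {attr.name} {SQL_TYPES[attr.datatype]}{null}")
    return f"CREATE TABLE {concept.name.lower()} (\n" + ",\n".join(columns) + "\n);"


# The domain model is written once...
movie = Concept(
    "Movie",
    (Attribute("title", "string"), Attribute("releaseYear", "integer", required=False)),
)

# ...and each representation is generated from it.
print(to_graphql(movie))
print(to_sql(movie))
```

Because both outputs are derived from the same `movie` definition, a change such as making `releaseYear` required shows up in the GraphQL type and the SQL table together, which is the consistency property the interview emphasizes.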
-
35
Torrey Podmajersky: Aligning Language and Meaning in Complex Systems – Episode 39
Torrey Podmajersky Torrey Podmajersky is uniquely well-prepared to help digital teams align on language and meaning. Her father's interest in philosophy led her to an early intellectual journey into semantics, and her work as a UX writer at companies like Google and Microsoft has attuned her to the need to discover and convey precise meaning in complex digital experiences. This helps her span the "semantic gaps" that emerge when diverse groups of stakeholders use different language to describe similar things. We talked about: her work as president at her consultancy, Catbird Content, and as the author of two UX books how her father's interest in philosophy and semantics led her to believe that everyone routinely thinks about what things mean and how to represent meaning the role of community and collaboration in crafting the language that conveys meaning how the educational concept of "prelecting" facilitates crafting shared-meaning experiences the importance of understanding how to discern and account for implicit knowledge in experience design how she identifies "semantic gaps" in the language that various stakeholders use her discovery, and immediate fascination with, the Cyc project and its impact on her semantic design work her take on the fundamental differences between how humans and LLMs create content Torrey's bio Torrey Podmajersky helps teams solve business and customer problems using UX and content at Google, OfferUp, Microsoft, and clients of Catbird Content. She wrote Strategic Writing for UX, is co-authoring UX Skills for Business Strategy, hosts the Button Conference, and teaches content, UX, and other topics at schools and conferences in North America and Europe. 
Connect with Torrey online LinkedIn Catbird Content (newsletter sign-up) Torrey's Books Strategic Writing for UX UX Skills for Business Strategy Resources mentioned in this interview Cyc project Button Conference UX Methods.org Video Here’s the video version of our conversation: https://youtu.be/0GLpW9gAsG0 Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 39. Finding the right language to describe how groups of people agree on the meaning of the things they're working with is hard. Torrey Podmajersky is uniquely well-prepared to meet this challenge. She was raised in a home where it was common to have philosophical discussions about semantics over dinner. More recently, she's worked as a designer at tech companies like Google, collaborating with diverse teams to find and share the meaning in complex systems. Interview transcript Larry: Hi everyone. Welcome to episode number 39 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Torrey Podmajersky. I've known Torrey for years from the content world, the UX design and content design and UX writing and all those worlds. I used to live very close to her office in Seattle, but Torrey's currently the president at Catbird Content, her consultancy, and she's guest faculty at the University of Washington iSchool. She does all kinds of interesting stuff, very accomplished author. So welcome Torrey. Tell the folks a little bit more about what you're up to and where all the books are at these days. Torrey: Thanks so much, Larry. I am up to my neck in finishing the books right now. So one just came out, the second edition of Strategic Writing for UX, that has a brand new chapter on building LLMs into products and updates throughout, of course, since it came out six years ago. But I'm also working on the final manuscript with two co-authors for UX Skills for Business Strategy. 
That'll be a wine pairing guide, a deep reference book that connects the business impact that you might want to make, whether you're a UX pro or a PM or a knowledge graph enthusiast working somewhere in product and connecting it to the UX skills you might want to use to make those impacts. Larry: Excellent. I can't wait to read both of those. I love the first edition of the Strategic Writing for UX book, but... Hey, I want to talk today though about, this is the Knowledge Graph Insights podcast, and you recently did this great post and we'll talk more about it in detail in a bit about how you had discovered the Cyc project, which is a real pioneering project in the semantic technology field and really foundational to a lot of the knowledge graph stuff that's happening today. But I want to start with one of the other things we talked about before we went on the air was your observation of the kind of common philosophical roots that we have in rhetoric, maybe not necessarily rhetoric, but the stuff that we do as word nerds, as meaning nerds, as all these different kinds of technology nerds that we are. Tell me a little bit about what you meant because you just hinted that and I was like, oh, good philosophy. I love philosophy. Torrey: Yeah, I love philosophy too, especially through my dad. My dad was a philosophy major at Haverford College and it has deeply influenced his life and his work in semantic knowledge spaces. And I got to grow up in that context thinking that everybody thought deeply about what things meant and how we represent those meanings. I mean, the Plato's Allegory of the Cave was my bedtime story to the extent that we all knew Plato in the cave, geez, dad, just fine. Plato in the cave. We don't really know anything. All we have is facsimiles and representations of meaning and representations of reality, and through that we construct meaning. 
And I feel like that's all we're ever doing is using language to construct meaning based on our inability to fully perceive reality. Larry: And just for folks who aren't familiar, I love Plato's Allegory of the Cave. It's these poor people chained to a wall and behind them is a projector projecting stuff on the wall in front of them. So all they see is this projection of an imitation of reality, which is much like what we're doing with either both UX writing and I think ontology design and semantic engineering. So that's the perfect analogy to come into this. But your job for the last, I don't know, because you made the transition from teaching to Xbox, what? 10, 12 years ago or something like that? Torrey: In 2010, I joined Xbox and before that I had a short stint in internal communications in a division at Microsoft working for a VP there. Larry: But you've been in the word biz and the meaning biz for a long time because UX writing is, how did you say it? You have to convey meaning. That's the whole point of UX writing is to just get past random words to actually, what are we talking about here? Torrey: It's to make the words that people understand so quickly while they're in an experience, they're just trying to use it. They're not there to read. So we want the words to disappear into ephemeral meaning in their head that they don't even remember. They just knew what to do and which button to press and where to go next to get done what they wanted to get done. Larry: And one of the things about that is getting to that language to do that in an experience, that's a team sport. One of the other things that really struck me about that post you did was the role of community in language and meaning. Talk a little bit about that. Torrey: Yeah, it is a team sport because in general, even if it's the person doing the UX writing or that content design is also the product designer is also the interaction designer. 
What they're trying to do is take a wide variety of people who might be using this product that might be an incredibly diverse set of people, or it might be a very narrow set of people, let's say all IT pros. We want to sell this product to big corporations that have IT pros that want to manage their data centers. It's a pretty narrow slice of humans, but it's still hugely diverse in terms of from what language they're speaking and what kind of resources they have inside this company to the kind of background they have, to all of the different reasons they might need to manage their data centers right now. Torrey: From, hey, something new came online or there needs to be a new partition or new admin management of access to it or security patch updates to things like, oh, there was an earthquake at a data center and I need to secure and audit any damage that might've happened. So there's a huge number of reasons. Let me back up out of that deep analogy. There's a huge number of reasons even for a tiny population relative to the scope of humanity, a small population doing a relatively well-defined job still has a huge number of reasons they might need to be in an interface doing a thing. And what we have to do when we are designing the content for that and designing the experience itself is anticipate those and try and make sure that we've indicated that whatever reason they're coming there for, if it's a valid reason to use this piece of software, whatever reason they're coming there for, they see it reflected in the text and they understand what to do. Torrey: That is a team sport because I can't, and no individual person can anticipate all of those things simultaneously. We need to think them through sequentially. We need data to base it on. 
We need to understand, we need to hear from people who will use it or people who would use it to hear about how they think about it and specifically what language do they use, what's already in their head that we can use to reflect on that screen. So it's about understanding that space well enough, coming to understand that space well enough by communicating with other humans to know what are the right things to represent and in what hierarchy or embeddedness or relationalness, and then use some grammar and punctuation and other tricks up our language sleeves. Larry: Yeah, no....
-
34
Casey Hart: The Philosophical Foundations of Ontology Practice – Episode 38
Casey Hart Ontology engineering has its roots in the idea of ontology as defined by classical philosophers. Casey Hart sees many other connections between professional ontology practice and the academic discipline of philosophy and shows how concepts like epistemology, metaphysics, and rhetoric are relevant to both knowledge graphs and AI technology in general. We talked about: his work as a lead ontologist at Ford and as an ontology consultant his academic background in philosophy the variety of pathways into ontology practice the philosophical principles like metaphysics, epistemology, and logic that inform the practice of ontology his history with the Cyc project and employment at Cycorp how he re-uses classes like "category" and similar concepts from upper ontologies like gist his definition of "AI" - including his assertion that we should use the term to talk about a practice, not a particular technology his reminder that ontologies are models and like all models can oversimplify reality Casey's bio Casey Hart is the lead ontologist for Ford, runs an ontology consultancy, and pilots a growing YouTube channel. He is enthusiastic about philosophy and ontology evangelism. After earning his PhD in philosophy from the University of Wisconsin-Madison (specializing in epistemology and the philosophy of science), he found himself in the private sector at Cycorp. Along his professional career, he has worked in several domains: healthcare, oil & gas, automotive, climate science, agriculture, and retail, among others. Casey believes strongly that ontology should be fun, accessible, resemble what is being modelled, and just as complex as it needs to be. He lives in the Pacific Northwest with his wife and three daughters and a few farm animals. 
Connect with Casey online LinkedIn ontologyexplained at gmail dot com Ontology Explained YouTube channel Video Here’s the video version of our conversation: https://youtu.be/siqwNncPPBw Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 38. When the subject of philosophy comes up in relation to ontology practice, it's typically cited as the origin of the term, and then the subject is dropped. Casey Hart sees many other connections between ontology practice and its philosophical roots. In addition to logic as the foundation of OWL, he shows how philosophy concepts like epistemology, metaphysics, and rhetoric are relevant to both knowledge graphs and AI technology in general. Interview transcript Larry: Hi, everyone. Welcome to episode number 38 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Casey Hart. Casey has a really cool YouTube channel on the philosophy behind ontology engineering and ontology practice. Casey is currently an ontologist at Ford, the motor car company. So welcome Casey, tell the folks a little bit more about what you're up to these days. Casey: Hi. Thanks, Larry. I'm super excited to be here. I've listened to the podcast, and man, your intro sounds so smooth. I was like, "I wonder how many edits that takes." No, you just fire them off, that's beautiful. Casey: Yeah, so like you said, these days I'm the ontologist at Ford, so building out data models for sensor data and vehicle information, all those sorts of fun things. I am also working as a consultant. I've got a couple of different startup healthcare companies and some cybersecurity stuff, little things around the edge. I love evangelizing ontology, talking about it and thinking about it. And as you mentioned for the YouTube channel, that's been my creative outlet. My background is in philosophy and I was interested in, I got my PhD in philosophy, I was going to teach it. 
You write lots of papers, those sorts of things, and I miss that to some extent getting out into industry, and that's been my way back in to, all right, come up with an idea, try and distill it, think about objections, put it together, and so I'm really enjoying that lately. Larry: And I'm enjoying the video- Casey: Glad to be on the show. Larry: Yeah, no, I really appreciate what you're doing there. One thing I wanted to, and I love that that's how you're getting back to both your philosophical roots, but also part of it is to evangelize ontology practice, which is that's what this podcast is all about, democratizing and sharing practice. But I think, and I just love that you have this explicit and strong philosophical foundation and bent to how you talk about things. I think a lot of times that conversation is like, "Yeah, ontology comes out of philosophy," and that's the end of the conversation. But you've mentioned the role of metaphysics, epistemology, logic, all of which, can you talk a little bit about how those, beyond just I think a lot of people think about logic and OWL and all that stuff, but can you talk a little bit more about the role of metaphysics and epistemology and these other philosophical ideas? Casey: Yeah, definitely. You mentioned this in the pre-notes, "Here's a topic we'd like to get to," and I got into a lot of imposter syndrome on this, right? I'm trying to talk myself out of this, but I think most ontologists have this feeling there's no solid easy pipeline into becoming an ontologist, right? It's a very eclectic group of us. My background's in philosophy, you run into a bunch of librarians, you've got computer scientists who do DB administration, you've got jazz musicians I've run into, it's a weird group. Casey: I say that just to be, sometimes when I get asked about, "Okay, how does ontological practice work?" I think, well, I didn't actually train to be an ontologist. 
I fell into it, so I'm ill-equipped to say things about what role ontology or philosophy plays in ontology. Casey: I just know I learned philosophy, and then I'm using some of those tools here, so there's two different answers. One is historically, how does philosophy inform and shape the nature of ontology practice? And the other part is just, okay, if you've got a philosophical toolkit of metaphysics and epistemology and logic, how does that apply and make you a better, I mean, the obvious connection is that ontology is a philosophical term. It comes from metaphysics. We look back to Aristotle, and it's the study of that which exists, so do we want to say there's fundamentally fire, air, earth, water or something like that? Or fundamentally, there are these atoms and those are the sorts of things that are part of the inventory of reality. It's not physics, it's metaphysics. It's the thing that in I think for Aristotle is just, it's the book that sits next to his physics in all of his category, in his library of everything. Casey: But when we move that forward to computer science and data modeling, then we're thinking, okay, maybe not for all of reality, although maybe it depends on how big you want your data model to be. But if I'm a retailer, what are the terms and ontology, what are the terms that I care about, the things that I need to model the constituents of reality that matter to me? That might be types, if you're Amazon, it's okay, medium-sized dry goods versus sporting equipment versus something else. If I'm doing a medical ontology, it's patients and payers and providers, et cetera. In philosophy, in ontology, there's a bunch of different tools and examples, but we think about, okay, what are some fundamental distinctions that we want to make? How can we carve nature at its joints in really sensible ways? That's a phrase that you'll hear a lot. We could say more about it if you want. 
Casey: But what I found is being a philosopher goes into an ontology space is that I have this inventory of examples from all of my grad seminars and various things that I'm looking through and going through whether I want to talk about gavagai and undetached rabbit parts, if that makes sense to anybody, or whether I want to talk about grue as a color, here are some examples, ways that we can chop up the world in unnatural ways versus chopping it up in natural ways and how do we make those distinctions? That applies straightforwardly when you get into building an ontology model for an oil and gas industry or something like that. There's a bunch of ways that we can divvy up all the things you care about, what's the right and sensible way to do it? Casey: I guess that's the metaphysics, ontology way. Logic you mentioned, right? We need to think about reasoning. I don't just want to assert a bunch of things about my data. A fundamental premise of an ontology is that we want to understand our data, we want to confer meaning on it, and that means that we have to be able to leverage the structure of the ontology to infer things smartly. Simple things like set containment are fine if all persons are animals, and then we say something about animals, they're creatures. Then when I say that persons are a subclass of that, then I get for free that persons are spatio-temporal things as well. But we get a lot more complicated inferences as we go. We have to think about statistical reasoning. Just in general, if logic is the study of what makes for good arguments, what follows from what, that's obviously got a lot of applications in ontology, AI. Casey: And then the third piece that we talked about is epistemology. Epistemology is the study of knowledge and belief, roughly about what it means to be justified. The classic example there is, if I know something, what exactly does that amount to? And then Plato says it's justified true beliefs. 
And then the history of epistemology is littered with examples of trying to cash out exactly what does it mean to be justified. And if you get new information, how can that undercut your justifications? How do you update your beliefs? Casey: More recent stuff, and this is what I did in my dissertation,...
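Casey's point about getting inferences "for free" from set containment can be made concrete with a small sketch. This is only an illustration, not how any particular reasoner or the tooling at Ford works. The class names, the link from creatures to spatio-temporal things, and the `superclasses` helper are all assumptions made up for this example.

```python
# Hypothetical class hierarchy based on the example in the interview.
# Each key is a class; its value is the direct superclass asserted for it.
# The Creature -> SpatioTemporalThing link is an assumption for illustration.
SUBCLASS_OF = {
    "Person": "Animal",
    "Animal": "Creature",
    "Creature": "SpatioTemporalThing",
}


def superclasses(cls: str) -> list[str]:
    """Return every superclass of cls by following subclass links upward."""
    found = []
    seen = {cls}
    current = cls
    while current in SUBCLASS_OF:
        current = SUBCLASS_OF[current]
        if current in seen:  # guard against an accidental cycle
            break
        seen.add(current)
        found.append(current)
    return found


# Only "Person is a subclass of Animal" was asserted directly about Person;
# the remaining memberships follow from the structure of the hierarchy.
print(superclasses("Person"))
```

Nothing here states that persons are spatio-temporal things. That conclusion falls out of walking the hierarchy, which is the kind of structural inference the passage describes.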
-
33
Chris Mungall: Collaborative Knowledge Graphs in the Life Sciences – Episode 37
Chris Mungall Capturing knowledge in the life sciences is a huge undertaking. The scope of the field extends from the atomic level up to planetary-scale ecosystems, and a wide variety of disciplines collaborate on the research. Chris Mungall and his colleagues at the Berkeley Lab tackle this knowledge-management challenge with well-honed collaborative methods and AI-augmented computational tooling that streamlines the organization of these precious scientific discoveries. We talked about: his biosciences and genetics work at the Berkeley Lab how the complexity and the volume of biological data he works with led to his use of knowledge graphs his early background in AI his contributions to the Gene Ontology the unique role of bio-curators, non-semantic-tech biologists, in the biological ontology community the diverse range of collaborators involved in building knowledge graphs in the life sciences the variety of collaborative working styles that groups of bio-curators and ontologists have created some key lessons learned in his long history of working on large-scale, collaborative ontologies, key among them, meeting people where they are some of the facilitation methods used in his work, tools like GitHub, for example his group's decision early on to commit to version tracking, making change-tracking an entity in their technical infrastructure how he surfaces and manages the tacit assumptions that diverse collaborators bring to ontology projects how he's using AI and agentic technology in his ontology practice how their decision to adopt versioning early on has enabled them to more easily develop benchmarks and evaluations some of the successes he's had using AI in his knowledge graph work, for example, code refactoring, provenance tracking, and repairing broken links Chris's bio Chris Mungall is Department Head of Biosystems Data Science at Lawrence Berkeley National Laboratory. 
His research interests center around the capture, computational integration, and dissemination of biological research data, and the development of methods for using this data to elucidate biological mechanisms underpinning the health of humans and of the planet. He is particularly interested in developing and applying knowledge-based AI methods, particularly Knowledge Graphs (KGs) as an approach for integrating and reasoning over multiple types of data. Dr. Mungall and his team have led the creation of key biological ontologies for the integration of resources covering gene function, anatomy, phenotypes and the environment. He is a principal investigator on major projects such as the Gene Ontology (GO) Consortium, the Monarch Initiative, the NCATS Biomedical Data Translator, and the National Microbiome Data Collaborative project. Connect with Chris online LinkedIn Berkeley Lab Video Here’s the video version of our conversation: https://youtu.be/HMXKFQgjo5E Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 37. The span of the life sciences extends from the atomic level up to planetary ecosystems. Combine this scale and complexity with the variety of collaborators who manage information about the field, and you end up with a huge knowledge-management challenge. Chris Mungall and his colleagues have developed collaborative methods and computational tooling that enable the construction of ontologies and knowledge graphs that capture this crucial scientific knowledge. Interview transcript Larry: Hi everyone. Welcome to episode number 37 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Chris Mungall. Chris is a computational scientist working in the biosciences at the Lawrence Berkeley National Laboratory. Many people just call it the Berkeley Lab. He's the principal investigator in a group there, has his own lab working on a bunch of interesting stuff, which we're going to talk about today. 
So welcome, Chris, tell the folks a little bit more about what you're up to these days. Chris: Hi, Larry. It's great to be here. Yeah, so as you said, I'm here at Berkeley Lab. We're located in the Bay Area. We're just above UC Berkeley campus. We have a nice view of the San Francisco Bay looking into San Francisco, and so we're a national lab, so we're part of the Department of Energy National Lab system, and we have multiple different areas here in the lab looking at different aspects of science from physics, energy technologies, material science. I'm in the biosciences area, so we are really interested in how we can advance biological science in areas relevant to national scale challenges really in different areas like energy, the environment, health and bio-manufacturing. Chris: My own particular research is really focused on the role of genes and in particular the role of genes in complex systems. So this could be the genes that we have in our own cells, the genes in human beings, how they all work together to hopefully create a healthy human being. One part of my research also looks at the role of genes in the environment, and in particular the role of genes inside tiny old microbes that you'll find in the ocean water and in the soil. And how these genes all work together, both to help drive these microbial systems, help them work together and how they all work together really to drive ecosystems and biogeochemical cycles. Chris: So I think the overall aim is really just to get a picture of these genes and how they interact in these kind of complex systems and build up models of complex systems from scales right the way from atoms through the way through to organisms and indeed all the way to earth-scale systems. So my work is all computational. I don't have a wet lab. 
So one thing that we realized early on is just when you are sequencing these genomes and trying to interpret the genes, you're generating a lot of information and you need to be able to organize that somehow. And so that's how we arrived at working on knowledge graphs, basically to assemble all of this information together and to be able to use it in algorithms to help us interpret biological data and help us figure out the role of genes in these organisms. Larry: Yeah, many of the people I've talked to on this podcast, they come out of the semantic technology world and apply it in some place or another. It sounds like you came to this world because of the need to work with all the data you've got. What was your learning curve? Was it just another thing in your computational toolkit? Chris: Yeah, in some ways. In fact, my background is, if you go back far enough, my original background is more on the computational side and my undergrad was in AI, but this is back when AI meant good old-fashioned AI and symbolic reasoning and developing Prolog rules to reason about the world and so on. And at that time, I wasn't so interested in that side of AI. I really wanted to push forward with some of the more nascent neural network type approaches. But in those days, we didn't really have the computational power and I thought, "Well, maybe I really need to, I actually learned something about biological systems before trying to simulate them." So that's how I got involved in genomics. This was around about the time of just before the sequencing of the human genome, and I just got really interested in this area, a position came up here at Lawrence Berkeley National Laboratory, and I just got really involved in analyzing some of these genomes. Chris: And in doing this, I came across this project called the Gene Ontology that was developed by some of my colleagues originally in Cambridge and at Lawrence Berkeley National Laboratory. 
And the goal here was really as we were sequencing these genomes and we were figuring out there's 20,000 genes in the human genome, we discovered we had no way to really categorize what the functions of these different genes were. And if you think about it, there's multiple different ways that you can describe the function of any kind of machine, whether it's a molecular machine inside one of your cells or your car or your iPhone or whatever. You can describe it in terms of what the intent of that machine is. You can describe it in terms of where that machine is localized and what it does, and how that machine works as part of a larger ensemble of machines to achieve some larger objective. Chris: So my colleagues came up with this thing called the Gene Ontology, and I looked at that and I said, "Hey, I've got this background in symbolic reasoning and good old-fashioned AI. Maybe I could play a role in helping organize all of this information and figuring out ways to connect it together as part of a larger graph." We didn't call them knowledge graphs at this time, but we're essentially building knowledge graphs at the time and make use of, in those days quite early semantic web technologies. This is even before the development of OWL, the Web Ontology Language, but there was still this notion that we could use, we could use rules in combination with graphs to make inferences about things. And I thought, "Well, this seems like an ideal opportunity to apply some of this technology." Larry: That's interesting. It's funny we didn't plan this, but the episode right before you in the queue was of my friend Emeka Okoye. He's a guy who was building knowledge graphs in the late '90s, early 2000s, mostly the early 2000s before the term had been coined, and I think maybe even before a lot of the RDF and OWL and all that stuff was there. So you mentioned Prolog earlier, and what was your toolkit then, and how has it evolved up to the present? That's a huge question. Yeah. 
Chris: I didn't mean to get into my whole early days with Prolog. Yeah, I've definitely had some interest in applying a lot of these logic programming technologies. As you're aware,...
-
32
Emeka Okoye: Exploring the Semantic Web with the Model Context Protocol – Episode 36
Emeka Okoye Semantic technologies permit powerful connections across a variety of linked data resources across the web. Until recently, developers had to learn the RDF language to discover and use these resources. Leveraging the new Model Context Protocol (MCP) and LLM-powered natural-language interfaces, Emeka Okoye has created the RDF Explorer, an MCP service that lets any developer surf the semantic web without having to learn its specialized language. We talked about: his long history in knowledge engineering and AI agents his deep involvement in the business and technology communities in Nigeria, including founding the country's first internet startup how he was building knowledge graphs before Google coined the term an overview of MCP, the Model Context Protocol, and its benefits the RDF Explorer MCP server he has developed how the MCP protocol helps ease some of the challenges that semantic web developers have traditionally faced the capabilities of his RDF Explorer: facilitating communication between AI applications, language models, and RDF data enabling graph exploration and graph data analysis via SPARQL queries browsing, accessing, and evaluating linked-open-data RDF resources the origins of RDF Explorer in his attempt to improve ontology engineering tooling his objections to "vibe ontology" creation the ability of RDF Explorer to let non-RDF developers access knowledge graph data how accessing knowledge graph data addresses the problem of the static nature of the data in language models the natural connections he sees between neural network AI and symbolic AI like knowledge graphs, and the tech tribalism he sees in the broader AI world that prevents others from seeing them how the ability of LLMs to predict likely language isn't true intelligence or actual knowledge some of the lessons he learned by building the RDF Explorer, e.g., how the MCP protocol removes a lot of the complexity in building hybrid AI solutions how MCP helps him validate 
the ontologies he creates Emeka's bio Emeka is a Knowledge Engineer, Semantic Architect, and Generative AI Engineer who leverages his over two decades of expertise in ontology and knowledge engineering and software development to architect, develop, and deploy innovative, data-centric AI products and intelligent cognitive systems to enable organizations in their Digital Transformation journey to enhance their data infrastructure, harness their data assets for high-level cognitive tasks and decision-making processes, and drive innovation and efficiency en route to achieving their organizational goals. Emeka’s experience has embraced a breadth of technologies, his primary focus being solution design, engineering and product development while working with a cross section of professionals across various cultures in Africa and Europe in solving problems at a complex level. Emeka can understand and explain technologies from deep diving under the hood to the value proposition level. Connect with Emeka online LinkedIn Making Knowledge Graphs Accessible: My Journey with MCP and RDF Explorer RDF Explorer (GitHub) Video Here’s the video version of our conversation: https://youtu.be/GK4cqtgYRfA Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 36. The widespread adoption of semantic technologies has created a variety of linked data resources on the web. Until recently, you had to learn semantic tools to access that data. The arrival of LLMs, with their conversational interfaces and ability to translate natural language into knowledge graph queries, combined with the new Model Context Protocol, has empowered semantic web experts like Emeka Okoye to build tools that let any developer surf the semantic web. Interview transcript Larry: Hi, everyone. Welcome to episode number 36 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show my good friend, Emeka Okoye. 
Emeka is a really interesting ontology practitioner and knowledge engineer, and he's operating now at the intersection of knowledge engineering and generative AI, which I think is a really interesting intersection and that's what we're going to talk about today. So welcome, Emeka. Tell the folks a little bit more about what you're up to these days. Emeka: Oh, well, thank you for bringing me to this awesome podcast. I'm proud to be here. I have been involved in knowledge engineering or more like AI. We need to understand that knowledge engineering is important for AI because it creates the knowledge layer. So that's where we have knowledge graphs. There's been a lot of tribalism in AI, the neural nets on one side and the symbolic AI on the other side. So I am in for the convergence. I've always believed in the convergence. Emeka: Funny enough, I've been teaching and mentoring young ones on both sides of the divide since 2016 in the Nigerian data science space. So no surprises that generative AI boomed, and I needed to find reasons to see how we can integrate both sides, because that's what AI is all about, the best of both worlds, best of neural nets, and then best of symbolic AI. That's the future. I mean, there's no doubt about it. So that foundation, I needed to be there and that's why I've been working on both sides. So from knowledge graphs to AI agents. Larry: That's so funny, we didn't talk about this before I hit record, but right before we started this interview, I posted a thing to LinkedIn about exactly that. It was specifically about the need for executive education around hybrid AI architectures 'cause all they have is Silicon Valley hype. That's all the information they have. But more to the point, you're a hybrid practice. Well, first of all, I've known you for years now, and it just occurred to me, I don't really know your academic background, but it sounds like you're equally grounded in machine learning and knowledge representation stuff. 
Have you always pursued both? Emeka: I'm a geologist. That's the only qualification I do have. Immediately I found love with personal computers. So once the PC era boomed, I just went in programming. Nigeria was once one of the biggest software countries in the world at a point in time. Our software houses were building financial and banking systems the whole of North America were using, and some part of Europe. So we are that big. So when the internet came, we embraced it that early. I was already building internet protocols using Visual Basic, and not long after I co-founded the first startup in Nigeria. And then after that I worked with probably one of the earliest Semantic Web brands in the world, which is OpenLink Software. I became the Chief Technical Officer in the whole of Africa. Emeka: So I was with OpenLink Software when Tim Berners-Lee came up with the Semantic Web thing and Ora and co coming up with agents. So I started early on, thanks to my mentor, my boss then, Kingsley Idehen, who mentored me throughout and made me understand that the future was Semantic Web. So I dove right into it. And can you believe this, we were already creating knowledge graph before Google called it knowledge graph. I had created one for a client, which is Music In Africa by 2011, 2012. Larry: That's right before they introduced the term knowledge graph with their... That's so interesting because... And the RDF and the OWL and all the Semantic Web tech goes back 10 years before that. So that gap between the dawn of the Semantic Web and the coining of the term knowledge graph, you were just in there doing it. Emeka: Yes. Yeah, we were already doing it. And remember I came from a company that is on top of this technology. You know who Kingsley Idehen is. He's my former boss, and my mentor today. Even after I left OpenLink Software, he was there to guide me. So most of what I know in semantic technology comes from Kingsley. So we were already doing this. 
So my understanding of the technology is very sound. Academia-wise, I didn't do anything much in that regard on the technology, but I'm hoping I'll do research in the future, because as I'm trying to come into Europe, I noticed that there are a lot of research-based jobs and AI is something I would love to devote research time to. Larry: Yeah, and I know a lot of those people, and there's not a specific track yet around the hybrid AI stuff. I hope you get a chance to do that. But hey, that's what I want to really focus on today. So your background, your RDF Explorer project makes even more sense to me now. I just want to say real quickly about that. Emeka and I meet once or twice a week, in our Dataworthy Collective, which we co-organize with some other folks, and I was just embarrassed that I had totally missed this awesome piece you wrote for LinkedIn about RDF Explorer, and then you just happened to mention it in one of our meetings, and I went and read it and I was like, "Whoa, that's amazing. We got to talk about this." Larry: So here we are. Finally, I get to share the RDF Explorer with folks. So tell me, I think one thing I've been a little bit surprised by is that not everybody in the knowledge graph and semantic tech space is familiar with MCP. They maybe know the acronym and what it stands for, but can you talk just a little bit about the Model Context Protocol? Emeka: All right, so the Model Context Protocol, which was created by Anthropic sometime in November 2024, is a standardized protocol which allows AI agents to connect and interact with external tools and different data sources in a simplified manner. It's that simplicity that is the attraction. So it removes a lot of stress that comes with connecting different data sources to it. Now, just to give you an idea what we are talking about. Before MCP, we had all these agentic RAG solutions. It means that you hand-code every individual sources to pull that
-
31
Tom Plasterer: The Origins of FAIR Data Practices – Episode 35
Tom Plasterer Shortly after the semantic web was introduced, the demand for discoverable and shareable data arose in both research and industry. Tom Plasterer was instrumental in the early conception and creation of the FAIR data principle, the idea that data should be findable, accessible, interoperable, and reusable. From its origins in the semantic web community, scientific research, and the pharmaceutical industry, the FAIR data idea has spread across academia, research, industry, and enterprises of all kinds. We talked about: his recent move from a big pharma company to XponentL Data, where he leads the knowledge graph and FAIR data practices the direct line from the original semantic web concept to FAIR data principles the scope of the FAIR acronym, not just four concepts, but actually 15 how the accessibility requirement in FAIR distinguishes the standard from open data the role of knowledge graphs in the implementation of a FAIR data program the intentional omission of prescribed implementations in the development of FAIR and the ensuing variety of implementation patterns how the desire for consensus in the biology community smoothed the development of the FAIR standard the role of knowledge graphs in providing a structure for sharing terminology and other information in a scientific community how his interest in omics led him to computer science and then to the people skills crucial to knowledge graph work the origins of the impetus for FAIR in European scientific research and the pharmaceutical industry the growing adoption of FAIR as enterprises mature their web thinking and vendors offer products to help with implementations how both open science and the accessibility needs in industry contributed to the development of FAIR the interesting new space at the intersection of generative AI, FAIR, and knowledge graphs the crucial foundational role of FAIR in AI systems Tom's bio Dr. 
Tom Plasterer is a leading expert in data strategy and bioinformatics, specializing in the application of knowledge graphs and FAIR data principles within life sciences and healthcare. With over two decades of experience in both industry and academia, he has significantly contributed to bioinformatics, systems biology, biomarker discovery, and data stewardship. His entrepreneurial ventures include co-founding PanGenX, a Personalized Medicine/Pharmacogenetics Knowledge Base start-up, and directing Project Planning and Data Interpretation at BG Medicine. During his extensive tenure at AstraZeneca, he was instrumental in championing Data Centricity, FAIR Data, and Knowledge Graph initiatives across various IT and scientific business units. Currently, Dr. Plasterer serves as the Managing Director of Knowledge Graph and FAIR Data Capability at XponentL Data, where he defines strategy and implements advanced applications of FAIR data, knowledge graphs, and generative AI for the life science and healthcare industries. He is also a prominent figure in the community, having co-founded the Pistoia Alliance FAIR Data Implementation group and serving on its FAIR data advisory board. Additionally, he co-organizes the Health Care and Life Sciences symposium at the Knowledge Graph Conference and is a member of Elsevier’s Corporate Advisory Board. Connect with Tom online LinkedIn Video Here’s the video version of our conversation: https://youtu.be/Lt9Dc0Jvr4c Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 35. With the introduction of semantic web technologies in the early 2000s, the World Wide Web began to look something like a giant database. And with great data, comes great responsibility. In response to the needs of data stewards and consumers across science, industry, and technology, the FAIR data principle - F A I R - was introduced. 
Tom Plasterer was instrumental in the early efforts to make web data findable, accessible, interoperable, and reusable. Interview transcript Larry: Hi everyone. Welcome to episode number 35 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show, Tom Plasterer. Tom is the managing director who leads the knowledge graph and FAIR practices at XponentL Data, which is a company in the Boston area, or he's in the Boston area. So welcome Tom, tell the folks a little bit more about what you're up to these days. Tom: Thanks, Larry. And great pleasure to be with you and the audience. So I'm now, just last week I hit a year at XponentL Data, after 12 and a half years at big pharma. And so, I came over to XponentL Data to lead the knowledge graph and FAIR data practices, and also to unite with our expertise around artificial intelligence. One of the things that I started to get really excited about with the knowledge graph conference over the last few years was the convergence of these two communities, and really how AI, knowledge graphs, and especially FAIR data, as a way of having curated trusted data for these applications, could be completely synergistic. And so that was really what brought me there. And when I joined, we were around 40 people. As I was leading this practice, we grew to about 240. And we were recently acquired by Genpact. Tom: And so, we're now part of a much bigger organization bringing our strength of artificial intelligence, generative AI, knowledge graphs and FAIR data to this larger organization. So that's been really my journey over the last year. And really wanted to bring these two technologies together. And one of the things that we've really found is how important FAIR data is to both sides of the equation. 
And so, this is really where trusted data, clean data, data that follows standards, data that's self-describing, all of the things that you want to do for FAIR data, are really important foundationally for what you want to do with knowledge graphs and for how you want to give this trusted data to large language models, generative AI, to get the most out of those technologies. So in a nutshell, that's been my journey over the last year. Larry: Yeah. And we didn't talk explicitly about it as we were preparing for this, but AI is the logical and obvious place where all this is going now. And I think everybody's concerned about delivering trustworthy, clean, FAIR data wherever you are. But do you feel like you've been uniquely well-prepared for that with both your company but... And I know your background, that's what we want to talk about today, is the origins of the FAIR data standard and you've been around it right from the get-go right? Tom: Right from the beginning. And the community leans a lot on earlier trends around the Semantic Web, Semantic Web technology. I think a lot of the founders are very web centric in their thinking. And there's a direct tie between what Tim Berners-Lee, Ora Lassila, Jim Hendler wanted to accomplish with the Semantic Web, how the standards evolved there and then grew up and became available within graph databases, eventually knowledge graphs, as a vehicle to prove that FAIR data worked. And so, that's a direct thread between that and wanting to have knowledge injection for generative AI and the value there. The whole thing flows really, really well. Larry: Yeah, interesting. And one thing as you said that the direct descendants from Tim Berners-Lee's and Ora and Jim's, I guess the paper in Scientific American, one of the things that arose like, I don't know what, five or 10 years after that was Tim Berners-Lee's notion of five star data, like the kind of 1, 2, 3, 4, 5 star rating. 
And then only, what, five, not five, seven years later, FAIR came along. Can you talk a little bit about how these perceptions of and the way good data and their practices are codified? Tom: Sure. So if we think about five star linked data and kind of what Tim was trying to accomplish there, get your data on the web, having an accessible format, follow standards, have it linked together, that's really, really close to the FAIR data principles itself. And I think a lot of the things within the FAIR data principles were learned directly from that. And I guess first I should take a step back and explain. People have probably come across the FAIR data principles, and they've heard Findable, Accessible, Interoperable, Reusable, and they think there's four of them. There's 15 of them. So this is where it gets to be a little bit more complicated. So FAIR as an acronym was just a very nice way of marketing and putting these things together, but a lot of the ways that they can become really useful is the sub-principles. So I'm just going to talk about them and describe them real briefly without being too technical. People can learn more about it in the 2016 Nature Medicine paper. Tom: So the findable is really about URIs. And so it's really about can I identify both an instance of data or a concept, a class that follows a URI, later an IRI, and sometimes we're calling them persistent identifiers or GUPRIs, so Global, Unique, Persistent Resource Identifiers, all the same thing. So can you use that to identify a piece of data, and if so, when you resolve it, will it provide useful metadata for both humans and machines? That's really the most important piece that you need to do to get started. Let's put an identifier on our data, on our metadata, so that we can resolve it, find it, put it in an index, so that we can get something useful out of it. So that's about four of the F principles there. 
Tom: Accessible is really about interoperability and it's following common protocols. So HTTP, HTTPS, we're not reinventing protocols, we're following standards. And then authentication on top of that in some sort of a certified manner. Usually it ends up being LDAP with single sign-on or something like that. Some way of authenticating your data....
-
30
Mara Inglezakis Owens: A People-Loving Enterprise Architect – Episode 34
Mara Inglezakis Owens Mara Inglezakis Owens brings a human-centered focus to her work as an enterprise architect at a major US airline. Drawing on her background in the humanities and her pragmatic approach to business, she has developed a practice that embodies both "digital anthropology" and product thinking. The result is a knowledge architecture that works for its users and consistently demonstrates its value to key stakeholders. We talked about: her role as an enterprise architect at a major US airline how her background as a humanities scholar, and especially as a rhetoric teacher, prepared her for her current work as a trusted business advisor some important mentoring she received early in her career how "digital anthropology" and product thinking fit into her enterprise architecture practice how she demonstrates the financial value of her work to executives and other stakeholders her thoughtful approach to the digitalization process and systems design the importance of documentation in knowledge engineering work how to sort out and document stakeholders' self-reports versus their actual behavior the scope of her knowledge modeling work, not just physical objects in the world, but also processes and procedures two important lessons she's learned over her career: don't be afraid to justify financial investment in your work, and "don't be so attached to an ideal outcome that you miss the best possible" Mara's bio Mara Inglezakis Owens is an enterprise architect who specializes in digitalization and knowledge management. She has deep experience in end-to-end supply chain as well as in planning, product, and program management. Mara’s background is in epistemology (history and philosophy of science, information science, and literature), which gives a unique, humanistic flavor to her practice. When she is not working, Mara enjoys aviation, creative writing, gardening, and raising her children. She lives in Minneapolis. 
Connect with Mara online LinkedIn email: mara dot inglezakis dot owens at gmail dot com Video Here’s the video version of our conversation: https://youtu.be/d8JUkq8bMIc Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 34. When you think about architecting knowledge systems for a giant business like a global airline, you might picture huge databases and complex spaghetti diagrams of enterprise architectures. These do in fact exist, but the thing that actually makes these systems work is an understanding of the needs of the people who use, manage, and finance them. That's the important, human-focused work that Mara Inglezakis Owens does as an enterprise architect at a major US airline. Interview transcript Larry: Hi, everyone. Welcome to episode 34 of the Knowledge Graph Insights Podcast. I am really delighted today to welcome to the show, Mara, I'm going to get this right, Inglezakis Owens. She's an enterprise architect at a major US airline. So, welcome, Mara. Tell the folks a little bit more about what you're up to these days. Mara: Hi, everybody. My name's Mara. And these days I am achieving my childhood dream of working in aviation, not as a pilot, but that'll happen, but as an enterprise architect. I've been doing EA, also data and information architecture, across the whole scope of supply chain for about 10 years, everything from commodity sourcing to SaaS, software as a service, to now logistics. And a lot of my days, I spend interviewing subject matter experts, convincing business leaders they should do stuff, and on my best days, I get to crawl around on my hands and knees in an airplane hangar. Larry: Oh, fun. That is ... Yeah. I didn't know ... I knew that there's that great picture of you sitting in the jet engine, but I didn't realize this was the fulfillment of a childhood dream. That's awesome. But everything you've just said ties in so well to the tagline on your LinkedIn profile. 
You're like, "I'm a people-loving architect, and data leader." And one of the things I love about that, we talked a fair amount at the knowledge graph conference about your background in the humanities- Mara: We did. Larry: ... and your transition into your current role. I would love to hear ... And what you just said, like, the end of what you were just saying about so much of your job is about interacting with people, and convincing business leaders to fund you, and stuff. Can you talk a little bit about that? Like, what drew you into the humanities in the first place, your transition out of it, and here we are today. Mara: 100%. Before I talk about being in the humanities, I love to read, I was an epistemologist, and a 19th Century scholar. But before that, when I was a little girl, I was writing my own websites in HTML, XML, and some of the technologies that eventually got to be used in the semantic web, which is how I entered the knowledge graph space way later as an adult. So, that got put on hold. Mara: I love to read. So, I became a humanities scholar, and I was for about five years, the lowest of the low adjuncts at an R1. My teaching experience, not my scholarship, although, I did a lot of thinking about how people interact with written media, and how they enter internal argumentation with those media, and come to know the world differently. That's what most of my work was about. It was interdisciplinary with literature, history, philosophy of science, which is why I say epistemology. Mara: But my best teacher for coming to where I am today was being a teacher. So, lowest of the low, first year, although, I spent most of the time teaching applied rhetoric, I was teaching freshman comp. So, this is a super diverse group of students who are showing up for a required class. To be successful, I needed to do two things. 
One, I had to listen carefully to what these students cared about to actually get them to get something out of the between $5000 and $8000 they were paying for this course. And then I had to generalize what I wanted them to learn about enough to make it accessible to them. Okay? So, my goal throughout my teaching career, similar to my goal now, is to inculcate effective communication through fit for purpose argumentation. Mara: So, while a lot of my colleagues were being like, "Here's an essay. Write something about it. Make it sound smart," what I did, because I needed my students to hook in, to be engaged, because the vast ... I maybe, I don't know, taught, like, four English majors over my career ... No HSPS [humanities, social, and political science] people. I told my students, "Okay, guys. Get into groups. So, you're set up to do some argumentation amongst yourselves, pick a little part of this essay," this was the first year, "And something you react strongly to. What about the sentences are doing this? Grammar, syntax, semantics. What's the whole universe of your group reactions? How are they related, or not related?" Mara: This evolved into a directed research curriculum in my applied rhetoric courses. So, I said, "Okay. Okay, guys. Go find something out in the world that needs to change. We need a pedestrian bridge over the street. We need better accessibility for people with disabilities in our gym. We need better gym hours. Figure out how it's working, frame up a case for someone who can make a change, do your argumentation, go present it." Some of my students actually argued well enough that they got a stoplight installed on a really busy street corner. So, it worked. Mara: So, fast-forward, lots of life drama that brought me out of the humanities into what is a much better place for me, in corporate. I'm in a trusted advisor role, not so dissimilar to being a teacher. As a trusted advisor, I have to be attuned to what the business says that they want. 
So, if they're saying ... And then what they demonstrate that they want through their behaviors, and through their artifacts, often times, their processes and information system. And then I have to think about why and how those things align, or don't align. Mara: Because I'm full-time employed, and this is in this role, and all of my corporate roles, but I'm, effectively, providing a boutique service. It's not enough for me to come up with something that sounds smart, or cool. I have to come up with a solution that accommodates process data, technology, and, most importantly, people, and that actually fulfills a business need. And I used to think about the connection from my academic career to my corporate career as like, "Oh, I became a good EA, because I taught my students to do this," but with about a decade of reflection, I'm realizing that teaching was really mutual. Mara: Like, I asked my students to show me what they were thinking. I evaluated what they were doing. I was very critical, but I was generous. And I was with them as their efforts bore fruit, or didn't. But how I demonstrated, elicited, and critiqued them evolved with constant, and often very, very vulnerable feedback. Like, I do with my clients now, I constantly asked, "How am I doing? Am I giving you what you need? Do you need something else?" Mara: For a student, it's really hard to say that to someone who's got the power of a grade over you. It's not as perhaps scary when we're all adults in corporate, but I still think many adults ... I was always the stupidest person in the room as a scholar. So, I don't have this problem, but a lot of us are worried about appearing, "I'm not smart enough. I am not creative enough." So, I still have to flex that good, compassionate, people-loving, listening muscle all the time in corporate just like my wonderful undergrads at Indiana University taught me how to do. Larry: That's so awesome. I've thought a lot about rhetoric. 
In fact, I don't know if we talked about this in New York, but my first career was in college textbook publishing,...
-
29
Frank van Harmelen: Hybrid Human-Machine Intelligence for the AI Age – Episode 33
Frank van Harmelen Much of the conversation around AI architectures lately is about neuro-symbolic systems that combine neural-network learning tech like LLMs and symbolic AI like knowledge graphs. Frank van Harmelen's research has followed this path, but he puts all of his AI research in the larger context of how these technical systems can best support people. While some in the AI world seek to replace humans with machines, Frank focuses on AI systems that collaborate effectively with people. We talked about: his role as a professor of AI at the Vrije Universiteit in Amsterdam how rapid change in the AI world has affected the 10-year, €20-million Hybrid Intelligence Centre research he oversees the focus of his research on the hybrid combination of human and machine intelligence how the introduction of conversational interfaces has advanced AI-human collaboration a few of the benefits of hybrid human-AI collaboration the importance of a shared worldview in any collaborative effort the role of the psychological concept of "theory of mind" in hybrid human-AI systems the emergence of neuro-symbolic solutions how he helps his students see the differences between systems 1 and 2 thinking and its relevance in AI systems his role in establishing the foundations of the semantic web the challenges of running a program that spans seven universities and employs dozens of faculty and PhD students some examples of use cases for hybrid AI-human systems his take on agentic AI, and the importance of humans in agent systems some classic research on multi-agent computer systems the four research challenges - collaboration, adaptation, responsibility, and explainability - they are tackling in their hybrid intelligence research his take on the different approaches to AI in Europe, the US, and China the matrix structure he uses to allocate people and resources to three key research areas: problems, solutions, and evaluation his belief that "AI is there to collaborate with people and 
not to replace us" Frank's bio Since 2000 Frank van Harmelen has played a leading role in the development of the Semantic Web. He is a co-designer of the Web Ontology Language OWL, which has become a worldwide standard. He co-authored the first academic textbook of the field, and was one of the architects of Sesame, an RDF storage and retrieval engine, which is in wide academic and industrial use. This work received the 10-year impact award at the International Semantic Web Conference. Linked Open Data and Knowledge Graphs are important spin-offs from this work. Since 2020, Frank has been scientific director of the Hybrid Intelligence Centre, where 50 PhD students and as many faculty members from 7 Dutch universities investigate AI systems that collaborate with people instead of replacing them. The large scale of modern knowledge graphs that contain hundreds of millions of entities and relationships (made possible partly by the work of Van Harmelen and his team) opened the door to combine these symbolic knowledge representations with machine learning. Since 2018, Frank has pivoted his research group from purely symbolic Knowledge Representation to Neuro-Symbolic forms of AI. Connect with Frank online Hybrid Intelligence Centre Video Here’s the video version of our conversation: https://youtu.be/ox20_l67R7I Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 33. As the AI landscape has evolved over the past few years, hybrid architectures that combine LLMs, knowledge graphs, and other AI technology have become the norm. Frank van Harmelen argues that the ultimate hybrid system must also include humans. He's running a 10-year, €20 million research program in the Netherlands to explore exactly this. His Hybrid Intelligence Centre investigates AI systems that collaborate with people instead of replacing them. Interview transcript Larry: Hi, everyone. Welcome to episode number 33 of the Knowledge Graph Insights podcast. 
I am really delighted today to welcome to the show Frank van Harmelen. Frank is a professor of AI at the Vrije Universiteit in Amsterdam, that's the Free University in Amsterdam. He's also the PI of this big program called the Hybrid Intelligence Center, which spans seven Dutch universities, multimillion euro grant over 10 years. Welcome, Frank. Tell the folks a little bit more about what you're up to these days? Frank: All right. This Hybrid Intelligence Center occupies me most of the time, and that's been a very exciting ride over the past five years. We're just at the midpoint and we have five more years to go. Larry: Nice. How is it going? Are you satisfied? Are the expectations of the grantors being met and are you happy with the progress you're making? Frank: Yes. It's obvious to say that the world of AI is super dynamic now. All kinds of things have happened in the past few years in AI that nobody had predicted when we started, the rise of large language models of conversational AI. That has also really affected the notion of hybrid intelligence. It's been an even more exciting ride than we had expected. Larry: Yeah. That's right. Yeah. I think excitement is the word of the day. Hey, one thing I have to observe, earlier today before we recorded this, I was doing a presentation with some information architects, and the subject I was talking about was hybrid AI architectures and neuro-symbolic loops and all this stuff. One of the people in the presentation asked, "Hey, what about human AI? Shouldn't that be the architecture?" Then I said, "You're going to love my next podcast guest," because that's the whole point of this hybrid intelligence idea, right? Frank: Yeah. The core idea of hybrid intelligence, hybrid standing for hybrid combination of human and machine intelligence. Think of hybrid teams, where a hybrid team is made up of a bunch of people and a bunch of AIs who collaborate to get a task done. 
That, if you want, the tagline of the Hybrid Intelligence Center is that we're working on AI that collaborates with people instead of replacing them. If you work on AI systems that collaborate with people, then you certainly need to solve all kinds of different problems and answer all kinds of different questions than where you are thinking about AI in the replacement mode. Larry: Yeah. That seems to be, like in a lot of circles, there's this assumption that AI is just here to replace people, but you've been... Long before that was a meme and people talking about it, you were working on this hybrid concept. Has that heightened the urgency around your work, the current state of AI expectations? Frank: It has heightened the urgency, and it has also opened all kinds of doors. One of the big hurdles in AI-human collaboration, say five years ago, was really the conversational interface. It was hard to talk to AI systems, and they certainly wouldn't talk back to you in a coherent way. Well, we all know that's now a solved problem. But what happens in the middle is the real challenge. We don't think that the large language models are going to solve all of the collaboration between humans and AI systems. We want our AI systems to do things that the language models are not very good at, but we're using that technique in a kind of sandwich model. Now, the language model does the conversation on the front end, it does the conversation on the back end, and we're working on the AI agents, the smart that's in the middle, to create these hybrid teams. Larry: As you say that I'm thinking about that's just one aspect of the hybridization of this. That that's one way that humans... When you think about hybrid architectures, where LLMs can help build knowledge graphs and they can also fill in knowledge gaps in LLM architectures. What other obvious complementary things are there between... What do humans need help with and what do machines need help with? Frank: Right. 
There are some obvious things like the perfect memory that machines have and the imperfect memory that we have. Okay? That's a nice example of where members in the team can really compensate for each other's strengths and weaknesses. Humans suffer from a whole host of these cognitive biases. For example, we suffer from the recency effect. We believe information more if we've heard it recently rather than when we've heard it in the past. We believe information more when we've heard it more frequently rather than... There's no reason to believe something more if you hear it more often. Frank: That doesn't make it more true, but it's how our brain works. Not always such a good idea. Computers can help us to compensate for all of these cognitive limitations. Conversely, we are very much aware of the context in which we operate. We are aware why we are doing something. We are aware of the implicit norms and values that govern the task that we're doing, that we're expected to obey in a particular group to perform a particular task. Computers don't have any sense of why they are doing something, the context in which they're doing it, the social and ethical norms under which they should operate. That's something where the human component can compensate for the machine limitations. These are just a few examples of that complementarity. Larry: Yeah. That's one of the things I think about a lot is that what we call in my world stakeholder alignment or stakeholder discovery or working with subject matter experts to make explicit their tacit knowledge in their head and things like that. It seems like that's probably always or mostly going to be a human capability. Is that... You probably have research that backs this up, right? Frank: Well, and if you want to collaborate with a computer, then you better make sure that there is some alignment between you and the computer. We can only collaborate because we share some of the way we look
-
28
Denny Vrandečić: Connecting the World’s Knowledge with Abstract Wikipedia – Episode 32
Denny Vrandečić As the founder of Wikidata, Denny Vrandečić has thought a lot about how to better connect the world's knowledge. His current project is Abstract Wikipedia, an initiative that aims to let anyone anywhere on the planet contribute to, and benefit from, the world's collective knowledge, in their native language. It's an ambitious goal, but - inspired by the success of other contributor-driven Wikimedia Foundation projects - Denny is confident that the community can make it happen. We talked about: his work as Head of Special Projects at the Wikimedia Foundation and his current projects: Wikifunctions and Abstract Wikipedia the origin story of his first project at Wikimedia - Wikidata a precursor project that informed Wikidata - Semantic MediaWiki the resounding success of the Wikidata project, the most edited wiki in the world, with half a million contributors how the need for more expressivity than Wikidata offers led to the idea for Abstract Wikipedia an overview of the Abstract Wikipedia project the abstract language-independent notation that underlies Abstract Wikipedia how Abstract Wikipedia will permit almost instant updating of Wikipedia pages with the facts it provides the capability of Abstract Wikipedia to permit both editing and use of knowledge in an author's native language their exploration of using LLMs to use natural language to create structured representations of knowledge how the design of Abstract Wikipedia encourages and facilitates contributions to the project the Wikifunctions project, a necessary precondition to Abstract Wikipedia the role of Wikidata as the Rosetta Stone of the web some background on the Wikifunctions project the community outreach work that Wikimedia Foundation does and the role of the community in the development of Abstract Wikipedia and Wikifunctions the technical foundations for his how to contribute to Wikimedia Foundation projects his goal to remove language barriers to allow all people to work together in a 
shared knowledge space a reminder that Tim Berners-Lee's original web browser included an editing function Denny's bio Denny Vrandečić is Head of Special Projects at the Wikimedia Foundation, leading the development of Wikifunctions and Abstract Wikipedia. He is the founder of Wikidata, co-creator of Semantic MediaWiki, and former elected member of the Wikimedia Foundation Board of Trustees. He worked for Google on the Google Knowledge Graph. He has a PhD in Semantic Web and Knowledge Representation from the Karlsruhe Institute of Technology. Connect with Denny online user Denny at Wikimedia Wikidata profile Mastodon LinkedIn email: denny at wikimedia dot org Resources mentioned in this interview Wikimedia Foundation Wikidata Semantic MediaWiki Wikidata: The Making Of Wikifunctions Abstract Wikipedia Meta-Wiki Video Here’s the video version of our conversation: https://youtu.be/iB6luu0w_Jk Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 32. The original plan for the World Wide Web was that it would be a two-way street, with opportunities to both discover and share knowledge. That promise was lost early on - and then restored a few years later when Wikipedia added an "edit" button to the internet. Denny Vrandečić is working to make that edit function even more powerful with Abstract Wikipedia, an innovative platform that lets web citizens both create and consume the world's knowledge, in their own language. Interview transcript Larry: Hi, everyone. Welcome to episode number 32 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Denny Vrandecic. Denny is best known as the founder of Wikidata, which we'll talk about more in just a minute. He's currently the Head of Special Projects at the Wikimedia Foundation. He's also a visiting professor at King's College London. So welcome, Denny. Tell the folks a little bit more about what you're up to these days. 
Denny: Thank you so much for having me, Larry. It's really a pleasure and honor. I enjoy listening to your podcast a lot, and I'm very happy to be here too. So these days I'm with the Wikimedia Foundation and, as you said, I'm called Head of Special Projects. There we are working on two new projects, Wikifunctions and Abstract Wikipedia, which are really very much tied together, and we'll get to those both in a moment, I think. Larry: Yeah, I'm really excited about both projects. I can't wait to get to them, but let's talk a little bit about Wikidata first because you started that in 2012, is that correct? Denny: That's right, yes. Larry: What was the impetus for that? What motivated you to start that project? Denny: Well, this goes actually back to 2005. Markus Krötzsch and I were PhD students in Karlsruhe and Wikimania was coming to Frankfurt, which is really close to Karlsruhe. It was the very first Wikimania at all. We were both Wikipedians and we wanted to go there and we thought, "What could we do?" And so we connected our research topic, which was the Semantic Web, with Wikipedia and made a proposal there. We didn't really think it would go anywhere. We were just like, "This would be really cool if this happened." Denny: But over the next few years, there was so much interest in it that people actually started implementing our ideas. We picked up on that. Semantic MediaWiki came out of it. And eventually when I was finishing my PhD, I was asked by Mark Reeves, who was working for Paul Allen's Vulcan back then, if I would like to make this happen for real. And so we approached the Wikimedia Foundation, we approached the Wikimedia movement, and there was great excitement about it. We got the funding aligned and then we started working on Wikidata. This was really a dream come true basically for us who'd been working on this idea of bringing structured data and Wikipedia together for more than seven years at that point. 
Larry: That's so interesting because my interest, I mean obviously it goes back a ways, but my history of this kind of picks up with Wikidata, so that prehistory of it, connecting Wikipedia to the Semantic Web, where obviously you're going to end up with something like Wikidata. And you were backed by the Vulcan Foundation or by Paul Allen's foundation? Denny: Yes. Larry: I did not know that. I lived in Seattle for a long time, so I walked by their building a lot. Well, that's really fascinating. So from 2005 to 2012, was it like simmering or were you doing things like precursors to the launch of Wikidata? Denny: We were doing precursor work. We were developing an extension called Semantic MediaWiki, which is still quite widely used. There are two conferences per year for Semantic MediaWiki users. NASA is using it, for example, on the ISS. Microsoft and many others were also using or are still using it, which is actually integrating structured data into a MediaWiki installation and allowing everyone to build small knowledge graphs, to query them, and so on. Denny: For Wikidata, we took a lot of those lessons. We knew that we needed a slightly different data model. We actually started a different software project where we didn't build it on Semantic MediaWiki but rather on something even more structured. Semantic MediaWiki is really good if you want to interleave the text together with the structured data, with the annotations. Whereas Wikidata really builds a pure knowledge graph, items connecting to each other and being given values and so on. But originally we were thinking, "Oh, we'll just switch on Semantic MediaWiki on the Wikipedias." I'm very glad we didn't do that. Denny: Actually, for the 10-year anniversary of Wikidata, which is also already three years ago, Markus and Lydia Pintscher, who's the product lead for Wikidata at Wikimedia Deutschland, and I wrote a paper about the history of Wikidata. 
We were actually going into detail about these topics and how Wikidata came around. Larry: Oh, I'll have to link to that paper in the show notes. I'd love to read it. Well, then that's interesting. And then so Wikidata, that was sort of the original... not original, but it was one of the first realizations of the promise of the Semantic Web, and it continues to be in the sense of the unique identifiers and entity resolution and things like that. I assume you consider it a success. It seems like it's such an important part of the knowledge part of the internet. Denny: If you ask me, yes, definitely, Wikidata is a resounding success, obviously. It's certainly a much bigger success than we expected. More than half a million people have contributed to Wikidata. If you had asked me in 2010/2011 what the number of people will be who will contribute to such a project, I would be off by more than 10X. I would never have assumed that half a million people would actually contribute to such a project. So I'm really happy. Wikidata is now the most edited wiki in the world by far, even beating English Wikipedia. It is also just very large, very comprehensive. I'm more than excited about how it has developed, and I'm very happy to see how really I was continuing to work after I left Wikidata. Larry: Nice. For all of its success though, you see more that could be done in this area, right? Is that where your current projects come from? Denny: Yes, absolutely. So Wikidata is a classical knowledge graph. Actually, we went beyond the classical data model already in Wikidata, right? So we are not just like triples, it's not just subject predicate object, we also introduced the ability to have qualifiers on each of those statements. We introduced the ability to have references for every statement and so on. So there was a number of things that we added. Denny: ...
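A minimal sketch, in Python, of the statement shape Denny describes here: a plain subject-predicate-object triple, extended with qualifiers and references. This is an illustration only, not Wikidata's actual implementation or API. The class names (`PropertyValue`, `Statement`) and all the example values are hypothetical and invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertyValue:
    """A property-value pair used as a qualifier or a reference (hypothetical name)."""
    prop: str
    value: str


@dataclass
class Statement:
    """A subject-predicate-object claim plus optional qualifiers and references."""
    subject: str
    predicate: str
    obj: str
    qualifiers: list[PropertyValue] = field(default_factory=list)
    references: list[PropertyValue] = field(default_factory=list)

    def as_triple(self) -> tuple[str, str, str]:
        """Drop qualifiers and references, leaving only the bare triple."""
        return (self.subject, self.predicate, self.obj)


# Invented example values, for illustration only.
statement = Statement(
    subject="Example City",
    predicate="population",
    obj="100000",
    qualifiers=[PropertyValue("point in time", "2020")],
    references=[PropertyValue("stated in", "Example Census")],
)

print(statement.as_triple())
```

The design point is that the qualifier ("as of 2020") and the reference ("according to this source") attach to the statement itself, which a bare triple has no place for.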
-
27
Charles Ivie: The Rousing Success of the Semantic Web “Failure” – Episode 31
Charles Ivie Since the semantic web was introduced almost 25 years ago, many have dismissed it as a failure. Charles Ivie shows that the RDF standard and the knowledge-representation technology built on it have actually been quite successful. More than half of the world's web pages now share semantic annotations and the widespread adoption of knowledge graphs in enterprises and media companies is only growing as enterprise AI architectures mature. We talked about: his long work history in the knowledge graph world his observation that the semantic web is "the most catastrophically successful thing which people have called a failure" some of the measures of the success of the semantic web: ubiquitous RDF annotations in web pages, numerous knowledge graph deployments in big enterprises and media companies, etc. the long history of knowledge representation the role of RDF as a Rosetta Stone between human knowledge and computing capabilities how the abstraction that RDF permits helps connect different views of knowledge within a domain the need to scope any ontology in a specific domain the role of upper ontologies his transition from computer science and software engineering to semantic web technologies the fundamental role of knowledge representation tech - to help humans communicate information, to innovate, and to solve problems how semantic modeling's focus on humans working things out leads to better solutions than tech-driven approaches his desire to start a conversation around the fundamental upper principles of ontology design and semantic modeling, and his hypothesis that it might look something like a network of taxonomies Charles' bio Charles Ivie is a Senior Graph Architect with the Amazon Neptune team at Amazon Web Services (AWS). With over 15 years of experience in the knowledge graph community, he has been instrumental in designing, leading, and implementing graph solutions across various industries. 
Connect with Charles online LinkedIn Video Here’s the video version of our conversation: https://youtu.be/1ANaFs-4hE4 Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 31. Since the concept of the semantic web was introduced almost 25 years ago, many have dismissed it as a failure. Charles Ivie points out that it's actually been a rousing success. From the ubiquitous presence of RDF annotations in web pages to the mass adoption of knowledge graphs in enterprises and media companies, the semantic web has been here all along and only continues to grow as more companies discover the benefits of knowledge-representation technology. Interview transcript Larry: Hi everyone. Welcome to episode number 31 of the Knowledge Graph Insights Podcast. I am really happy today to welcome to the show Charles Ivie. Charles is currently a senior graph architect at Amazon's Neptune department. He's been in the graph community for years, worked at the BBC, ran his own consultancies, worked at places like The Telegraph and The Financial Times and places you've heard of. So welcome Charles. Tell the folks a little bit more about what you're up to these days. Charles: Sure. Thanks. Thanks, Larry. Very grateful to be invited on, so thank you for that. And what have I been up to? Yeah, I've been about in the graph industry for about 14 years or something like that now. And these days I am working with the Amazon Neptune team doing everything I can to help people become more successful with their graph implementations and with their projects. And I like to talk at conferences and join things like this and write as much as I can. And occasionally they let me loose on some code too. So that's kind of what I'm up to these days. Larry: Nice. Because you have a background as a software engineer and we will talk more about that later because I think that's really relevant to a lot of what we'll talk about. 
But the reason I wanted to have you on, I caught a video, we met somewhere recently, but anyhow, I was watching a video with the OriginTrail folks, and you made this great quote in there. Somebody asked about the Semantic Web and just kind of offhandedly dismissed it like people always do when they talk about it. And you said, "Yeah, but that's the most catastrophically successful thing which people have called a failure." Can you elaborate on that a little bit? Charles: Yes. I think what it really boils down to with that is what ... Well, how do you classify success or failure? These are actually very incredibly abstract terms. And what I was referring to originally was a bit of data, a statistic that over 50% of web pages that exist contain RDF. And what does that mean? That means that there are statements that are written in RDF syntax in over half web pages. And that didn't sound particularly unsuccessful to me, and I'd struggled with these kind of statements in my mind for a while, such as why is RDF a failure? You hear this kind of thing go around. And I thought it didn't really feel like it to me, it never really felt like it was particularly a failure. I mean, I built two businesses from it and they weren't failures and we represented an awful lot of knowledge in that time and it felt like that knowledge was represented. So yeah, how do you quantify successful failure? Who's taking the measurements and who's making that argument has a lot to do with that. Larry: And the other thing that I shared that ... I was doing a presentation shortly after I saw that talk, and I grabbed that stat because I think a year ago or something, Tony Seale pointed out to something like 40 some percent of websites had RDF annotations at that point. And I was like, wow. So it's still growing too. So there's that sort of growth of it. 
But immediately I followed that with a slide, which I noticed when I went back and watched another presentation of yours that you had done a similar slide where you just kind of list the enterprises that have done stuff with RDF, like RDF graphs, like your slide listed Facebook and Amazon and Uber and Siemens and Google. And I had included in mine, like LinkedIn's economic graph and Netflix's graph and JPMorganChase and Credit Suisse and Bloomberg and all these others. So it feels like a pretty successful effort in that regard too. Charles: Yeah, exactly. In many ways it's those with the deepest pockets that have benefited the most from the technologies, which is a statement which you hear from time to time in the modern world, of course. That's not that surprising. It's those who can dedicate their time on maybe understanding a larger data landscape, seeing the value of it and joining it all together and representing it properly. Is it really any surprise that they're the ones that have got the most value out of it? No, it's not, basically. Yeah, I mean- Larry: Well, they have both the problem and the means to implement a solution. That's kind of the idea. Charles: Yeah. Right. Exactly. And if they find that they don't like whatever tooling they find that can support what they're doing, they have the capability to build things themselves or join things together and buy things that maybe others couldn't afford and this sort of thing. So especially in the early days when things are very much cutting edge and new technology sets, you don't necessarily have big swaths of open source technology stacks available to you and stuff like that. These things are growing all the time, of course. But if we are speaking about it from a success is count metric of implementation nodes sense, then those things will have a big bearing on that. But I'm not even necessarily sure that that's the right metric to quantify success for. 
Larry: Because we were talking before we went on the air too, about what is the right metric because you said that, okay, which portion of the data in the world is represented as RDF in a graph someplace. That's pretty low. It's still mostly in SQL databases, but if the criterion was like, how do we understand all that data, you can make a case for RDF. Is that right? Charles: Yeah, I think that's a very good point. I don't even think we necessarily talked about it in exactly the way that we're about to talk about it, I think. But you are right. I mean, okay, so if what we are going to say is that the success of this is measured on how much data do we understand, that could be understood with no previous understanding apart from how to follow ontological models, then RDF is a massive success because this is the only one that really works. So on that very metric, all the others have a kind of success rate of zero to a certain extent because there was never really any overriding standardized ontological modeling concept which filtered down into it. Charles: Now, you could argue that maybe for example, entity relationship diagrams tried to replicate that sort of thing. So creating conceptual things which are related to other conceptual things, but they were always really created in a form that was supposed to be implemented into maybe a relational database or something like that. They tended to be bound to technology to a certain extent. So if that was an interesting metric, then maybe RDF is vastly successful because it could represent stuff in a way that's really well understood immediately. Yeah. Larry: The way you just said that too reminds me that the notion of a technology being bound to the success. You could even argue that RDF just happens to be a thing that works now, but that's not necessarily the only way you could represent knowledge. Have you given much thought to other ways or why RDF is successful in that regard? Charles: Yeah, sure. 
We've been recording knowledge for a very, very long time. And I mean that as in we as in people. First of all, people were recording knowledge by basically telling stories to one another and saying things,...
-
26
Andrea Gioia: Human-Centered Modeling for Data Products – Episode 30
Andrea Gioia In recent years, data products have emerged as a solution to the enterprise problem of siloed data and knowledge. Andrea Gioia helps his clients build composable, reusable data products so they can capitalize on the value in their data assets. Built around collaboratively developed ontologies, these data products evolve into something that might also be called a knowledge product. We talked about: his work as CTO at Quantyca, a data and metadata management consultancy his description of data products and their lifecycle how the lack of reusability in most data products inspired his current approach to modular, composable data products - and brought him into the world of ontology how focusing on specific data assets facilitates the creation of reusable data products his take on the role of data as a valuable enterprise asset how he accounts for technical metadata and conceptual metadata in his modeling work his preference for a federated model in the development of enterprise ontologies the evolution of his data architecture thinking from a central-governance model to a federated model the importance of including the right variety of business stakeholders in the design of the ontology for a knowledge product his observation that semantic modeling is mostly about people, and working with them to come to agreements about how they each see their domain Andrea's bio Andrea Gioia is a Partner and CTO at Quantyca, a consulting company specializing in data management. He is also a co-founder of blindata.io, a SaaS platform focused on data governance and compliance. With over two decades of experience in the field, Andrea has led cross-functional teams in the successful execution of complex data projects across diverse market sectors, ranging from banking and utilities to retail and industry. 
In his current role as CTO at Quantyca, Andrea primarily focuses on advisory, helping clients define and execute their data strategy with a strong emphasis on organizational and change management issues. Actively involved in the data community, Andrea is a regular speaker, writer, and author of 'Managing Data as a Product'. Currently, he is the main organizer of the Data Engineering Italian Meetup and leads the Open Data Mesh Initiative. Within this initiative, Andrea has published the data product descriptor open specification and is guiding the development of the open-source ODM Platform to support the automation of the data product lifecycle. Andrea is an active member of DAMA and, since 2023, has been part of the scientific committee of the DAMA Italian Chapter. Connect with Andrea online LinkedIn (#TheDataJoy) Github Video Here’s the video version of our conversation: https://www.youtube.com/watch?v=g34K_kJGZMc Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 30. In the world of enterprise architectures, data products are emerging as a solution to the problem of siloed data and knowledge. As a data and metadata management consultant, Andrea Gioia helps his clients realize the value in their data assets by assembling them into composable, reusable data products. Built around collaboratively developed ontologies, these data products evolve into something that might also be called a knowledge product. Interview transcript Larry: Hi, everyone. Welcome to episode number 30 of the Knowledge Graph Insights podcast. I'm really happy today to welcome to the show Andrea Gioia. Andrea's, he does a lot of stuff. He's a busy guy. He's a partner and the chief technical officer at Quantyca, a consulting firm that works on data and metadata management. He's the founder of Blindata, a SaaS product that goes with his consultancy. I let him talk a little bit more about that. 
He's the author of the book Managing Data as a Product, and he's also, he comes out of the data heritage but he's now one of these knowledge people like us. So welcome, Andrea. Tell the folks a little bit more about what you're up to these days. Andrea: Thank you. Thank you very much, Larry, for having me. It's a pleasure. Yes, as CTO at Quantyca, I'm in charge of all our advisory services. So I'm helping customers figure out how to manage their data properly, especially to leverage the potential of artificial intelligence. So basically I see all sorts of problems in data management. Each client is different, but each client has a lot of problems with data that is very fragmented or too complex to manage. And so it's a very complex problem to feed this data to the AI model and extract the potential that modern AI and all the breakthroughs that we are seeing these days have made available. So I'm really focused at this moment on helping customers find a way to manage their knowledge, the knowledge that is characteristic of the company, that is a differentiator of the company, the knowledge that is not known to the large language model, that makes the company different and can be leveraged to implement domain-specific, company-specific use cases based on AI and leveraging the data collected. Larry: Yeah. As you mentioned that, we were just chatting a bit before we went on about the scope of the conversation. And I totally forgot to mention AI, which of course is like the main driver for half of this stuff we're doing nowadays. But a couple of things you mentioned there. I want to go back to, one, you mentioned what a complex problem space this is and the challenges of data management and every organization has its own issues there. One of the ways that folks like you have helped people cope with this is the notion of a data product. And I know that's a newish concept and maybe new to some of the listeners to this. 
Can you talk a little bit about your conception of what a data product is and how you put one together? Andrea: Yeah, absolutely. The concept is new but the rationale behind it, it's not new. Humans, when a problem is too much complex, the only way that humans have found to solve a very complex problem is to split in a different part, in smaller part, and try to take all the complexity within each single part. And the idea of that a product come from this strategy, done the lead, the team part. So the idea is not managing the data in a unique central platform in which all the data of the company is collected but split in a modular architecture. So the platform is still there. You have the data layer, the data warehouse, whatever. It's the architecture that you prefer, but it's not anymore a monolithic solution in which you store all the data that you have in your company, but it's built as a composition of independent modules. Each module focuses on one or more, but usually one specific data asset, and there is a team that is in charge of manage the life cycle of that data product that manages specific data asset. Andrea: Of course the composition of all the data asset create the platform and the platform can be used to support the different use cases, but basically you can work on each single module without caring too much about the other module because each module is isolated with a specific interface. So if you do not modify the interface, you can modify the technology and implementation inside. And if you want to understand how the different modules connect, you can ignore the implementation and just concentrate on the relation between the different interfaces. Andrea: So to make it very, very simple, we can think at the data product like a sort of microservice that is a software application, is actually a software application, that does not expose functionality, transactional functionality, to acquire data and drive the transaction but expose the data. 
It's a software application that expose the data in order to make the data it manages as much usable as possible for its customer base, for its users. So this is a data product. And of course because it is a product, it is managed with a product mindset. So it's not a project. It's not something that the team develop and then forget about it. But there is a dedicated team that implement the first version and then evolve the software application that support that specific data asset through all its lifecycle till the retirement when the data asset is not anymore relevant for the company. That's pretty much what is a data product for me. Andrea: So basically I call this kind of data product a pure data product to even more underline the fact that it's a software application that expose data because I also have a lot of time the question, a report, a dashboard is a data product and they say, yes, it's a data product if it is managed as a product with a product mindset. But my book, my research is more focused on the pure data product, so that specific kind of data products that do not expose visualization or insight or action but expose just pure data to make it reusable and composable over time to support multiple use cases. Larry: That's right. We didn't talk about this before we went on the air, but the episode right before this one is with Dave McComb, and I know I've heard you talk before about you appreciate his approach and his data-centricity. And everything you just said, I'm like, "Oh yeah, he's read Dave's books." Was that the major influence, or what are the influences? Andrea: Absolutely. It was for me an epiphany because at that time when I read McComb's books, I was looking for... I had a problem because we had started since couple of years to help our customer and created this kind of modular architecture. So that architecture that is built as a composition of different data product, even managed with a distributed operating model. 
So all the data product are managed by different business domain in an autonomous way. And we have seen that this improve a lot the quality of the data and reduce the maintenance cost because of cour
-
25
Dave McComb: Semantic Modeling for the Data-Centric Enterprise – Episode 29
Dave McComb During the course of his 25-year consulting career, Dave McComb has discovered both a foundational problem in enterprise architectures and the solution to it. The problem lies in application-focused software engineering that results in an inefficient explosion of redundant solutions that draw on overlapping data sources. The solution that Dave has introduced is a data-centric architecture approach that treats data like the precious business asset that it is. We talked about: his work as the CEO of Semantic Arts, a prominent semantic technology and knowledge graph consultancy based in the US the application-centric quagmire that most modern enterprises find themselves trapped in data centricity, the antidote to application centricity his early work in semantic modeling how the discovery of the "core model" in an enterprise facilitates modeling and building data-centric enterprise systems the importance of "baby step" approaches and working with actual customer data in enterprise data projects how building to "enduring business themes" rather than to the needs of individual applications creates a more solid foundation for enterprise architectures his current interest in developing a semantic model for the accounting field, drawing on his history in the field and on Semantic Arts' gist upper ontology the importance of the concept of a "commitment" in an accounting model how his approach to financial modeling permits near-real-time reporting his Data-Centric Architecture Forum, a practitioner-focused event held each June in Ft. Collins, Colorado Dave's bio Dave McComb is the CEO of Semantic Arts. In 2000 he co-founded Semantic Arts with the aim of bringing semantic technology to Enterprises. From 2000- 2010 Semantic Arts focused on ways to improve enterprise architecture through ontology modeling and design. Around 2010 Semantic Arts began helping clients more directly with implementation, which led to the use of Knowledge Graphs in Enterprises. 
Semantic Arts has conducted over 100 successful projects with a number of well-known firms including Morgan Stanley, Electronic Arts, Amgen, Standard & Poors, Schneider-Electric, MD Anderson, the International Monetary Fund, Procter & Gamble, Goldman Sachs as well as a number of government agencies. Dave is the author of Semantics in Business Systems (2003), which made the case for using Semantics to improve the design of information systems, Software Wasteland (2018), which points out how application-centric thinking has led to the deplorable state of enterprise systems, and The Data-Centric Revolution (2019), which outlines an alternative to the application-centric quagmire. Prior to founding Semantic Arts he was VP of Engineering for Velocity Healthcare, a dot com startup that pioneered the model driven approach to software development. He was granted three patents on the architecture developed at Velocity. Prior to that he was with a small consulting firm: First Principles Consulting. Prior to that he was part of the problem. Connect with Dave online LinkedIn email: mccomb at semanticarts dot com Semantic Arts Resources mentioned in this interview Dave's books: The Data-Centric Revolution: Restoring Sanity to Enterprise Information Systems Software Wasteland: How the Application-Centric Quagmire is Hobbling Our Enterprises Semantics in Business Systems: The Savvy Manager's Guide gist ontology Data-Centric Architecture Forum Video Here’s the video version of our conversation: https://youtu.be/X_hZG7cFOCE Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 29. Every modern enterprise wrestles with its data, trying to get the most out of it. The smartest businesses have figured out that it isn't just "the new oil" - data is the very bedrock of their enterprise architecture. 
For the past 25 years, Dave McComb has helped companies understand their data, discovering along the way the importance of adopting a data-centric mindset that reveals the essential nature and the true value of this precious asset. Interview transcript Larry: Hi, everyone. Welcome to episode number 29 of the Knowledge Graph Insights Podcast. I am really happy today to welcome to the show Dave McComb. Dave, I think it's safe to say he's a legend in the ontology and knowledge graph worlds. He's the author of three books. One early book called Semantics in Business Systems: The Savvy Manager's Guide, which was probably ahead of its time, which is fine. Dave's that kind of guy. He also wrote the books Software Wasteland and The Data-Centric Revolution, which set the problem that we have in current enterprise architectures and then proposed a solution. Those, by the way, are both under revision. By the end of 2025 or so, we should see new editions of those. Welcome, Dave. Tell the folks a little bit more about what you're up to these days. Dave: Great. Thanks, Larry. Well, we're still running a company here. We have Semantic Arts, probably it's about 30 employees. 20 ontologists and five semantic developers doing God's work, making companies more data-centric. That's what we do now. We go into companies, mostly medium to large-sized companies, and help them. Dave: What we've done since the publishing of the book, we started doing it around the publishing of the book, is just figuring out a methodological, and fairly safe and incremental way to get there. Because I think a lot of companies are burned out from so-called legacy modernization projects and digital transformation projects that didn't go well. There's a lot of scar tissue there. We've figured out a way to, first, move some of your data, get it into the graph, get you used to it. Then start moving more, and then more functionality, and just gradually get people there. 
Larry: That's the classic smart consultant, baby steps, proofs of concept. Dave: Yeah. Larry: Small work out there. Dave: Yeah. Larry: Hey, let's back up a little bit and talk about, because I'm going to guess that many if not most of my listeners are familiar with you. But for those who aren't, can you talk a little bit about the philosophy? Because you've got a well-articulated philosophy set out in two books about the problem, this application-centric quagmire that enterprises got themselves into, and then the data-centric way. Can you talk a little bit about the ... I'd love to know where the idea occurred to you, how you identified the problem, and then a little bit about the two books. Dave: Yeah. I started my career with Arthur Andersen, the accounting firm, but they had a consulting division which originally was just called the Administrative Services Division. How innocuous. We worked with the accountants a lot. Then it, as we know, eventually grew into the consulting division, which was Arthur Andersen Consulting, which is now Accenture. They grew like crazy. But back in those early days, we built and implemented mostly accounting systems. I had a career of going around the world, implementing, often building from scratch because it was early days, accounting systems. Including two pretty major full-function ERP systems built from the ground up, and one of them in multi-currency. Pretty sophisticated. Dave: It's two things I thought I knew at the time. One, I thought I knew accounting and accounting systems. And I thought I knew the right path for building enterprise applications. But then, right as I was leaving and then as I was doing some independent work on the side, I started to see what was actually going on. That companies were just implementing system, after system, after system. You'd go into a client and they'd have a dozen inventory control systems. You'd go, "Wow, not only do you have a dozen of them." 
By the way, I'm going to update that and I'll give you some metrics about how many systems most of our clients currently have. Dave: What bothered me more was they're all completely arbitrarily different. Not only did every single one of them, which had hundreds or thousands of tables, and each table had dozens of columns, and every one of them had some totally made up name. Some of them, German acronyms, all kinds of stuff. They were even structured differently. You'd go, "Wow. What would cause several different smart people to design an inventory control system and have them come out that different?" We studied database design, third normal form, and all that. If you'd laid out the problem exactly the same, you'd do third normal form, and you'd get to the same answer, but they were not starting from the same place. Then you go, "Wow. Why not? What's going on here?" This is the early '90s. Dave: This is back before the World Wide Web, you would have to go to the library to do research. And so we'd go to library, and find magazine articles, and photocopy them, and all those. I know I still have a three-ring binder. There were four articles at that time about applying semantics to information systems. We had devoured, I think, everything that was known at the time. Now, of course, if we did a Google search, there probably was other stuff that we didn't find. We invented this thing we called semantic modeling and said maybe, if you started from what things really mean, you'd actually figure out that inventory actually really means widgets and bins, whatever it is. But you'd hopefully start from the same place and end in the same place. Dave: Yeah, that was my observation and how we got into this. A few minutes ago, I'd promised I'd come up with some metric. 
As we've been going from client to client now, and this is not an exact metric but it's close enough to be scary, take the number of employees you have in your company and divide it by 10, that's probably about how many applications you're currently managing. Larry: Wow. Dave: Think about it....
-
24
Ole Olesen-Bagneux: Understanding Enterprise Metadata with the Meta Grid – Episode 28
Ole Olesen-Bagneux In every enterprise, says Ole Olesen-Bagneux, the information you need to understand your organization's metadata is already there. It just needs to be discovered and documented. Ole's Meta Grid can be as simple as a shared, curated collection of documents, diagrams, and data but might also be expressed as a knowledge graph. Ole appreciates "North Star" architectures like microservices and the Data Mesh but presents the Meta Grid as a simpler way to manage enterprise metadata. We talked about: his work as Chief Evangelist at Actian his forthcoming book, "Fundamentals of Metadata Management" how he defines his Meta Grid: an integration architecture that connects metadata across metadata repositories his definition of metadata and its key characteristic, that it's always in two places at once how the Meta Grid compares with microservices architectures and organizing concepts like Data Mesh the nature of the Meta Grid as a small, simple, and slow architecture which is not technically difficult to achieve his assertion that you can't build a Meta Grid because it already exists in every organization the elements of the Meta Grid: documents, diagrams or pictures, and examples of data how knowledge graphs fit into the Meta Grid his appreciation for "North Star" architectures like Data Mesh but also how he sees the Meta Grid as a more pragmatic approach to enterprise metadata management the evolution of his new book from a knowledge graph book to his elaboration on the "slow" nature of the Meta Grid, in particular how its metadata focus contrasts with faster real-time systems like ERPs the shape of the team topology that makes Meta Grid work Ole's bio Ole Olesen-Bagneux is a globally recognized thought leader in metadata management and enterprise data architecture. 
As VP, Chief Evangelist at Actian, he drives industry awareness and adoption of modern approaches to data intelligence, drawing on his extensive expertise in data management, metadata, data catalogs, and decentralized architectures. An accomplished author, Ole has written The Enterprise Data Catalog (O’Reilly, 2023). He is currently working on Fundamentals of Metadata Management (O’Reilly, 2025), introducing a novel metadata architecture known as the Meta Grid. With a PhD in Library and Information Science from the University of Copenhagen, his unique perspective bridges traditional information science with modern data management. Before joining Actian, Ole served as Chief Evangelist at Zeenea, where he played a key role in shaping and communicating the company’s technology vision. His industry experience includes leadership roles in enterprise architecture and data strategy at major pharmaceutical companies like Novo Nordisk. Ole is passionate about scalable metadata architectures, knowledge graphs, and enabling organizations to make data truly discoverable and usable. Connect with Ole online LinkedIn Substack Medium Resources mentioned in this interview Fundamentals of Metadata Management, Ole's forthcoming book Data Management at Scale by Piethein Strengholt Fundamentals of Data Engineering by Joe Reis and Matt Housley Meta Grid as a Team Topology, Substack article Stewart Brand's Pace Layers Video Here’s the video version of our conversation: https://youtu.be/t01IZoegKRI Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 28. Every modern enterprise wrestles with the scale, the complexity, and the urgency of understanding their data and metadata. So, by necessity, comprehensive architectural approaches like microservices and the data mesh are complex, big, and fast. 
Ole Olesen-Bagneux proposes a simple, small, and slow way for enterprises to cultivate a shared understanding of their enterprise knowledge, a decentralized approach to metadata strategy that he calls the Meta Grid. Interview transcript Larry: Hi, everyone. Welcome to episode number 28 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Ole Olesen-Bagneux. Ole is the... He's currently the chief evangelist at Actian. Welcome, Ole. Tell the folks a little bit more about your role at Actian and what you do there. Ole: Thank you, Larry. Thank you for having me on. This is a great topic that we will dive in today. First of all, I recently joined Actian, so it's been my first week since we've recorded this. I am the chief evangelist in Actian, so I will be telling the story of Actian as this technology evolves. Actian is a data platform, it's based in the US and it is part of the HCL software family. Actian is a data platform, and as this data platform changes and evolves over time, I'll be telling that story. That's really what I'll be doing. Larry: What a fun job, evangelizing a technology platform. Yeah. I love that kind of work. You're also working on a book called Fundamentals of Metadata Management, which is all about the Meta Grid concept that you came up with, and that's what I really want to talk about today. How's the book going? It's due for publication this year, correct? Ole: That is correct. I finalized the first draft version of the manuscript a couple of weeks ago. I am very confident that I will publish it on time. Maybe a little earlier than the current publication date, that I don't know, but it's coming along nicely. It's been a difficult book to write. It's the sum of my experience as a leader and an enterprise architect working closely with all things data for the last 10, 15 years. 10 years. Larry: Nice. Sort of an encapsulation. Well, it's an intriguing idea, and I love the... 
We were talking before we went on the air about how much I love a good framework, and is that the right way to think about it? I guess, how would you define the Meta Grid? Ole: The Meta Grid, as I see it, is first and foremost an architecture, an integration architecture between distinct types of technologies that do not perform the value chain of a company such as ERP systems and CRM systems and what have you. Neither is it focused at the analytical plane, so to say, the data mesh universe where you're trying to discuss how centralized or decentralized should your analytical data technologies be at data warehouses, data lakes, and the like. The Meta Grid really is about these small, small technologies that depict the IT landscape, the information security management system, the data catalog, the endpoint management system, the knowledge management system, all these systems that for distinct purposes depict the IT landscape. A short definition of the Meta Grid is that it is an integration architecture that connects metadata across metadata repositories. Larry: Nice. That connects across repositories, and that's one of the key things about the Meta Grid is that your conception of metadata differs... Well, you come out of a library science background, isn't that right? Ole: That is right. Larry: I've learned most of what I know about metadata from my librarian friends, but I've also... I think I've fallen into the not trap or a different way of thinking about it than you do this notion of just thinking about the kinds of metadata, descriptive metadata, administrative, tech, all those kinds. You have a different take on metadata. I'd love to hear how you think of it? Ole: You bet. Yes. Well, first of all, let me emphasize that I don't think the way you think of metadata is wrong. That's quite important for me. 
But what I see lacking in that thinking, for example, in the Data Management Body of Knowledge, the DAMA-DMBOK or other resources, vital resources in this field is that if you do that subcategorization of metadata into operational, technical, business, social, what have you, then you discuss subcategories instead of the actual nature of metadata. What I've come to find in my work life is that this perspective gives you some blind spots to something that is very, very important in metadata. How I define metadata is not in terms of its subcategories. I don't think you can do that actually for metadata. I try to find the very nature of metadata in my definition of it, and that is that it is not anything in particular, but that it is always in two places at once. Ole: That's the key to metadata. It'll be in a repository and it will be at source, just like a book has its information, its metadata in the book itself and on Amazon. Just like the data in a data catalog is listed in the data catalog and in the data sources of that particularly data source that you have scanned. Metadata is in two places at once, and that definition unfolds the idea of the Meta Grid because once you have established this definition, you can also look at this definition as a problem because what it creates, and you can see that at every single company in the world, is that you have a lot of different metadata repositories listing the same type of metadata. That's the problem that the Meta Grid addresses, that all these small activities in all these small teams are taking place in isolation. Larry: Is that the classic silo issue in enterprise architectures? Ole: It's definitely the classic silo issue in enterprise architecture just for a very distinct purpose. I feel very, very connected to everything, team topologies, everything data mesh, microservices. 
These ideas of thinking of how to break up monolithic thinking and monolithic technology practices is something that is very, very close to... Well, to my heart and my mind. Larry: That's close to mine as well. I help organize a conference called Decoupled Days, it's all about microservices and decoupled architectures, but I love that you... But one of your... I read a bunch of stuff preparing for this, but you did this one blog post where you had a table that compares the microservices architectures, the data mesh,...
-
23
Andrea Volpini: The Role of Memory in Digital Branding for AI – Episode 27
Andrea Volpini Your organization's brand is what people say about you after you've left the room. It's the memories you create that determine how people think about you later. Andrea Volpini says that the same dynamic applies in marketing to AI systems. Modern brand managers, he argues, need to understand how both human and machine memory work and then use that knowledge to create digital memories that align with how AI systems understand the world. We talked about: his work as CEO at WordLift, a company that builds knowledge graphs to help companies automate SEO and other marketing activities a recent experiment he did during a talk at an AI conference that illustrates the ability of applications like Grok and ChatGPT to build and share information in real time the role of memory in marketing to current AI architectures his discovery of how the agentic approach he was taking to automating marketing tasks was actually creating valuable context for AI systems the mechanisms of memory in AI systems and an analogy to human short- and long-term memory the similarities he sees in how the human neocortex forms memories and how the knowledge about memory is represented in AI systems his practice of representing entities as both triples and vectors in his knowledge graph how he leverages his understanding of the differences in AI models in his work the different types of memory frameworks to account for in both the consumption and creation of AI systems: semantic, episodic, and procedural his new way of thinking about marketing: as a memory-creation process the shift in focus that he thinks marketers need to make, "creating good memories for AI in order to protect their brand values" Andrea's bio Andrea Volpini is the CEO of WordLift and co-founder of Insideout10. With 25 years of experience in semantic web technologies, SEO, and artificial intelligence, he specializes in marketing strategies. 
He is a regular speaker at international conferences, including SXSW, TNW Conference, BrightonSEO, The Knowledge Graph Conference, G50, Connected Data and AI Festival. Andrea has contributed to industry publications, including the Web Almanac by HTTP Archive. In 2013, he co-founded RedLink GmbH, a commercial spin-off focused on semantic content enrichment, natural language processing, and information extraction. Connect with Andrea online LinkedIn X Bluesky WordLift Video Here’s the video version of our conversation: https://youtu.be/do-Y7w47CZc Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 27. Some experts describe the marketing concept of branding as, What people say about you after you’ve left the room. It's the memories they form of your company that define your brand. Andrea Volpini sees this same dynamic unfolding as companies turn their attention to AI. To build a memorable brand online, modern marketers need to understand how both human and machine memory work and then focus on creating memories that align with how AI systems understand the world. Interview transcript Larry: Hi, everyone. Welcome to episode number 27 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Andrea Volpini. Andrea is the CEO and the founder at WordLift, a company based in Rome. Tell the folks a little bit more about WordLift and what you're up to these days, Andrea. Andrea: Yep. So we build knowledge graphs and to help brands automate their SEO and marketing efforts using large language model and AI in general. Larry: Nice. Yeah, and you're pretty good at this. You've been doing this a while and you had a recent success story, I think that shows, that really highlights some of your current interests in your current work. Tell me about your talk in Milan and the little demonstration you did with that. 
Andrea: Yeah, yeah, so it was last week at AI Festival, which is a very large event with I would say hundreds of speakers. And my talk was about memory as a new framework for marketing in the age of AI assistant. And so I did a small test with the audience and I imagine we had a crowd of maybe, I don't know, 40, 60 people attending the talk and a few others online. And I had these slides where I challenged the audience to program the memory of Grok. Grok is X AI system. And I wanted to do this with Grok and ChatGPT by asking the audience to share feedback about my talks. The talks was ready towards the end. And so I asked, "Okay, just share openly on X and Facebook about how was this talk?" And then we set up a small poll on X to let people simply vote if it was good or bad or relevant or boring. Andrea: And so we created engagement over social and of course, particularly because I'm still one of the few left on X, we interacted on X. And then all of a sudden, maybe after just a few minutes, one of my colleague went on Grok and asked, "What are the best talks at AI festival 2025?" And you can imagine there are hundreds of speakers, but Grok responded, "One highlight is a CyberAndy presentation that talked about using memory with AI system, and one of the attendees described it as mesmerizing, suggesting that he explored neuroscience," and blah, blah, blah. So I was able to get there and to build memory collectively by having user share feedback on social network. And by the way, the same applied to ChatGPT. So asking the same to ChatGPT would also highlighted my talk versus many others, better talks on that day. Larry: That's really one of the common observations and criticisms of LLMs has been their inability to access real-time information. That you build the model and there it is. So there's obviously something going on under there. You're one of the first people I've talked to who talks a lot about memory in these architectures. 
I guess, maybe if you could, I mean there's so much going on in the last couple of years with this, but what have been the evolutions in the AI and LLM sphere that kind of have led to the emerging importance of memory in these architectures? Andrea: So I mean, I think all of us are realizing with daily use that we're not interacting with language models anymore, but we are interacting with more complex systems that take into account multiple pieces in order to provide an accurate response. And every system, whether we're dealing with Perplexity, ChatGPT, or Gemini, or Grok has its own different way of combining information in order to respond to us. Andrea: And so I started, because my work in marketing, I started to think how we should approach a customer that is becoming an AI. And then that was my trigger was like, okay, what if the next customer is not a human? What happens? And the first consideration to be made is that in the context of SEO, for example, we transition with after a few years from the idea of keywords and focusing on what are the keywords that I should rank for to focusing on the search intent of the user that makes a request to a search engine. But then all of this is gone, if I have to deal with ChatGPT, Deep Search, all of these disappear if I have to deal with something like Operator or Gemini Deep Research functionality because in the end there's not going to be a human that it's making the request, but it's going to be an agent. And so I started to think, okay, what is marketing then if keywords are gone and also search intents is gone, what is left? What influenced the systems? And then I got to the revelation of memory. Larry: Okay. That's really interesting. The way, that evolution you just described too. The one thing that occurred to me as you were talking about that is that ostensibly Google has always favored that if you're doing things that appeal to human beings, you'll rank better in the search engines. 
But it sounds like from what you're saying, and so that kind of guided SEO for the last, I don't know, 15, 20 years, but now you're saying we're in this, we've kind of switched to where, and so I think a lot of SEOs, the perception was they were just playing to Google to trying to game Google's algorithms. And it's not like gaming, but it's understanding your audience. It's like any old communication problem, understanding your audience. Larry: So what are you seeing as the difference as you make that leap from search intent to memory needs of these new like Deep Research and tools like that? How do you, and your end goal in this is to automate marketing tasks. What does that look like? What's the pipelines or procedures or your approach to that? Andrea: So I started from building our system for our client to let's say improve the quality of content recommendation on an e-commerce website or increasing the quality of internal links and doing that at scale required an agentic approach. So there is a language model driven agent that has to find relevant pages and then has to have the notion of what is a main query for these pages, and then as to learn how to craft a proper anchor text in order to link one page to a relevant other page. Andrea: So as I was doing this development, I realized that the essence wasn't really the model itself. That of course has its own characteristics and biases, but it was really the context that I was feeding the model with in real time in order for it to do the task. And so I realized that a pivotal change, it's on how we craft these memories. What is the information in context that we want to pass to the agent in order to do the task properly, and how does the system evolve as things move forward and user maybe start clicking on these links and search engines start crawling these pages. And so I realized that memory was really the underlying element of success for my AI agents. 
Larry: So memory, and when we think of memory, you think of RAM and the computer memory, but also human memory and the different kinds of memory like short-term, long-term....
-
22
Jacobus Geluk: Use-Case Trees for the Data-Product Marketplace – Episode 26
Jacobus Geluk The arrival of AI agents creates urgency around the need to guide and govern them. Drawing on his 15-year history in building reliable AI solutions for banks and other enterprises, Jacobus Geluk sees a standards-based data-product marketplace as the key to creating the thriving data economy that will enable AI agents to succeed at scale. Jacobus launched the effort to create the DPROD data-product description specification, creating the supply side of the data market. He's now forming a working group to document the demand side, a "use-case tree" specification to articulate the business needs that data products address. We talked about: his work as CEO at Agnos.ai, an enterprise knowledge graph and AI consultancy the working group he founded in 2023 which resulted in the DPROD specification to describe data products an overview of the data-product marketplace and the data economy the need to account for the demand side of the data marketplace the intent of his current work to address the disconnect between tech activities and business use cases how the capabilities of LLMs and knowledge graphs complement each other the origins of his "use-case tree" model in a huge banking enterprise knowledge graph he built ten years ago how use case trees improve LLM-driven multi-agent architectures some examples of the persona-driven, tech-agnostic solutions in agent architectures that use-case trees support the importance of constraining LLM action with a control layer that governs agent activities, accounting for security, data sourcing, and issues like data lineage and provenance the new Use Case Tree Work Group he is forming the paradox in the semantic technology industry now of a lack of standards in a field with its roots in W3C standards Jacobus' bio Jacobus Geluk is a Dutch Semantic Technology Architect and CEO of agnos.ai, a UK-based consulting firm with a global team of experts specializing in GraphAI, the combination of Enterprise Knowledge Graphs 
(EKG) with Generative AI (GenAI). Jacobus has over 20 years of experience in data management and semantic technologies, previously serving as a Senior Data Architect at Bloomberg and Fellow Architect at BNY Mellon, where he led the first large-scale production EKG in the financial industry. As a founding member and current co-chair of the Enterprise Knowledge Graph Forum (EKGF), Jacobus initiated the Data Product Workgroup, which developed the Data Product Ontology (DPROD) — a proposed OMG standard for consistent data product management across platforms. Jacobus can claim to have coined the term "Enterprise Knowledge Graph (EKG)" more than 10 years ago, and his work has been instrumental in advancing semantic technologies in financial services and other information-intensive industries. Connect with Jacobus online LinkedIn Agnos.ai Resources mentioned in this podcast DPROD specification Enterprise Knowledge Graph Forum Object Management Group Use Case Tree Method for Business Capabilities DCAT Data Catalog Vocabulary Video Here’s the video version of our conversation: https://youtu.be/J0JXkvizxGo Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 26. In an AI landscape that will soon include huge groups of independent software agents acting on behalf of humans, we'll need solid mechanisms to guide the actions of those agents. Jacobus Geluk looks at this situation from the perspective of the data economy, specifically the data-products marketplace. He helped develop the DPROD specification that describes data products and is now focused on developing use-case trees that describe the business needs that they address. Interview transcript Larry: Okay. Hi everyone. Welcome to episode number 26 of the Knowledge Graph Insights podcast. I am really happy today to welcome to the show, Jacobus Geluk. Sorry, I try to speak Dutch, do my best. His last name means happiness in Dutch, which you'll see a lovely bit of serendipity here. 
Jacobus is the CEO at Agnos.ai, which is a really prominent enterprise knowledge graph consultancy based in London. So welcome Jacobus, tell the folks a little bit more about what you're doing these days. Jacobus: Oh, thank you very much, Larry, for the opportunity. Well, we are a small, let's say, boutique consulting company, but focusing on the enterprise knowledge graph in combination with gen AI, LLM, of course. I think that's a match made in Heaven, these two technologies, they need each other and we jump on that completely all in because I think that's the future. And yeah, would love to talk about this topic- Larry: Yeah, well there's one topic in particular we've been talking about that I really want to air out today is the notion of both of those, all of AI, it runs on data and there's this emerging thing, the data product that is like, I don't know how clearly articulated it is, but you've worked on the DPROD initiative working on articulating an ontology for what a data product is. So you know as much about it as anybody. So what I'd love to do is just talk about, well first, what is a data product? For folks who aren't familiar or discovering this for the first time, how would you describe a data product? Jacobus: Yeah, well, we started talking about this, I set this up in September 2023, and I started talking to Tony Seale, who's a very famous, he's The Knowledge Graph Guy. You'll find him on LinkedIn. He writes fantastic articles. And at that time, he worked on the largest knowledge graph project in the financial industry, I think at the moment, which is one of the largest which was at UBS. And I asked him to become the chair of a new work group that they wanted to set up in the context of the Enterprise Knowledge Graph Forum, which is part of the Object Management Group, which is a standards organization like W3C. 
And to my surprise, almost it went so well that within a year we basically hammered out this standard, or it's now an official OMG standard called DPROD that stands for the data product ontology. Jacobus: And we work with many, many people mostly from banks like JP Morgan and some other people from UBS, Credit Suisse was involved, London Stock Exchange Group, Bloomberg, you name it. Like a whole range of people, Amazon, British Telecom. So there's multiple different types of people working on it: specialists, data architects, et cetera. And the idea was we want to see the world moves towards data marketplaces, like large companies like London Stock Exchange Group for example, or Azure, Google, et cetera. They host data or they sell data. They are data vendors in a sense. So if you sell something, what is your product? Your data is a product or access to that data is a product. So rather than modeling things in terms of data sets, et cetera, which is what we did so far, that is the state-of-the-art data, cataloging, et cetera, using a standard called DCAT that is basically the data catalog standard. Jacobus: So we thought, "Okay, let's elevate that a little bit higher." We don't want to think in terms of pure data sets anymore. That's kind of more like an internal thing. The customer doesn't really care. The customer just wants to know, "What is your product, how can I access it, how can I buy it? What are the terms and conditions? What is the purpose? What are the use cases that we can support," et cetera. So I'm not claiming that we have all of that, but we have at least created a one step up from DCAT, literally an extension to DCAT that basically says, "Okay, now we define what the data product is, what are the inputs and the outputs. Input ports, output ports, basically how does it fit into a larger supply chain of data?" And that is already a step forward and companies are using it. 
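As a rough illustration of the input-port and output-port idea Jacobus describes here, the following Python sketch models data products as plain records and works out which products supply another one. The class, the field names, and the three example products are hypothetical and chosen only for illustration; they are not taken from the DPROD specification.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A toy data product: a name plus the data it consumes and offers."""
    name: str
    input_ports: list[str] = field(default_factory=list)   # data consumed
    output_ports: list[str] = field(default_factory=list)  # data offered


def upstream_of(product, catalog):
    """Names of products whose output ports feed this product's input ports."""
    return sorted(
        other.name
        for other in catalog
        if other is not product
        and set(other.output_ports) & set(product.input_ports)
    )


# Hypothetical catalog: two suppliers and one consumer.
catalog = [
    DataProduct("market-prices", output_ports=["prices"]),
    DataProduct("reference-data", output_ports=["instruments"]),
    DataProduct(
        "risk-report",
        input_ports=["prices", "instruments"],
        output_ports=["risk"],
    ),
]

suppliers = upstream_of(catalog[2], catalog)
print(suppliers)
```

Matching ports by name is a simplification; it is only meant to show how describing inputs and outputs lets you place a product in a larger supply chain of data.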
Jacobus: Like there's already several companies that have told us that they are actually using it like JP Morgan and London Stock Exchange Group, UBS, they are all using it. I'm not sure if it's already in production or not, but they worked with us to create this and there's apparently a need to model the world in terms of your data products. Jacobus: But my own story to that is, okay, you have products, but you would say instead of a data marketplace, you could also say it's a data economy. Basically you want to look at the world as a data economy, your own enterprise, but also beyond the enterprise. It's a larger data economy where you have supply and demand. So we have now defined all these data products on the supply side of the data economy, but what is the demand side? That there's no real standard yet, I think, that defines what the use cases are. Jacobus: Like if you want to talk to the business, the business has a problem, has a budget, and they want us to build something every time. So what is that? That is a use case. That's the term that they use. The business is talking about use cases or apps or systems or whatever you call it, but most of the time they use the term use case. So we want to create a new thing, basically defining the stuff on the demand side of that data economy called use cases. And these use cases can not only serve the purpose of defining very precisely what the business wants and needs, but also how that links to data products with data contracts in between, et cetera. Jacobus: And last but not least, before I stop talking, also the LLM, the gen AI basically needs to know what is the use case I am supposed to operate in. So that's the most exciting angle to this story basically is we want to control the LLM agents, of course, and make them useful, productive, producing high quality output in production, in mission-critical use cases. But what are those use cases? We want those use cases to be data and it's data as code almost. 
But let's stop talking about this now. Larry: No,...
-
21
Rebecca Schneider: Knowledge Graphs and Enterprise Content Strategy – Episode 25
Rebecca Schneider Skills that Rebecca Schneider learned in library science school - taxonomy, ontology, and semantic modeling - have only become more valuable with the arrival of AI technologies like LLMs and the growing interest in knowledge graphs. Two things have stayed constant across her library and enterprise content strategy work: organizational rigor and the need to always focus on people and their needs. We talked about: her work as Co-Founder and Executive Director at AvenueCX, an enterprise content strategy consultancy her background as a "recovering librarian" and her focus on taxonomies, metadata, and structured content the importance of structured content in LLMs and other AI applications how she balances the capabilities of AI architectures and the needs of the humans that contribute to them the need to disambiguate the terms that describe the span of the semantic spectrum the crucial role of organization in her work and how you don't have to have formally studied library science to do it the role of a service mentality in knowledge graph work how she measures the efficiency and other benefits of well-organized information how domain modeling and content modeling work together in her work her tech-agnostic approach to consulting the role of metadata strategy in her work how new AI tools permit easier content tagging and better governance the importance of "knowing your collection," not becoming a true subject matter expert but at least getting familiar with the content you are working with the need to clean up your content and data to build successful AI applications Rebecca's bio Rebecca is co-founder of AvenueCX, an enterprise content strategy consultancy. Her areas of expertise include content strategy, taxonomy development, and structured content. She has guided content strategy in a variety of industries: automotive, semiconductors, telecommunications, retail, and financial services. 
Connect with Rebecca online LinkedIn email: rschneider at avenuecx dot com Video Here’s the video version of our conversation: https://youtu.be/ex8Z7aXmR0o Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 25. If you've ever visited the reference desk at your local library, you've seen the service mentality that librarians bring to their work. Rebecca Schneider brings that same sensibility to her content and knowledge graph consulting. Like all digital practitioners, her projects now include a lot more AI, but her work remains grounded in the fundamentals she learned studying library science: organizational rigor and a focus on people and their needs. Interview transcript Larry: Hi, everyone. Welcome to episode number 25 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Rebecca Schneider. Rebecca is the co-founder and the executive director at AvenueCX, a consultancy in the Boston area. Welcome, Rebecca. Tell the folks a little bit more about what you're up to these days. Rebecca: Hi, Larry. Thanks for having me on your show. Hello, everyone. My name is Rebecca Schneider. I am a recovering librarian. I was a trained librarian, worked in a library with actual books, but for most of my career, I have been focusing on enterprise content strategy. Furthermore, I typically focus on taxonomies, metadata, structured content, and all of that wonderful world that we live in. Larry: Yeah, and we both come out of that content background and have sort of converged on the knowledge graph background together kind of over the same time period. And it's really interesting, like those skills that you mentioned, the library science skills of taxonomy, metadata, structured, and then the application of that in structured content in the content world, how, as you've got in more and more into knowledge graph stuff, how has that background, I guess... 
what's been the transition like as you start to consider knowledge graphs in your content work? How's that been going? Rebecca: Well, I mean, librarians, we're all about organizing things, and we like to organize stuff, and this is just sort of the next step in helping organize stuff so people can find things. I mean, that's what we're all about, right? We want to help people find things. So moving into more and more sophisticated mechanisms that help people not only find things, but leverage content in new and different ways and useful ways is, I think, just a logical progression. Larry: Yeah, that's really... that content discovery, like finding things, discovery, and uncovering things, that's sort of always been the information architecture-y part of content practice. But how is that changing? We were talking a little bit before we went on the air about the emergence of LLMs and then the ensuing interest in knowledge graphs associated with that. Are you finding different challenges in helping machines discover stuff as opposed to humans? Rebecca: Well, okay, there's a couple of different aspects to that. One is that, as we know, structured content is so very important for the success of LLMs, AI applications, et cetera. And the thing is, I have to convince my clients to take the time to clean their house, so to speak, and say, "Okay, you need to clean up your data. You need to clean up your content. You need to get rid of the old, outdated, trivial, extraneous stuff so you have a clean basis to start from." And convincing clients to take that time is a bit of a struggle because they see all the fancy LLMs and all different wonderful things people are doing, and they just want to jump in with both feet, which is great, but you need to have a solid basis first. So it takes some convincing to have people take the time to do that. 
Rebecca: And then on the LLM side, we have to acknowledge that there are many different kinds of LLMs with different pros and cons depending on what the use case is, and you need to not only understand the LLMs, but also the organization's business drivers, what are their goals, what are their objectives, what are they trying to do with this? Because it's not just to write me a poem in the style of e.e. cummings about apples or something. There are definite business drivers and goals because a lot of money is putting a lot of funds into these kinds of technologies, and you got to make sure that you're getting your bang for your buck. Larry: Yeah. As you say that, you're reminding me that everybody's wrestling, you're not wrestling, but figuring out how to do work with RAG architectures and graph RAG and all these hybrid AI architectures. And I love the way that it sounds like your work... you start with the business goals and kind of back up from there, which seems like a good approach. Are you discovering any patterns or approaches as you figure out how do you balance... because that's an interesting triangle of needs, the business part of the client needs. There's sort of content needs, and then the capabilities of the LLMs and the other tooling and these architectures. That's got to be a lot, it sounds like. Rebecca: It is, and you also have to think about how much work am I going to make people do. You can't train everybody to be prompt engineers, right? And so you need to think about from their perspective, what they're trying to get out of whatever application you're creating, and also their interaction with it to the extent to which, yes, the information is in there, but you have to ask better questions. Does that mean I have to retrain people on how to ask better questions? To what extent is that necessary, or are there other ways of leveraging how we use an LLM in this sort of hybrid structure? 
How can we use ontologies, et cetera, informing the knowledge graph to help the user so they don't have to be prompt engineers, that they can get what they need with the minimum of fuss. Larry: Yeah, that's really interesting. As you say that, using ontologies to inform the knowledge graph, I was just kind of assumed in my head that ontologies were a prerequisite for knowledge graphs, but I had Jessica Talisman on a while back, and she makes the case that you can judge a lot just with a SKOS-based thesaurus. Do you consider that kind of continuum of semantic sophistication from just term lists to taxonomies to thesauri to full-blown ontologies, are you playing across that spectrum in your work? Rebecca: Yeah, absolutely, absolutely. And everybody... not everybody. A lot of people say, well, taxonomy, and they use it to refer to controlled vocabularies, ontologies, et cetera. To them, it's all taxonomy. It's not, actually. It is a spectrum. It is a progression of, okay, I've got my controlled vocabulary, then I have hierarchical structure, and then I have synonyms and use-for and all of that kind of stuff. And then you have the ontology. So it's definitely to my mind, a spectrum. And sometimes you don't need the full-blown... I agree. You don't need the full-blown ontology. You can do a lot with a well-structured, well-thought-out taxonomy in an SKOS architecture, and you can really leverage that and you might not need to go the full ontology route. Baby steps. Larry: Yeah. Well, I think baby steps, that's a lesson that comes up everywhere. And that's it. And some of those baby steps, so how... I've talked to a lot of people about this from different backgrounds, and this seems to come really naturally to librarians, or people out of a library science background, that comfort with that spectrum of options for how to organize things is that's kind of a... I didn't go to library science school. I've hung out with you all a lot. 
But is that sort of just a part of the mindset of a librarian? Rebecca: Yeah, I mean, a lot of people actually become librarians as a second career. When I went to school back in the day,...
-
20
Ashleigh Faith: Knowledge Graph Modeling and AI Architectures – Episode 24
Ashleigh Faith With her 15-year history in the knowledge graph industry and her popular YouTube channel, Ashleigh Faith has informed and inspired a generation of graph practitioners and enthusiasts. She's an expert on semantic modeling, knowledge graph construction, and AI architectures and talks about those concepts in ways that resonate both with her colleagues and with newcomers to the field. We talked about: her popular IsA DataThing YouTube channel the crucial role of accurately modeling actual facts in semantic practice and AI architectures her appreciation of the role of knowledge graphs in aligning people in large organizations around concepts and the various words that describe them the importance of staying focused on the business case for knowledge graph work, which has become both more important with the arrival of LLMs and generative AI the emergence of more intuitive "talk to your graph" interfaces some of her checklist items for onboarding aspiring knowledge graph engineers how to decide whether to use a property graph or a knowledge graph, or both her hope that more RDF graph vendors will offer a free tier so that people can more easily experiment with them approaches to AI architecture orchestration the enduring importance of understanding how information retrieval works Ashleigh's bio Ashleigh Faith has her PhD in Advanced Semantics and over 15 years of experience working on graph solutions across the STEM, government, and finance industries. Outside of her day-job, she is the Founder and host of the IsA DataThing YouTube channel and podcast where she tries to demystify the graph space. Connect with Ashleigh online LinkedIn IsA DataThing YouTube channel Video Here’s the video version of our conversation: https://youtu.be/eMqLydDu6oY Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 24. 
One way to understand the entity resolution capabilities of knowledge graphs is to picture an old-fashioned telephone operator moving plugs around a switchboard to make the right connections. Early in her career, that's one way that Ashleigh Faith saw the power of knowledge graphs. She has since developed sophisticated approaches to knowledge graph construction, semantic modeling, and AI architectures and shares her deeply informed insights on her popular YouTube channel. Interview transcript Larry: Hi, everyone. Welcome to episode number 24 of the Knowledge Graph Insights Podcast. I am super extra delighted today to welcome to the show Ashleigh Faith. Ashleigh is the host of the awesome YouTube channel IsA DataThing, which has thousands of subscribers, thousands of monthly views. I think it's many people's entry point into the knowledge graph world. Welcome, Ashleigh. Great to have you here. Tell the folks a little bit more about what you're up to these days. Ashleigh: Thanks, Larry. I've known you for quite some time. I'm really excited to be here today. What about me? I do a lot of semantic and AI stuff for my day job. But yeah, I think my main passion is also helping others get involved, understand some of the concepts a little bit better for the semantic space and now the neuro-symbolic AI. That's AI and knowledge graphs coming together. That is quite a hot topic right now, so lots and lots of untapped potential in what we can talk about. I do most of that on my channel. Larry: Yeah. I will refer people to your channel because we've got only a half-hour today. It's ridiculous. Ashleigh: Yeah. Larry: We just talked for an hour before we went on the air. It's ridiculous. What I'd really like to focus on today is the first stage in any of this, the first step in any of these knowledge graph implementations or any of this stuff is modeling. I think about it from a designerly perspective. 
I do a lot of mental model discernment, user research kind of stuff, and then conceptual modeling to agree on things. But when you get into this world, you get much more into the implementation side of going from a conceptual model into logical, physical models as well. Can you just talk a little bit first just about why modeling is so important? Ashleigh: Yeah. Modeling is the way that you put any of your data into context. If you're looking at something as a human and you see a column of names, you can look at that and say, "Oh, I see that as a human name." But you don't know if they're a customer, are they staff, are they vendors. You have no idea unless there's a column header that tells you what that is. That is the most simplistic model, it's a list. Ashleigh: That's why modeling is so important is because if you don't have any context to what this data is, and what it means, and how you should interpret it, it basically means nothing to you. Then you can misinterpret it and that leads to all kinds of problems. Larry: Right. One of the things we talked about before we went on the air was, "What are facts? What is knowledge?" Because ostensibly, that's what we're ensconcing in these systems that are built with these conceptual models that we come up with. How do you discern that stuff? Ashleigh: Yeah. A lot of us in the semantic space look a lot at Wikidata. If you look at Wikidata, these things are called statements. A triple that defines something and makes a statement about the world. In my mind, I come from a very deep scholarly community background where we're doing a lot of research and you're trying to find corroborating evidence. In my mind, when you turn a statement into a fact, it is when in that point in time, what is the overwhelming acceptance of a certain stance or statement in the world based on corroborating evidence. Now there are competing opinions about things. 
There are disputes, especially when you're talking about scholarship. Some people, their findings are one thing, and findings are different in a different study. Ashleigh: But all of that comes down to is there enough corroborating evidence in one direction? When there are disputes, sometimes there's nuance that has been added to the original statement where that original statement is still now true, it's still accepted. But now, with this additional evidence, it gets even more nuanced, so you get more specific. You can really understand the nuances of that. Ashleigh: There is a half-life of facts, though. Because there was a point in time in the world where people, an overwhelming amount of people, thought the world was flat. We all know that is not the case. But at the time, that was a fact because there was no other evidence to support it otherwise. Or there was very little, or it wasn't disseminated appropriately because it was quite some time ago. But that's where it all comes down is can you defend? If you're in a court of law, how do you defend your statements? How do you defend your argument? This is how scholarship has done things since the very beginning of time. Give me your evidence, go do some studies, and go and do some experiments to gather evidence to support your claim. That's what a hypothesis is doing, is can you support your hypothesis? Can you disclaim it? That's how you figure out what a statement and a fact is. Ashleigh: Then when you have that, how do you codify it so that it can be even more useful? That's where knowledge graphs really come in because you can create it in a way that you can do inferencing. I know that's a weighted statement nowadays. Unfortunately, it's a weighted statement. Oh, actually, even saying it's a weighted statement is problematic. I should stop while I'm ahead on that one. Larry: Yeah. Okay, we can change it. No. Ashleigh: Yeah. No, no. 
You can add in though that data governance aspect, especially if you're using an ontology. There's all these other aspects of why taking statements, and facts, and all of that evidence, and adding it into a system of record that you can then do other things with is really, really helpful, especially in this AI world. Larry: Yeah. That notion of system of record, for some people that's enterprise lingo. I love that you're talking about it. I love the academic and intellectual rigor that underlies your approach to modeling. I think a lot of the facts and knowledge that people are ensconcing in systems in the knowledge graph world, there's just a lot of enterprise stuff going on. What are the facts, and evidence, and corroboration that you need? Like the classic thing that comes up in every conversation. When you say "customer," what do you mean? Ashleigh: Yeah. Larry: How do you ensconce that organizational disparity in understanding of that term? Ashleigh: Oh, but that's why I love knowledge graphs for this! You can have your data catalog and you can have your taxonomies. You can attempt to get everyone in your organization to agree on what a specific label means. In my practical experience, all those things are great and lovely. And yes, do those if you can do do those. Often times though, and this is what I learned very early on in my career when I was first doing taxonomy work, was nobody ever agrees on what label is used for certain things. Or very few. There's always, "Well, there's that exception to the rule in our enterprise and it's super critical to our mission," and whatever else it might be. So you have that going on. Ashleigh: That's why I really love the very early examples where knowledge graph was used as a connector. I almost imagine it as those really old-school operators that would connect people with phone lines. You know, sitting there with the giant connector boards. 
That's what I think about when I think of the early examples that I was using with knowledge graph, which is you have a node and it represents this concept. That concept can be represented by an ID, so it does not need interpretation. It is that is the ID. That is the UID,...
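The switchboard picture Ashleigh describes can be sketched in a few lines of Python: each concept gets one stable identifier, and every label that a team happens to use is plugged into that identifier. The ID format and the labels below are invented for illustration and do not come from any real system.

```python
# Hypothetical concept IDs, each with the labels different teams use for it.
concepts = {
    "C001": {"customer", "client", "account holder"},
    "C002": {"vendor", "supplier"},
}

# Invert the mapping so that any label leads to exactly one concept ID.
label_to_id = {
    label: concept_id
    for concept_id, labels in concepts.items()
    for label in labels
}


def resolve(term):
    """Return the concept ID for a label, ignoring case and outer spaces.

    Returns None when the label is not connected to any concept.
    """
    return label_to_id.get(term.strip().lower())
```

With this in place, `resolve("Client")` and `resolve("account holder")` land on the same ID, so teams can keep their own vocabulary while the graph holds a single concept.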
-
19
Panos Alexopoulos: Semantic Modeling for Data – Episode 23
Panos Alexopoulos Any knowledge graph or other semantic artifact must be modeled before it's built. Panos Alexopoulos has been building semantic models since 2006. In 2020, O'Reilly published his book on the subject, "Semantic Modeling for Data." The book covers the craft of semantic data modeling, the pitfalls practitioners are likely to encounter, and the dilemmas they'll need to overcome. We talked about: his work as Head of Ontology at Textkernel and his 18-year history working with symbolic AI and semantic modeling his definition and description of the practice of semantic modeling and its three main characteristics: accuracy, explicitness, and agreement the variety of artifacts that can result from semantic modeling: database schemas, taxonomies, hierarchies, glossaries, thesauri, ontologies, etc. the difference between identifying entities with human understandable descriptions in symbolic AI and numerical encodings in sub-symbolic AI the role of semantic modeling in RAG and other hybrid AI architectures a brief overview of data modeling as a practice how LLMs fit into semantic modeling: as sources of information to populate a knowledge graph, as coding assistants, and in entity and relation extraction other techniques besides NLP and LLMs that he uses in his modeling practice: syntactic patterns, heuristics, regular expressions, etc. the role of semantic modeling and symbolic AI in emerging hybrid AI architectures the importance of defining the notion of "autonomy" as AI agents emerge Panos' bio Panos Alexopoulos has been working since 2006 at the intersection of data, semantics and software, contributing in building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, Panos currently works as a principal educator at OWLTECH, developing and delivering training workshops that provide actionable knowledge and insights for data and AI practitioners. 
He also works as Head of Ontology at Textkernel BV, in Amsterdam, Netherlands, leading a team of data professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain. Panos has published several papers at international conferences, journals and books, and he is a regular speaker in both academic and industry venues. He is also the author of the O’Reilly book “Semantic Modeling for Data – Avoiding Pitfalls and Dilemmas”, a practical and pragmatic field guide for data practitioners that want to learn how semantic data modeling is applied in the real world. Connect with Panos online LinkedIn Video Here’s the video version of our conversation: https://youtu.be/ENothdlfYGA Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 23. In order to build a knowledge graph or any other semantic artifact, you first need to model the concepts you're working with, and that model needs to be accurate, to explicitly represent all of the ideas you're working with, and to capture human agreements about them. Panos Alexopoulos literally wrote the book on semantic modeling for data, covering both the principles of modeling as well as the pragmatic concerns of real-world modelers. Interview transcript Larry: Hi everyone. Welcome to episode number 23 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Panos Alexopoulos. Panos is the head of ontology at Textkernel, a company in Amsterdam that works on knowledge graphs for the HR and recruitment world. Welcome, Panos. Tell the folks a little bit more about what you're doing these days. Panos: Hi Larry. Thank you very much for inviting me to your podcast. I'm really happy to be here. Yeah, so as you said, I'm head of ontology at Textkernel. 
Actually, I've been working in the field of data semantics, knowledge graph ontologies for almost now 18 years, even before the era of machine learning, back when it was mostly about symbolic AI. Yeah, I've been working a lot on this field. I've seen its ups and downs, I've seen its good and bad things, and I think our discussion is going to focus on these things. What I've been doing lately now with the field of AI, and I think... No, let me say this differently. I think that the field of data semantics, even in the era of AI and large language models, et cetera, is even more important and this is something that I'm actively looking now. I'm actually looking a lot on the synergy and in the interrelation between large language models with data, with knowledge graphs and ontologies. Larry: Yeah, I'd love to talk more about that because that just seems to be in the air. One thing that I want to talk about, and I realized I totally left out of my intro, that you wrote this brilliant book called Semantic Modeling for Data. Larry: That's right. I kind of buried my lead there, as we say in journalism, but one of the first things that, and we talked about this a little bit before we went on the air. Can you describe to folks what semantic modeling is? What are we doing there when we're modeling? Panos: Yes. So the definition I give to the term of semantic modeling is the practice of building descriptions of data that have three important characteristics. The first thing, the first characteristic is that this description should be accurate, that this we should describe data and domains in a correct way, right? We don't want to have statements and assertions that are wrong. The second characteristic is that these descriptions should be explicit both for people and machines. What does that mean? If I have a data, if I have a data set, a set of data and I give it to you, and when you read it, you cannot understand what it is about. That's not good semantics, right? 
The meaning is lost, and the same applies for systems, for machines. If I call an API, I take some data back and my machine, enterprise system is not able to interpret the meaning of this data, then I have an issue. Panos: The third characteristic is agreement. It's not enough to have explicit meaning on data. It's also very important that we both agree on the validity of that meaning and that we serve the same meaning, right? And it starts with, I can give as example very simple things like what is a knowledge graph? If you go and you try to find a definition of what a knowledge graph is, you will see many definitions that are not necessarily consistent with each other, right? So there's already disagreement there, and actually the work of a good ontology, semantic modeler would be to try to start with defining that. So that's what semantic modelers do and the artifacts that we build, this is practically an umbrella type, semantic modeling that covers a lot of artifacts that we build. These artifacts can range from database schemas, taxonomies, hierarchies, glossaries, thesauri, ontologies as many of our audience already heard, knowledge graph, et cetera. Panos: So when you do semantic modeling, you're not building necessarily one type of artifact. Then the key to remember is that you are describing data, you're describing domains by means of formal symbolic representation. That's also another important thing that semantic data modeling is about creating explicit human understanding, not only system understandable, but also human understandable descriptions of data. That means that embeddings, large language model, et cetera, do not fall into this category because their representations, their underlying representations are subsymbolic, they are numbers, so they maybe contain some meaning. They may be encoding some meaning, but according to my definition, they do not fall into the practice of semantic data modeling. Larry: Got it. 
The way you just said that sub-symbolic, because I've heard a lot of people talk about the symbolic AI and neural networks or machine learning and other, the sub-symbolic stuff. Is there a distinct dividing line there or is this a continuum going up and down then the AI stuff? Panos: For me, I don't think it's a continuum. Sub-symbolic is when you have numbers practically, right? So when you have a word, and this is encoded by a vector of 1600 or so numbers, this representation is subsymbolic. Symbolic means words, symbolic means human language, right? Terminology that doesn't happen in machine learning. Larry: Got it. Yeah, no, as soon as you said that, I was like, "It's more of a Boolean." Like, "Yep, it's symbolic or it's not." Panos: Of course you can have, let's say hybrid models. For instance, you can take a knowledge graph of entities and the attributes of each entity. You can have the name of the entity, some other characteristics, and then you may have also an embedding of that entity. So you can have at the same time, two representations, a symbolic one and a subsymbolic one for the same entity, and you use that for different purposes. For example, an embedding is very good to find similarities between entities, which is more difficult to do it in the symbolic space. Larry: Interesting, yeah. And this gets into RAG architectures and other hybrid architectures that are emerging, or does it? Is that how those come together that if you're looking for similarities, you revert to like a subsymbolic system, but if you're looking for, I don't know, knowledge-based stuff, you shift into the symbolic. Panos: Yeah, so RAG is a nice example, right? So what's the idea of RAG? RAG stands for Retrieval Augmented Generation. It's the idea that you have your LLM, but you want it to give you answers based on your own data, based on your own knowledge. 
So when you give a prompt, when you give a query to the LLM before the LLM answers, you have a step where you retrieve relevant information to your query from some database, from some other set of documents, or as it's lately more into vogue structured data and knowledge graphs....
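Editor's sketch: the hybrid representation and the retrieval step Panos describes can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not how any particular RAG product works. The `Entity` class, the three-number embeddings, the sample entities, and the prompt wording are all hypothetical and hand-written for illustration; a real system would get its vectors from an embedding model and send the prompt to an LLM.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    name: str                # symbolic: human-readable label
    description: str         # symbolic: text that can be handed to the LLM
    embedding: list[float]   # subsymbolic: toy vector, hand-written (assumption)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, entities, k=2):
    """Retrieval step: rank entities by similarity in the subsymbolic space."""
    ranked = sorted(
        entities,
        key=lambda e: cosine(query_embedding, e.embedding),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Augmentation step: put the symbolic descriptions in front of the question."""
    context = "\n".join(f"- {e.name}: {e.description}" for e in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical entities; each carries both representations at once.
ENTITIES = [
    Entity("Knowledge graph", "A graph of entities and their relations.", [0.9, 0.1, 0.0]),
    Entity("Taxonomy", "A hierarchy of terms.", [0.7, 0.3, 0.1]),
    Entity("Espresso", "A coffee drink.", [0.0, 0.1, 0.9]),
]

# The query vector is also made up; it stands in for an embedded user question.
top = retrieve([1.0, 0.0, 0.0], ENTITIES, k=2)
prompt = build_prompt("What is a knowledge graph?", top)
print(prompt)
```

The similarity search runs on the numbers, while what reaches the LLM is the symbolic name and description, which is the division of labor described in the conversation.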
-
18
Mike Pool: Is it time for a moratorium on the word “semantics”? – Episode 22
Mike Pool Mike Pool sees irony in the fact that semantic-technology practitioners struggle to use the word "semantics" in ways that meaningfully advance conversations about their knowledge-representation work. In a recent LinkedIn post, Mike even proposed a moratorium on the use of the word. We talked about: his multi-decade career in knowledge representation and ontology practice his opinion that we might benefit from a moratorium on the term "semantics" the challenges in pinning down the exact scope of semantic technology how semantic tech permits reusability and enables scalability the balance in semantic practice between 1) ascribing meaning in tech architectures independent of its use in applications and 2) considering end-use cases the importance of staying domain-focused as you do semantic work how to stay pragmatic in your choice of semantic methods how reification of objects is not inherently semantic but does create a framework for discovering meaning how to understand and capture subtle differences in meaning of seemingly clear terms like "merger" or "customer" how LLMs can facilitate capturing meaning Mike's bio Michael Pool works in the Office of the CTO at Bloomberg, where he is working on a tool to create and deploy ontologies across the firm. Previously, he was a principal ontologist on the Amazon Product Knowledge team, and has also worked to deploy semantic technologies/approaches and enterprise knowledge graphs at a number of big banks in New York City. Michael also spent a couple of years on the famous Cyc project and has evaluated knowledge representation technologies for DARPA. He has also worked on tooling to integrate probabilistic and semantic models and oversaw development of an ontology to support a consumer-facing semantic search engine. He lives in New York City and loves to run around in circles in Central Park. 
Connect with Mike online LinkedIn Video Here’s the video version of our conversation: https://youtu.be/JlJjBWGwSDg Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 22. The word "semantics" is often used imprecisely by semantic-technology practitioners. It can describe a wide array of knowledge-representation practices, from simple glossaries and taxonomies to full-blown enterprise ontologies, any of which may be summarized in a conversation as "semantics." Mike Pool thinks that this dynamic - using a word that lacks precise meaning while assuming that it communicates a lot - may justify a moratorium on the use of the term. Interview transcript Larry: Hi everyone, welcome to episode number 22 of the Knowledge Graph Insights podcast. I'm really happy today to welcome to the show Mike Pool. Mike is a longtime ontologist, a couple of decades plus. He recently took a position at Bloomberg. But he made this really provocative post on LinkedIn lately that I want to flesh out today, and we'll talk more about that throughout the rest of the show. Welcome, Mike, tell the folks a little bit more about what you're up to these days. Mike: Hey, thank you, Larry. Yeah. As you noted, I've just taken a position with Bloomberg and for these many years that you alluded to, I've been very heavily focused on building, doing knowledge representation in general. In the last let's say decade or so I've been particularly focused on using ontologies and knowledge graphs in large banks, or large organizations at least, to help organize disparate data, to make it more accessible, break down data silos, et cetera. It's particularly relevant in the finance industry where things can be sliced and diced in so many different ways. I find there's a really important use case in the financial space but in large organizations in general, in my opinion, for using ontology. 
So that's a lot of what I've been thinking about, to make that more accessible to the organization and to help them build these ontologies and utilize them effectively. Larry: Nice. One of the intellectual I guess foundations of that kind of practice is what we call semantics. Anyhow, I want to read part of that post you made on LinkedIn, which started a great conversation. One of the things you suggested, "I think we need to impose a moratorium on the use of the word semantics. The reason is simple, it's ironically a term lacking any precise meaning while we assume it's communicating a lot." That's brilliant, can you elaborate on that a little bit? Was there a particular, did something inspire that or has it just been on your mind? Mike: Yeah. I mean, it's mostly, one term that often triggers it for me is this term I see within, let's call it this community of practice ... I see used very, very frequently, people will say, "Let's look at the semantic meaning." So this redundancy in terms, that we said, "Well, what in the world?" But we use it for all kinds of things. We say we need a semantic solution, we need the semantic meaning. And very often what that ends up being when we drill into that, it's just not always clear. The term I think in some sense has become either too vague ... it's unclear of what precisely it means. Or it's a shorthand for something else, that we're not actually saying we're going to capture meaning. We're saying, we're going to use this particular set of tools or something like that. So my concern is that it's sort of lost. We know when we say it that we mean we're going to use these particular set of tools, these particular set of languages, but to the people with whom we're communicating that might remain completely unclear. So yeah, that's my concern about the way we're using the concept. Larry: Yeah. That's really interesting that ... 
it's not laziness, it's like heuristics or something like that, that people use all the time to just try to advance whatever conversation they're in or project or whatever. It's just like, "Oh yeah, we need a semantic thing there," or something like that, it sounds like. Or they're thinking of possibly 20 different things or they just say, "Oh, semantic is the closest word I know to that idea," that we need to advance this. Mike: Yeah. I mean, an example is ... Because I think, as I noted somewhat ironically, I herald myself as a semantic technology practitioner or something like that. After you said to me, "Well, what in the world is semantic technology?" It's a good question. If I create a property graph, there's part of me that says, well, that's not really semantic but a triple store is. It's like, well, what's the dividing line? What precisely makes it count as semantic or not? It's a little bit hard to pin that down. Larry: The way you just said it, it's almost like there's an on/off switch someplace. But I've seen, there's a lot of representations of what various people have called something like the semantic spectrum, from just term lists to glossaries to thesauri, to the ontologies, that kind of thing. It's easy enough to disambiguate between each of those things I just mentioned, but is there something like a spectrum in there? Is that why people are grasping for words, do you think, to describe exactly what they're talking about in the moment? Mike: Yeah. As I said, I think that's part of the problem. As I said, the people with whom I often communicate, I think we more or less mean it as a shorthand for using RDF OWL. And that might be as simple as using SKOS and creating a simple taxonomy with that, or creating a very elaborate OWL ontology. But it's interesting because, let's say, we create a taxonomy in SKOS. 
Well, is there any reason that if you just had that taxonomy and you didn't bother to put it in SKOS, you just put it in an Excel spreadsheet with appropriate indentations? We'd say, well, how does the SKOS, or how does the RDF magically capture the meaning where the spreadsheet didn't, right? It's a little bit unclear. But I think other people use it differently. I think there's lots of people who would say using a property graph is a semantic solution. We're capturing knowledge, et cetera, in it. So I think it varies a little bit, that's again, part of the point. But I do believe people with whom I communicate, that's the shorthand we're using. It's like, this is either technology that we use to extend RDF OWL, or it's a knowledge graph that encodes that knowledge. But that's often what it means in my space I think. Larry: Also, you mentioned in that post too, I think, when you're talking about, if your intent is to capture meaning, that there are other ways to do that technologically. You talked about just an old-fashioned ERD or a graph schema or a Python script that captures something in some project you're working on. And even I think you also alluded to, or maybe it came up in the discussion, it could be a natural language thing that you say to an LLM that it could discern. But I guess that gets at, what is the amount of meaning you need to capture? Does that make sense? What's the intent behind your attempt to do something semantic in a technical project? Mike: It sounds like a straightforward question, but it's actually a very good one. Because as I said, that if we say, well, we're trying to capture meaning, you could write a Python script that does it, or there's a lot of different ways to do that. I think this whole, at least the background that I have in this, when we're talking about capturing semantics, what we are really concerned with is really trying to say, can we get a computer to reason in the same way that we do? 
Can we get the computer to respond to natural language prompts in a similar way that a human can? That's kind of what we meant by semantics. But then if we try to talk about that in the technology space, what exactly does that mean? Let's take the Python script example. If I said, oh, I'm trying to solve this problem, I have this search engine and every time people, they're searching for recipes,...
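Editor's sketch: Mike's comparison between a taxonomy kept as an indented spreadsheet and the same taxonomy expressed in SKOS can be made concrete. The Python below is a minimal illustration under stated assumptions, not a tool mentioned in the episode. The recipe terms, the `ex:` namespace, and the `to_skos` helper are hypothetical; `skos:Concept`, `skos:prefLabel`, and `skos:broader` are standard SKOS terms. It assumes each row's indentation depth never jumps by more than one level.

```python
# Rows as they might appear in a spreadsheet: (indentation depth, label).
ROWS = [
    (0, "Recipe"),
    (1, "Dessert"),
    (2, "Pie"),
    (1, "Soup"),
]


def slug(label):
    """Turn a label into a simple local name for the hypothetical ex: namespace."""
    return label.lower().replace(" ", "-")


def to_skos(rows, base="ex:"):
    """Emit Turtle-style statements; indentation becomes explicit skos:broader links."""
    triples = []
    stack = []  # stack[d] holds the most recent label seen at depth d
    for depth, label in rows:
        del stack[depth:]  # drop anything at this depth or deeper
        concept = base + slug(label)
        triples.append(f'{concept} a skos:Concept ; skos:prefLabel "{label}"@en .')
        if stack:
            # The parent is the nearest label one level up.
            triples.append(f"{concept} skos:broader {base}{slug(stack[-1])} .")
        stack.append(label)
    return triples


triples = to_skos(ROWS)
print("@prefix skos: <http://www.w3.org/2004/02/skos/core#> .")
print("@prefix ex: <http://example.org/> .")
print("\n".join(triples))
```

The hierarchy is the same in both forms. What changes is that the parent-child relation, which the spreadsheet leaves implicit in indentation, is stated with a named property that other tools can read without knowing the layout convention.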
-
17
Margaret Warren: Image Metadata for Knowledge Graphs and People – Episode 21
Margaret Warren As a 10-year-old photographer, Margaret Warren would jot down on the back of each printed photo metadata about who took the picture, who was in it, and where it was taken. Her interest in image metadata continued into her adult life, culminating in the creation of ImageSnippets, a service that lets anyone add linked open data descriptions to their images. We talked about: her work to make images more discoverable with metadata connected via a knowledge graph how her early childhood history as a metadata strategist, her background in computing technology, and her personal interest in art and photography show up in her product, ImageSnippets her takes on the basics of metadata strategy and practice the many types of metadata: descriptive, administrative, technical, etc. the role of metadata in the new AI world some of the good and bad reasons that social media platforms might remove metadata from images privacy implications of metadata in social media the linked data principles that she applies in ImageSnippets and how they're managed in the product's workflow her wish that CMSs and social media platforms would not strip the metadata from images as they ingest them the lightweight image ontology that underlies her ImageSnippets product her prediction that the importance of metadata that supports provenance, demonstrates originality, and sets context will continue to grow in the future Margaret's bio Margaret Warren is a technologist, researcher and artist/content creator. She is the founder and CEO of Metadata Authoring Systems whose mission is to make the most obscure images on the web findable, and easily accessible by describing and preserving them in the most precise ways possible. To assist with this mission, she is the creator of a system called ImageSnippets, which can be used by anyone to build linked data descriptions of images into graphs. 
She is also a research associate with the Florida Institute of Human and Machine Cognition, one of the primary organizers of a group called The Dataworthy Collective and is a member of the IPTC (International Press and Telecommunications Council) photo-metadata working group and the Research Data Alliance charter on Collections as Data. As a researcher, Margaret's primary focus is at the intersection of semantics, metadata, knowledge representation and information science particularly around visual content, search and findability. She is deeply interested in how people describe what they experience visually and how to capture and formalize this knowledge into machine readable structures. She creates tools and processes for humans but augmented by machine intelligence. Many of these tools are useful for unifying the many types of metadata and descriptions of images - including the very important context element - into ontology-infused knowledge graphs. Her tools can be used for tasks as advanced as complex domain modeling but can also facilitate image content to be shared and published while staying linked to its metadata across workflows. Learn more and connect with Margaret online LinkedIn Patreon Bluesky Substack ImageSnippets Metadata Authoring Systems personal and art site IPTC links IPTC Photo Metadata Software that supports IPTC Photo Metadata Get IPTC Photo Metadata Browser extensions for IPTC Photo Metadata Resource not mentioned in podcast (but very useful for examining structured metadata in web pages) OpenLink Structured Data Sniffer (OSDS) Video Here’s the video version of our conversation: https://youtu.be/pjoAAq5zuRk Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 21. Nowadays, we are all immersed in a deluge of information and media, especially images. The real value of these images is captured in the metadata about them. 
Without information about the history of an image, its technical details, and the context in which it originated, it might as well be a piece of random clip art. Margaret Warren is the creator of ImageSnippets, a tool that helps you create and manage the metadata that imbues your images with meaning. Interview transcript Larry: Hi, everyone. Welcome to episode number 21 of the Knowledge Graph Insights podcast. I am really happy today to welcome to the show my friend, Margaret Warren. Margaret and I helped co-organize a weekly event called The Dataworthy Collective, which is a collaborative learning environment, but she's better known as the CEO and the CTO of a company called Metadata Authoring Systems, which is best known for the product, ImageSnippets, which does metadata management for photos and other kinds of images. Welcome, Margaret. Tell the folks a little bit more about what you're up to these days. Margaret: Hello! How are you doing? Yeah, it's a good introduction, thanks. Well, what I'd like to say about Metadata Authoring Systems is just that we want the most obscure images to be findable and easily accessible by describing and preserving them in the most precise ways possible. That's kind of what we try to do. One of the tools that we created is called ImageSnippets. It's been around for quite some time. ImageSnippets is, I think it can best be described as a way to take image description data itself and parse that into knowledge graphs, which is not something that people necessarily think of that much. It adds a whole new dimension to that content. You have an image, and what is the metadata that's attached to that image? How do you turn that metadata into a graph? Then, what can you do with that graph once you have it created? Larry: Yeah, no, that's super interesting too. A lot of people would back up from that. 
Like the use case, the end use case, it would be like, "Well, I want that picture of my daughter's 4th birthday party," or this specific product piece, or that famous Porsche engine manifold, the thing that you've example- Margaret: That I show that a lot. Larry: Exactly, yeah. Well, it sort of gets in, but I guess, I definitely want to focus on the use cases, but I think there's a few people, and even people who are versed in metadata from one perspective or another, may benefit from a bigger picture view of how you see the world of metadata. You come at it from image management in particular, but you're really a metadata strategist and management person at heart. Can you talk a little bit about what it is and why it's important? Margaret: Yeah. Well, I'll lead into this by talking a little bit about my background, because it helps to set the stage. I grew up with a photographer, my father was a photojournalist. In the 70s, he was working in a newspaper and I was growing up with this, and I was sort of the metadata person in the family without really understanding that exactly what it would lead to. I would go out and make photos. We were using Nikon film cameras then and everything, and we would go back to the dark room and develop the prints and everything. I would take these, I have a classic example that I used visually, but it's like I have the metadata written on the back of the photo. It's really cute. I was 10 years old, and it's like, Who's in the car? Where we were? Who took the picture? Who developed the picture? Who printed the picture? I even got into the administrative metadata. All this seemed important to me. Margaret: My mother used to write a little bit onto the slides and stuff a little bit, but somehow I just had this knack and I was very interested in this. Now, that continued on over the years because I've always also been an artist and a photographer and a technologist. 
I was in the Coast Guard and I was an electronics technician, and then I started working on computing systems and doing programming, and ultimately, had my own company doing hardware and networking support and building networks and deploying software solutions, lots of things. Then, I was trying to publish my images on the web and it's super interesting to me that the problem that I was trying to solve about publishing my images on the web almost 30 years ago is still, well, actually, I created a solution that solves it with ImageSnippets, but it's still something that is not really mainstream. Margaret: The entire vision that had to do with metadata and everything was sort of hijacked by the centralized companies that were just so data hungry and so ... well, engagement hungry rather like we want you to engage constantly, so we're going to make it increasingly easy for you to share your images as fast as you possibly can without thinking very carefully about the metadata aspect of what is in these images, why you're sharing it, why it's there. This is such a deep, deep, deep subject. Margaret: To answer your question about what is metadata, I often say that metadata really ultimately is data about anything that you're talking ... I can hold up this coffee cup and say, "Oh, I can describe the coffee cup. It's got flowers on it, the size of it, the shape of it, how warm it is, the temperature," anything that I really say about anything is metadata. But when it comes to, say, images that you're going to put on the web, and I like to narrow it down to images because that's really my forte, still images, but you can have, of course, metadata on PDF files and audio files and video files and everything can have metadata attached to it. Margaret: Inside of images, you can add headers and other types of information that would contain things like the EXIF data. 
A lot of people are very, very familiar with EXIF data, which is, that is the data that usually comes from hardware. That will come from a camera or it will come from a scanner or some piece of hardware that will say what the settings were. Sometimes it will give the GPS data, if you take an image with your smartphone, and it will include the GPS data unless you turn
-
16
Jans Aasman: Knowledge Graphs in Modern Hybrid AI Architectures – Episode 20
Jans Aasman Hybrid AI architectures get more complex every day. For Jans Aasman, large language models and generative AI are just the newest additions to his toolkit. Jans has been building advanced hybrid AI systems for more than 15 years, using knowledge graphs, symbolic logic, and machine learning - and now LLMs and gen AI - to build advanced AI systems for Fortune 500 companies. We talked about: his knowledge graph and neuro-symbolic work as the CEO of Franz the crucial role of a visionary knowledge graph champion in KG adoption in enterprises the two types of KG champions he has encountered: the magic-seeking, forward-looking technologist and the more pragmatic IT leader trying to better organize their operation the AI architectural patterns and themes he has seen emerge over the past 25 years: logic, reasoning, event-based KGs, machine learning, and of course gen AI and LLMs how gen AI lets him do things he couldn't have imagined five years ago the enduring importance of enterprise taxonomies, especially in RAG architectures which business entities need to be understood to answer complex business questions his approach to neuro-symbolic AI, seeing it as a "fluid interplay between a knowledge graph, symbolic logic, machine learning, and generative AI" the power of "magic predicates" a common combination of AI technologies and human interactions that can improve medical diagnosis and care decisions his strong belief in keeping humans in the loop in AI systems his observation that technology and business leaders are seeing the need for "a symbolic approach next to generative AI" his take on the development of reasoning capabilities of LLMs how the code-generation capabilities of LLMs are more beneficial to senior programmers and may even impede the work of less experienced coders Jans' bio Jans Aasman is a Ph.D. 
psychologist and expert in Cognitive Science - as well as CEO of Franz Inc., an early innovator in Artificial Intelligence and provider of Knowledge Graph Solutions based on AllegroGraph. As both a scientist and CEO, Dr. Aasman continues to break ground in the areas of Artificial Intelligence and Knowledge Graphs as he works hand-in-hand with numerous Fortune 500 organizations as well as government entities worldwide. Connect with Jans online LinkedIn email: ja at franz dot com Video Here’s the video version of our conversation: https://www.youtube.com/watch?v=SZBZxC8S1Uk Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 20. The mix of technologies in hybrid artificial intelligence systems just keeps getting more interesting. This might seem like a new phenomenon, but long before our LinkedIn feeds were clogged with posts about retrieval augmented generation and neuro-symbolic architectures, Jans Aasman was building AI systems that combined knowledge graphs, symbolic logic, and machine learning. Large language models and generative AI are just the newest technologies in his AI toolkit. Interview transcript Larry: Hi, everyone. Welcome to episode number 20 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Jans Aasman. Jans is, he originally started out as a psychologist and he got into cognitive science. For the past 20 years, he's run a company called Franz, where he's the CEO doing neuro-symbolic AI, so welcome, Jans. Tell the folks a little bit more about what you're doing these days. Jans: We help companies build knowledge graphs, but with the special angle that we now offer neuro-symbolic AI so that we, in a very fluid way, mix traditional symbolic logic and the traditional machine learning with the new generative AI. We do this in every possible combination that you could think of. Larry: Who? Jans: These applications might be in healthcare or in call centers or in publishing. 
It's many, many, many different domains it supplies. Larry: Is it mostly large enterprises or is there a certain scale at which this stuff works better? Jans: Our customers are always Fortune 500, Fortune 100 companies. It's all the companies that are trying to do innovation. Most big enterprises now believe that knowledge graphs is in their future. They're experimenting with it. They do experiments and we help them build their first knowledge graphs in most cases. Once they get that going, they can do it on their own. Larry: Interesting. It's often their first knowledge graph. Where in the organizations is this typically pioneered? As you come into an organization, is it IT or is it data science? Where are you typically entering the organization? Jans: That's actually the wrong question. Larry: Okay. Jans: The thing is, all the places where we're successful, there's a champion, a person that's really looking into the future and has this vision that it's possible to not build a new silo for every new problem, but there should be a way to integrate all the knowledge in the organization into something incredibly useful with that. You can't leave it to a single programmer or a single architect. It's usually someone with some business experience and also some architectural role that believes in this approach. If you leave it up to an IT department, it's just not another database, but it's not the philosophy of a knowledge graph of integrating knowledge. It's just, okay, this is a problem I can solve with the graph. Let's do it that way. Jans: You need a person that says, "Hey, I've got so many different sources of information in my organization. I know we're not combining it in the right way, and it's too complex to put in a relational database. I know we have to solve it with a thing that they now call knowledge graphs, but it's even more than that. I know partly it's data science. It's machine learning. Partly it's rule-based. It's the logic, symbolic logic. 
I also know that I need the new generative AI in this, but how do I do this all? This is incredibly complex. I can see the future, but how do I do it?" That's where we come in, help build them knowledge graphs, but with a symbolic angle. Larry: I love that, and that knowledge graphs are a key element in the architecture of the future and of the present, it sounds like. That champion who comes to you, it sounds like they're somebody who's been aware of the hazards and consequences of siloed information and data. Is that typically what they're coming in for, of how can we better integrate and understand all of this? Jans: Again, I would say there's two types of champions. One of them is just, they want magic. They see all the articles about generative AI. They see the Gartner articles, Forrester articles about knowledge graphs, and they think, "I have the feeling that something can be done with knowledge graphs and neuro-symbolic AI." Those people are not very super technical. They usually have a technical background but then went into business, but they know something can be done. Jans: Then you have the second type of champion, of people that are literally always over 35, that have spent their active life building application after application. Every time when they created a beautiful application where their bosses were really happy about, the sad thing is, they build a new silo and they made their whole enterprise even more complicated. These are the people that say, "You know what? There has to be a better way of integrating my knowledge." Those are the people that get interested in semantic technology and they say, "There has to be a way that we don't build new silos every time." It's this thing that we call data-centric computing. Data comes first and applications need to go on top, but we don't need to change the data all the time, rewrite and copy the data all the time. That's the disappointed IT person that says, "There has to be a better way." 
Does it make sense? Larry: Yes, that makes sense. Jans: One is really looking at the future like, "Wow, my company needs magic to make more revenues." The other one is, "Hey, we need to reorganize our IT house, because this is madness the way we do it now." Larry: Yeah. They don't want to do that, repeating the same mistake over and over again. Jans: No, let me try. How do I turn my sound off? Okay. Yeah. Larry: Cool. Yes. You said they're always older, because these folks have been around the block a few times who come in with that. I'm going to guess. Are they generally? It sounds like both. It sounds like maybe in the first type, the magic seekers, you're probably doing a lot of education, but with the second type, you're maybe just more finessing the implementation. Is that? Jans: No, it's a huge difference. The magic seeker will give you freedom to think together, "Okay, how are we going to do this? What is the first baby step we can do to show the rest of the company that this works?" You try to find the low-hanging fruit where you can show how neuro-symbolic AI and knowledge graphs can help and do things they couldn't do before, whereas the second type of champion, the disappointed IT person that says, "We have to find a new way," then it's way more IT-oriented. Let's find three databases where we want to do something extra with. Let's build a semantic data catalog. Let's build an ontology of the objects that we really care about in our business. Then let's see how we can replace an existing system by something that is 10 times more simple, 10 times more easy to understand, and 10 times easier to do data science with. Does that make sense? That second champion is much more IT-oriented and wants to make the flow within the company better, and the first one just wants more revenue by magic. Larry: Right. Jans: Huge difference. Huge difference in the two champions. Larry: Is it pretty much an even mix between those, or do you seek out one or the other more? 
Jans: No,...
-
15
Juan Sequeda: LLMs as a Critical Enabler for Knowledge Graph Adoption – Episode 19
Juan Sequeda Knowledge graph technology has been around for decades, with the benefits so far accruing to only a few big enterprises and tech companies. Juan Sequeda sees large language models as a critical enabler for the broader adoption of KGs. With their capacity to accelerate the acquisition and use of valuable business knowledge, LLMs offer a path to a better return on your enterprise's investment in semantics. We talked about: his work as principal scientist and head of the AI lab at data.world the new discovery and knowledge-acquisition capabilities that LLMs give knowledge engineers a variety of business benefits that unfold from these new capabilities the payoff of investing in semantics and knowledge: "one plus one is greater than two" how semantic understanding and the move from a data-first world to a knowledge-first world helps businesses make better decisions and become more efficient the pendulum swings in the history of the development of AI and knowledge systems his research with Dean Allemang on how knowledge graphs can help LLMs improve the accuracy of answers to questions posed to enterprise relational databases the role of industry benchmarks in understanding the return on your investment in semantics the importance of treating semantics as a first-class citizen how business leaders can recognize and take advantage of the semantics and knowledge work that is already happening in their organizations Juan's bio Juan Sequeda is the Principal Scientist and Head of the AI Lab at data.world. He holds a PhD in Computer Science from The University of Texas at Austin. Juan’s research and industry work has been at the intersection of data and AI, with the goal to reliably create knowledge from inscrutable data, specifically designing and building Knowledge Graphs for enterprise data and metadata management. 
Juan is the co-author of the book “Designing and Building Enterprise Knowledge Graphs” and the co-host of Catalog & Cocktails, an honest, no-bs, non-salesy data podcast. Connect with Juan online LinkedIn Catalog & Cocktails podcast Video Here’s the video version of our conversation: https://youtu.be/xZq12K7GvB8 Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 19. The AI pendulum has been swinging back and forth for many decades. Juan Sequeda argues that we're now at a point in the advancement of AI technology where businesses can fully reap its long-promised benefits. The key is a semantic understanding of your business, captured in a knowledge graph. Juan sees large language models as a critical enabler of this capability, in particular the ability of LLMs to accelerate the acquisition and use of valuable business knowledge. Interview transcript Larry: Hi, everyone. Welcome to episode number 19 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Juan Sequeda. Juan is the principal scientist and the head of the AI lab at data.world. He's also the co-host of the really good, popular podcast, Catalog & Cocktails. So welcome, Juan. Tell the folks a little bit more about what you're up to these days. Juan: Hey, very great. Thank you so much for having me. Great to chat with you. So what am I up to now these days? Obviously, knowledge graphs is something that is my entire life of what I've been doing. This was before it was called knowledge graphs. I would say that the last year, year-and-a-half, almost two years now, I would say, has been about understanding the relationship between knowledge graphs and LLMs.
If people have been following our work, what we've been doing a lot has been on understanding how to use knowledge graphs to increase the accuracy for your chat with your data system, so be able to do question answering over your structured SQL databases and how knowledge graphs increase the accuracy of that. So we can chat about that. Juan: But what I'm interested next now is I'm calling this how do we extract knowledge from people's heads? And this isn't new, this is not new, this is back from the '80s and early '90s, all the work on knowledge acquisition. People wanted to create rule-based systems and expert systems, and you need to have knowledge engineers, who go talk to the domain experts and be able to extract their knowledge and codify that. Well, that was hard, and it was brittle, and it was not scalable. And now we have a very fantastic tool called LLMs that I'm very excited to use them to be able to find ways to scale that back. So I think what's old is new, and I think the opportunities to use these new tools to do the work that we've wanted to go do, but we haven't been able to go do, which is just to improve knowledge management. Juan: And one of the reasons why I argue that, we still try to solve the same problems. We're solving the same problems today that we had 30 years ago, but the core problems continue to be there, and I think it's this lack of knowledge. Now, we have a way to scale this out so people can stop complaining, saying, "Oh, it's just too hard, too expensive." That's my long-winded answer of what I'm up to. Larry: Yeah, well, I'd love to jump right into that, because there's all these intersecting things, at least in my world. I run in a lot of different worlds, the content world and knowledge management folks. Anyhow, like you said, in the AI world, there's this longstanding, the expert systems and all that kind of stuff, and then there's the knowledge engineers and enterprise knowledge management stuff. 
It's always been really hard work, but parts of that are now more amenable to, if not automation, at least acceleration. Can you talk a little bit about how LLMs have kind of advanced the stuff you've always been doing and always wanted to do more of? Juan: Well, so LLMs, first of all, we look at the LLMs, and I like to look at things like as inputs and outputs. So the input is a prompt, and the output is going to be some text around this prompt. That's at a super high level. But what's been fantastic is that if you give it some context as an input, and part of that context, you also give it a task of what you want it to go do, the output that comes out actually aligns very nicely to it. Now, yes, there's hallucinations and stuff, but look, if it gets 80%, 90% done, that's better than nothing. It's better than actually doing things completely manual. And guess what, if a human does it, they are not always getting things 100% correct anyways. So that's at a super high level. Juan: So the example I give all the time is organizations always complain, "Oh, how many customers do we have? Oh, we have so many different customers. It's so complicated." Okay, so what? You're just going to cross your hands and just live like that? No. How about you figure out at least how many different ones are there, and what are the implications of those things? But how would you do that? Well, you need to go start talking to people, and then somebody needs to go talk to all these people and start cataloging and combine all this stuff. Well, what if I actually just posed that question to somebody, saying, "Can you describe what is a customer for you in a sentence or in 10 seconds, in 15 seconds, right? Give me a minute rant of what is a customer for you?" Just take that, and you got a transcript, you got text right there. Juan: So now, I could prompt the LLM to say, "Hey, here's this definition of a customer. Can you please give some definitions?" 
Here's where the prompting, this is where the interesting experiments are going to come in. And then you can now do that for so many different other people. And then you can do further analysis to figure out, saying, "Hey, where are the overlaps of this stuff? Actually, you know what? Can you codify this? Can you turn this into code? Can you turn this into already some sort of a knowledge graph on it?" And I can now start comparing these things and realize, "Oh, look. There's really X different definitions of this, where the alignment is on 80%, and the 20% of this." Now we know these things. Now I can actually go back. That's the type of stuff that we can actually go do very quickly in the initial experiments that I've been doing. It's like, this stuff is working so freaking well. Juan: I wish we had this back 20, 30 years ago, but now it's time to go look at the problems that we're trying to go solve there. But back then and now we can actually solve them with these new tools. So that's what I'm really excited about and this is something that I really hope, one, I want to show that this can be done, but second, but most importantly is the so what? Once we've done this, what is the value for an organization? So I think those are the ties that we need to go do still. It's like we all complain about we don't know the definition of the customer. So what is the problem for that? What's the money that we're leaving on the table? How much money are we losing by not having a complete definition of a customer? Now that we do have that, we need to be able to say, well, I do have that now. How is that improving? I think that's really important for the next step. Larry: So we're a couple of years into this now, where is the value emerging? You've talked a lot about the workflow in making knowledge graphs and capturing knowledge, but how does that percolate up to business value? Where are folks finding that payoff? Juan: You mean from knowledge graphs in general or...? 
Larry: Well, just from this whole scheme you're describing, the ability to accelerate understanding of knowledge and what you just described is what I do. I've been doing content modeling for many years and it's sort of very similar kind of work and a lot of that back and forth. I've already seen it accelerated in my content modeling practice as well. And so the way you're talking about,...
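The comparison step Juan describes in this episode, finding where different people's definitions of "customer" overlap, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the department names and criteria below are invented, the step where an LLM turns each spoken definition into a set of criteria is not shown, and Jaccard overlap is just one simple way to score agreement, not necessarily the method used in his experiments.

```python
from itertools import combinations

# Hypothetical, hand-codified definitions of "customer". In the workflow
# described in the interview, an LLM would derive criteria like these from
# short interview transcripts; that step is assumed and not shown here.
definitions = {
    "sales": {"has_account", "signed_contract", "has_paid"},
    "support": {"has_account", "opened_ticket"},
    "finance": {"has_account", "signed_contract", "has_paid", "has_invoice"},
}


def jaccard(a, b):
    """Share of criteria two definitions have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_definitions(defs):
    """Return pairwise overlap scores plus the criteria everyone agrees on."""
    scores = {
        (x, y): jaccard(defs[x], defs[y])
        for x, y in combinations(sorted(defs), 2)
    }
    shared = set.intersection(*defs.values())
    return scores, shared


scores, shared = compare_definitions(definitions)
for pair, score in scores.items():
    print(pair, round(score, 2))
print("agreed by all:", sorted(shared))
```

With real interview data, the low-scoring pairs would point to the teams whose definitions most need to be reconciled.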
-
14
Jesús Barrasa: Pragmatic Advice for Graph Technology Adoption – Episode 18
Jesús Barrasa Over his 20-year career, Jesús Barrasa has spanned the worlds of object-oriented property graphs and assertion-based knowledge graphs. He knows as much about these two foundational technologies as anyone and offers pragmatic advice to help architects and engineers decide which approach will work best for their needs. We talked about: his role at Neo4j in which he helps companies adopt graph technology his academic study of semantic technology and his early work on mapping relational data to ontologies and enterprise ontologies his move across the graph spectrum from RDF graphs to property graphs, culminating in his current role at Neo4j his take on the similarities and differences between RDF and property graph approaches, the key commonality being linked data and the key distinction between them being the level of abstraction that they employ different ways to approach inference the origins of the semantic web and how it has made data actionable and interoperable and smarter how the professional backgrounds of software developers can affect their choice of graph technologies the crucial role of interoperability in graph technology, and our ongoing inability to productively harness it how semantics is managed and used in the property graph and RDF worlds ontology as a technology-independent way of representing knowledge the importance of staying focused on the needs of practitioners when advising them on how to make a graph technology choice how knowledge graphs can balance the "opaque power" of large language models with "explicit, declarative, explainable power" Jesús' bio Dr. Jesús Barrasa is Neo4j's AI Field CTO and the company's resident expert in Knowledge Graphs and Semantic Technologies. He co-authored the O'Reilly book "Building Knowledge Graphs: A Practitioner's Guide" (released in July 2023) and combines over 20 years of professional experience in the data management space split between industry and research and academia. 
Prior to joining Neo4j, Jesús worked for data integration companies like Denodo and Ontology Systems (now EXFO), where he gained first-hand experience with many successful enterprise-wide data integration deployments and large graph technology projects enhancing the operations and analytics of major companies worldwide. Jesús' doctoral work in Artificial Intelligence and Knowledge Representation focused on the automatic repurposing of legacy data as knowledge graphs. He's an active thought leader in the graph and semantics communities and co-hosts the popular monthly webcast on knowledge graphs "Going Meta." Connect with Jesús online LinkedIn Going Meta webcast. Video Here’s the video version of our conversation: https://youtu.be/7WFP_oDQsxI Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 18. If you search for the term "knowledge graph," you're likely to get an equal number of results about property graphs and RDF-based graphs. Jesús Barrasa has been immersed in both of those technologies for more than 20 years. He takes a pragmatic approach to graph technology adoption, focusing on the needs of practitioners and on the ability of knowledge graphs to balance the "opaque power" of large language models with the explainable power of knowledge graphs. Interview transcript Larry: Hi everyone. Welcome to episode number 18 of the Knowledge Graph Insights Podcast. I am really delighted to welcome to the show Jesús Barrasa. Jesús is about as graph-ey a person as it gets. I got to say he has a 20-year background in this stuff. He's currently the AI field CTO, Chief Technical Officer for the graph database company Neo4j. Welcome to the show, Jesús. Tell the folks a little bit more about what you're doing these days. Jesús: Hi Larry. Thank you very much. I'm really, really glad to be here. I mean, we've been trying to plan this for a while now. Really, really happy to have this conversation today. So yeah, you're right. I'm with Neo4j. 
And well, I've always been part of the... what we call the field organization, so we're not part of building our product, but helping our customers adopt it and adopt graph technology in the general case. These days we have a very, very strong focus, as is inevitable, on how we integrate with large language models, GenAI, which we'll talk a little bit about later on in the show, but that's what we do. So basically I work all over the globe with all of our customers, many of our customers of course, helping them adopt graph technology. So that's where I am today. Larry: That's great. This show, I really like to focus on adoption and use and practice around graph technology and you know as much about it as anyone. I don't always start these episodes with a biography, but your background is so interesting. Because I think virtually every guest I've had to this point comes out of the RDF-based, ontologically driven knowledge graph world. And you come out of that world originally, but you're currently in the property graph space. So tell me a little bit about your pathway from your original study and your career. Jesús: Yeah, absolutely. And that's going to age me I guess. But yeah, you're right. Well, I started like many others as a university graduate. I went on and I did computer science and started doing software engineering. But where it really starts for our conversation today is when I decided to join the, it was called the Ontology Engineering Group in Madrid and started my PhD. And that's when I met these people that were using this super interesting technology that was called Semantics, the Semantic Web. Of course it was back in the day when Berners-Lee and company published the famous article in Scientific American. Jesús: And that was amazing. I really, really enjoyed it and finished my PhD and this was the early 2000s, so around 2008 or something like that. And what I did basically was come up with a way of mapping relational data.
I don't want to go too nerdy too early in the conversation, but mapping relational data, which is basically where all the data lived, to ontologies. Basically overlaying some form of semantic description of the meaning of the data. And like many others we tried to come up with a declarative, structured mapping language to be able to basically leverage in the semantic web the content from relational databases. So that was great fun. And that's work that then later on people like Juan Sequeda, who we talked about earlier today, followed and even built a company around it. But that's what I did. Jesús: And then from there on, I moved to the UK, to London, to join a company called Ontology that's still around under a different name and that was using the RDF stack. And they were focusing on the telecoms vertical. So we were building connected representations of, I would say all the elements involved in the delivery of a telecom service. I mean from the physical infrastructure to the logical elements. And the services, the products, even the customers. And we were trying to solve problems like impact analysis, root cause analysis. So we did that for a few years and that was again, great fun. Jesús: And that's where I jumped to the other side of the graph, let's call it, camp. Or let's call it spectrum, because it's more kind of a gradual thing. And after a short stint at a data integration company called Denodo, I joined Neo4j. And I've been with Neo for the last nine years, so quite a while. Which means that I've been doing graphs for close to 20 years which is crazy. And like you say, I'm one of these unusual individuals out there that has spent as much time doing RDF and SPARQL as I have been using a property graph and Cypher, which is great. That puts me in a great place for conversations like this one. And it's been great fun. So that's kind of a bit of an overview of what I've been doing over the last over 20 years. Larry: And that's really...
Because the way you just said that, you're reminding me that it's just data. And you're getting at it in different ways. You're querying it with SPARQL in some cases, with Cypher in another context. But tell me about the distinctions, or how to navigate that spectrum from property graphs to RDF-based knowledge graphs. How would you describe the similarities? If you do a Google search for knowledge graph, it's a mashup of property graph and RDF-based graph stuff. How would you distinguish the two? And what unites them and what separates them, I guess? Jesús: Sure, yeah, I know, absolutely. And you're totally right. So I don't think we always do a great job at helping practitioners, which is ultimately what should be our main objective. So yeah, I mean the way I see it is they share the most important aspect, which is they have the same underlying abstraction, which is connected data. So we think of the world not in terms of tables or documents. We think of the world in terms of things connected to other things. Because that's how humans think. So we think of Larry as a person, Jesús as a person, and we're friends and we're connected through a friendship relationship and we work for companies. And the world is a collection of things related to things. Jesús: Now, if you go down the RDF stack approach you're going to break down that into individual atomic statements that are called triples, which is what RDF is based on. And your choice of model would typically, not always, and it shouldn't, and we're happy to talk about that later, determine kind of a technology stack. But ultimately it's kind of a level of abstraction. If you go the RDF route, you would describe it in terms of statements. So person 123 name Larry, person 123 lives in Amsterdam. Person 123 is connected to person 234. Person 234 is Jesús. So everything can be broken down into logical statements, into triples. And that's one way, that's one approach. ...
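The triple breakdown Jesús walks through above can be sketched with plain Python tuples. This is a minimal illustration rather than real RDF: the identifiers (`person:123`) and predicate names (`livesIn`, `connectedTo`) are made up for the example, and a production system would use IRIs and a triple store queried with SPARQL.

```python
# Each statement is a (subject, predicate, object) triple, mirroring the
# spoken example. Identifiers and predicate names are illustrative only.
triples = [
    ("person:123", "name", "Larry"),
    ("person:123", "livesIn", "Amsterdam"),
    ("person:123", "connectedTo", "person:234"),
    ("person:234", "name", "Jesús"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def names_of_connections(triples, person):
    """Follow connectedTo edges from `person` and look up each name."""
    names = []
    for _, _, other in match(triples, s=person, p="connectedTo"):
        names.extend(o for _, _, o in match(triples, s=other, p="name"))
    return names


print(names_of_connections(triples, "person:123"))
```

Because every fact is an independent statement, adding new knowledge about a person is just a matter of appending another triple.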
-
13
Yaakov Belch: Humans in the Loop? No. Humans in Control – Episode 17
Yaakov Belch Yaakov Belch is an AI researcher with strong ideas about the role of humans in AI systems. Instead of "human in the loop," he argues, we should put "humans in control." Yaakov's research looks at business contracts and how knowledge graphs and AI systems can both capture their meaning more accurately and help managers make better business decisions. We talked about: his assertion that we need humans in control, not just in the loop his research on applying AI technology to business contracts, in particular the issue of resolving inconsistencies in language model results reasons to put human concerns ahead of any particular technology the importance of having humans in control when interpreting ambiguous business decisions the importance of both accounting for business intent and asking the right questions of your data and how the loop between the two tightens over time the responsibility of human users to understand how LLMs work and to prompt and otherwise interact with them accordingly why he doesn't use the term "hallucination" when talking about LLM outputs the role and implications of applying different kinds of logic in the use of knowledge graphs an analogy that shows how the concept of a Git fork can help knowledge graph engineers account in their models for different versions of reality the real-world applications of his research, especially how the practices he is exploring can create new business value the importance of building any model off of real data and always thinking about which human needs to be in control Yaakov's bio As a mathematician and data scientist, Yaakov Belch brings a unique perspective to the world of AI and knowledge graphs. With a strong background in mathematics, including participation in prestigious International Mathematical Olympiads, Yaakov went on to earn a Ph.D. in pure mathematics from the University of Cambridge. Yaakov's career has spanned both research and industry roles. 
He has worked as an Algorithm Programmer, collaborating with researchers in bioinformatics and economics, co-authoring academic papers. Yaakov also served as a Senior Data Scientist at Israeli e-commerce startups, where he tackled challenges in symbolic and semantic search from different angles. Currently, Yaakov is on a sabbatical, working as an independent Data Scientist to develop his method of reliable business reasoning, precise contract understanding, and humans-in-control.ai. He sees an interesting connection between the problems from the International Mathematics Olympiads and taming the inconsistencies of large language models: "At one hand, the problems are hard and just don't open up to the known, standard methods of the field. But you know that there is a beautiful solution. You need to find the right perspective to appreciate the problem and to see that beautiful solution." Connect with Yaakov online humans-in-control.ai LinkedIn Video Here’s the video version of our conversation: https://youtu.be/ZS9r0bfkGQc Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 17. Machine learning architects often talk about the "human in the loop." Yaakov Belch thinks that when it comes to language models the right approach is to put "humans in control." Yaakov's research looks at how knowledge graphs and large language models can help put humans in control of business contracts, capturing the actual intent that underlies them and facilitating better business decision-making based on the discoveries that they enable. Interview transcript Larry: Hi everyone. Welcome to episode number 17 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Yaakov Belch. Yaakov is an independent senior data scientist and he’s made this really provocative statement about … There’s all this talk in the AI world in general about humans in the loop. And Yaakov says, “No. 
We need humans in control.” So Yaakov, I would love to talk about that today. Yaakov: We don’t need humans in the loop. We need humans in control. This is a paradigm which crystallized over time in the research which I’m doing, about how to apply language models for contract understanding in the business context. Let’s unpack that and see where the human in control comes in. A business contract is a memorization, a written-down expression of agreement between two or more parties; with the intention to fulfil that agreement in the future. It contains provisions for what the parties promise to do, the rights and obligations they have. And also descriptions what you do when things go wrong. After you make an agreement, you actually want to execute it, you want to fulfill it. Yaakov: When you make a large contract, some people negotiate the contract and other people will be doing the work. There are situations where the people who actually do the work are not really aware about what exactly has been promised in the contract. The huge contract is written in a way as contracts need to be written. But it can’t be understood on the spot when the person needs to decide: Is this promised or not? There can be a disconnect between the expectations of the customer and the provider based on just not being aware of what’s in the contract. In a more advanced setting, you may want to understand the risks which are in the contract in hypothetical situations. When you draft a contract you want to make sure that you’re not setting up yourself for problems when things go in a specific way. Yet more advanced: When you purchase a property, you may acquire contracts which go with them, like liens, leases and bills. In your due diligence work, you want to check that every liability has been properly addressed, so that you don’t acquire additional risks with the property. And if there are problems, you want to resolve them. Yaakov: I want to understand contracts from a business perspective. 
I want to use language models to understand their language. It’s not a surprise anymore that language models can help with that. You put in part of a contract. You can ask a question. You can get an answer. What is problematic is that if you don’t do it right, these answers will not be consistent. You may ask the same question twice and get two different answers. Or you get an answer that is justifiable, but it’s not consistent with the intention of your question. It is against your business goals. Some answers may be completely disconnected from reality. The essence of this research project is how do we deal with this inconsistency? How do we create a reliable system where language models are just a part of it? We have knowledge graphs, we have logic, and we have humans in control … So that businesses can rely on it for their business needs. How do we detect and how do we deal with all the mistakes which happen with language models? But also in your data you have mistakes. And how do we detect them? How do we deal with them? That’s the research. Yaakov: In this research, I find that you really need to be careful to use the right paradigms of what you’re doing. If you just apply a playbook from machine learning or from expert systems and just try to do hard work, you may break yourself against language models being different. It’s a new technology with new characteristics. You need to adjust your goals, your paradigms of work, and the structure of your work to the capabilities and to the limitations of the problems of language models. One key point is to understand the right role that humans must play in the whole scheme. It’s not technology alone, it’s technology and people. We don’t need people in the loop. We need people in control. Larry: Okay. 
We actually had a long conversation last week to set this up and I was beginning to really get it, but that was an excellent summary of your research and the business insight and the technical challenge that comes into this. I’d love to really focus on how knowledge graphs figure into this. I’m inferring that there’s a business ontology and stuff that drives… But can you talk about the specific role of knowledge graph in how you see it helping address these new paradigms that we need? Yaakov: For sure. This is an excellent question, especially in the audience of Knowledge Graph Conference and knowledge graph podcast. I suggest, however, that before we talk about the knowledge graph, which is basically a technology which supports this work, to first understand: What does this really mean, human control? How does it work? What do we want to achieve? What is the goal? Then, let's understand the mistakes of language models. Their answers may be inconsistent with what you want. But they are not just making random mistakes. Where does this come from? And then we’ll see how knowledge graphs and logic play an important role. So maybe we go into three steps and get the knowledge graphs at the end. Larry: That sounds perfect, because that makes knowledge graph the culmination, which is what we’re looking for. It’s a knowledge graph podcast. But, no. So the first thing you said is, what does that really mean, humans in control? Yeah. I’d love to hear those three steps you just mentioned. Yaakov: Okay. Humans in control versus Humans in the loop. When you have a machine learning system, you know you need training data. When you get incorrect results, you want to double-check and correct them with a human in the loop. As a benefit, you create training data that will hopefully make your model more precise. But this usually doesn’t work. It works only when you do it right: When you do this with one human in control in addition to many humans in the loop. 
Without that, the people in the loop either don’t know the right answers, because they’re not experts. Or they are experts and don’t want to answer your question: you overload them with repetitive work,...
-
12
Michael Iantosca: Managing Dynamic Content with Knowledge Graphs – Episode 16
Michael Iantosca Where content, knowledge management, and AI converge, you'll find Michael Iantosca. As many in the AI world flock to probabilistic models like LLMs, Michael takes a deterministic approach to content management and knowledge engineering, using ontologies and knowledge graphs to ground content in concrete facts. This approach embodies his insight that content and the models that describe it are not static information but rather valuable, ever-evolving enterprise IP assets. We talked about: his 44-year career in content, knowledge management and localization/globalization roles the three pillars of his work: content, knowledge management, and engineering the need he sees in his work to move away from probabilistic, vector-based models to deterministic, neuro-symbolic models like knowledge graphs how he decides which models are appropriate to use with each of the varied kinds of data he works with his explorations of how to automatically construct a knowledge graph to use to power generative AI solutions how he acquires and develops ontology skills in his team how graph technology supports the "total content experiences" he builds how the non-static nature of content makes it a poor candidate to be managed in a static system like a vector-based model the relative merits and utility of 1) deterministic retrieval for structured content and 2) probabilistic retrieval for unstructured content the power of combining content models, knowledge models, and ontologies and how they can become crucial enterprise IP assets his belief that we are entering a golden age of content and knowledge engineering Michael's bio Michael Iantosca is the Senior Director of Knowledge Platforms and Engineering at Avalara, a sales tax automation company. With over four decades of leadership in technical content management, Michael has been a pioneer in advancing the profession, driving innovations in structured content, intelligent authoring, and scalable knowledge platforms.
Renowned for bridging engineering and content teams, he has championed the adoption of AI and cutting-edge technologies to enhance user experience. A thought leader and mentor, Michael continues to shape the future of technical communication through his expertise and passion for innovation. Connect with Michael online LinkedIn Medium ThinkingDocumentation Video Here’s the video version of our conversation: https://youtu.be/WG9Nl5OY3QI Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 16. A lot of work in the AI world these days is about vectorizing giant collections of static, unstructured content and data for LLMs. Michael Iantosca has worked for decades in a world where content is dynamic, always precisely structured, and contextualized with rich metadata. So he has a different take on architectural innovations like graph RAG, favoring knowledge-based deterministic retrieval of content over vector-based models and probabilistic methods. Interview transcript Larry: Hi, everyone. Welcome to episode number 16 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show, Michael Iantosca. Michael is currently the Senior Director of Knowledge Platforms and Engineering at Avalara, the big tax-compliance automation software company. He's also got a long history. He's spent a couple of decades, a few decades at IBM prior to his role at Avalara. Welcome to the show, Michael. Tell the folks a little bit more about what you're up to these days. Michael: Larry, thank you for having me. It's a pleasure and an honor to get a few minutes to talk to you today. Yeah, I have just started my 44th year primarily in the professional content space, but also in the knowledge management and localization globalization space as well. I have been involved with content since the early days of SGML that began the structured content revolution and worked my way up through the professional content ranks.
I'm also an engineer. In IBM's grand wisdom, I was trained for years as a developer, so I do span both worlds. Michael: My responsibility is to develop some of the world's most advanced content supply chains for creating content for customers, delivering it, as we like to say, "Deliver the right content to the right person at the right time and in the right experience." That's my north star that drives that. I lead an engineering team that builds those platforms that service multiple groups throughout the enterprise for their content creation and delivery needs, whether that's in product, user assistance, contextual help, knowledge centers, help centers, support sites, and a litany of other channels by consolidating that entire supply chain, so that we can write content once and deliver it in many different channels, including chatbots and generative AI. Larry: Cool. I love that ... I think, I'm trying to remember only 16 episodes in, but I think you're the most content-ey person I've had on the Knowledge Graph Insights podcast. One of the things we talked about before we went on the air was this notion of you're like, "Hey, guys, dear engineers, it's not just about data, it's equally about knowledge." And in this world, the knowledge graph engineers and ontologists, they're on board with that, but you also bring this content perspective to it. I'm really curious how those three concerns combine in your work. Do you approach data stuff differently with a content lens on? And especially the knowledge management part, because each of those is its whole other thing, but you're working, your title, you're a knowledge platforms guy, you're combining all three of those. Tell me how that manifests in your work. Michael: Yes, that's critical.
We see content and knowledge management, and when I say knowledge management, I'm talking more about taxonomies and ontologies and knowledge graphs and engineering, the actual coding and infrastructure of building out models and solutions in the AI, especially generative AI space, as a holy trio, if you will, that have to have equal footing. Developing really advanced generative AI solutions is not just a coding problem. It is equally as much of a knowledge management problem and equally as much of a content challenge. Michael: Content isn't generic. Content is constantly changing. The management of that content is constantly changing. We can't treat content any more generically than we can treat data that changes daily, sometimes weekly. The state of that content, the purpose of that content is not static. We need the content teams involved because they have the very fuel of our generative AI solutions, but we also need the people that understand advanced semantic knowledge management that can help power both that content and make it intelligent and then feed that to the generative AI models, so that these models can be far better than they are today. Larry: Yeah, and when you say today, that's going to be way different even tomorrow from when we drop this. We were talking, again, before we went on about trends in the development that's developing around the implementation of graph technologies across all of this stuff, but in particular to content. I wonder if you could talk about how you see those trends. One of the ways I've seen is from LLMs to RAG to graph RAG and now, Tony Seale and folks talking about neurosymbolic loops and hybrid AI architectures. How was that unfolding in your world? Michael: That's a really good question too. I think almost everybody who starts out in the generative AI space follows the same basic path. 
About three and a half years ago, I think it was, it took us about a week to take a simple vector database, Pinecone, I think we used, and maybe a couple of hundred lines of Python code, and we built a RAG, retrieval augmented generation model, because we didn't want to use the general large language model that uses the public content, and we didn't want to feed our public content to train a large language model. It was natural that we wanted to have a private data model of our own and use a vector database to do that. But that's really an old, old model at this stage. It is a probabilistic retrieval model and therein lies its core weakness as well. What we wanted to do was move away from probabilistic ... oh, are you there? Larry: Yeah. Oh, do we have an internet thing? I lost you. Let me check my internet. Michael: I apologize. I had a burp in connectivity. Larry: Oh, no worries. Okay. Michael: Let me pick up again. Larry: Yeah. Go ahead. Michael: What we want to do is move away from probabilistic models like relying completely on vector-based retrieval and move toward deterministic models, sometimes what we call neuro-symbolic models and use mechanisms such as knowledge graphs, which are far better at providing true reasoning and true inference based on a concrete set of facts or what we call the ground truth. I think what you're seeing now in the marketplace is the initial models that are being deployed are good. They're yielding value, people are excited. They're not perfect, but as development teams reach those plateaus, they want to get better. They want better than where they are. This is what I call the precision paradox. The precision paradox says that as models improve, the tolerance for errors and lack of accuracy or relevance or contextual truth declines. 
We're partying right now with these models, and then, eventually that party is going to end and we're going to have to get down and do some of the serious work necessary to move to these deterministic, reliable models that are based on ground truth that we control, not that an LLM controls. Michael: That's what I see the trend going to. I think, every morning I wake up, I think I read at least 10 to 20 articles, all different models and variations. Unfortunately,...
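Editor's illustration: the contrast Michael draws between similarity-ranked vector retrieval and deterministic lookup against ground-truth facts can be sketched roughly as follows. This is a minimal toy under stated assumptions, not Avalara's implementation: the passages, the three-number "embeddings", the triples, and the function names are all hypothetical and hand-made for the example.

```python
import math

# Hypothetical passages with made-up 3-dimensional embeddings (illustration only).
PASSAGES = {
    "Sales tax applies to physical goods.": [0.9, 0.1, 0.0],
    "Digital services are taxed at the buyer's location.": [0.2, 0.9, 0.1],
    "Exemption certificates must be renewed yearly.": [0.1, 0.2, 0.9],
}

# Hypothetical ground-truth facts as (subject, predicate, object) triples.
TRIPLES = {
    ("PhysicalGood", "subjectTo", "SalesTax"),
    ("DigitalService", "taxedAt", "BuyerLocation"),
    ("ExemptionCertificate", "renewalPeriod", "Yearly"),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_retrieve(query_embedding, k=1):
    """Similarity-based retrieval: always returns the k closest passages,
    even when none of them actually answers the question."""
    ranked = sorted(
        PASSAGES,
        key=lambda text: cosine(query_embedding, PASSAGES[text]),
        reverse=True,
    )
    return ranked[:k]


def graph_lookup(subject, predicate):
    """Deterministic retrieval: returns exactly the asserted objects,
    or an empty list when the fact is not in the graph."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)
```

In this sketch, `vector_retrieve` always hands back its best guess, while `graph_lookup("DigitalService", "taxedAt")` returns only what was explicitly asserted and returns nothing for an unknown predicate, which is the behavior the "ground truth that we control" remark points toward.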
-
11
Fran Alexander: Alien vs Predator and LLMs vs Knowledge Graphs – Episode 15
Fran Alexander When Fran Alexander looks at the current AI landscape she sees some interesting parallels between the Alien vs Predator science fiction franchise and the way RAG and other architectures are combining LLMs and knowledge graphs. We talked about: the analogy she draws between the Alien and Predator science fiction franchise with LLMs and knowledge graphs how the human-esque (if malevolent) cognitive and behavioral nature of Predators aligns more with knowledge graphs and how the unpredictable and stochastic nature of Aliens aligns more with LLMs how the eloquence of LLM outputs can deceive humans the lack of explainability and transparency in both Alien and LLM behavior, and the opposite in knowledge graphs the difficulty of dealing with baked-in biases in LLMs the lack of repeatability in LLMs and the opposite in KGs the current trend of architectures and practices like RAG that draw on the strengths of KGs and LLMs to get better results, just as the Alien and Predator media franchises combined forces how over the past year or so investment in LLMs has overshadowed all other investments, just as Aliens are out to wipe out anything that's not an Alien her approach to AI architectures that combine LLMs and knowledge graphs how different kinds of people consume LLM output how she helps enterprise decision makers choose whether to address a use case with a knowledge graph or an LLM how taxonomists and ontologists can use LLMs in their work the Alien Loves Predator UK Facebook group and Alien and Predator on a seesaw Alien and Predator cosplay actors on a seesaw Fran's bio Fran started her career as a writer and editor of dictionaries and thesauruses in the UK, and, as technology evolved, she specialised in information architecture, search systems, and digital archives, and more recently, the use of semantics in knowledge graphs and LLM applications. 
Having worked on reference publications including the Collins English Dictionary, and as Taxonomy Manager for the BBC Archive, she now lives in Montreal, Canada, and is the Senior Taxonomist for Expedia Group. She was Taxonomy Bootcamp London’s Taxonomy Practitioner of the Year 2023. Connect with Fran online LinkedIn Video Here’s the video version of our conversation: https://youtu.be/VWwDBIws6G8 Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 15. When two impressive domains converge, amazing things can happen. When the Alien and Predator science fiction franchises joined forces, both enjoyed new commercial success. Similarly, in the AI world right now, Fran Alexander sees knowledge graphs and large language models combining forces to create retrieval augmented generation and similar architectures that work together to create systems more useful and valuable than the sum of their individual capabilities. Interview transcript Larry: Hi everyone. Welcome to episode number 15 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show, Fran Alexander. Fran is an independent taxonomist and ontologist based in Montreal. And so, welcome Fran. Tell the folks a little bit more about what you're up to these days. Fran: Hi Larry. Well, it's nice to talk to you again. I really enjoyed talking to you on the previous podcast that we did a little while ago. And that one was kind of a bit of a general introduction to taxonomies, ontologies, thesauruses, knowledge modeling and semantics. But this time, I thought we could talk about knowledge graphs and LLMs. They're a big hot topic and I did a presentation earlier on in the year for Taxonomy Boot Camp London, Bite-sized Taxonomy Boot Camp London. That was a lot of fun and has been really popular. A lot of people have been asking me about it. I've revisited it a couple of times and that's LLMs versus knowledge graphs, Alien versus Predator. Larry: Okay. 
So why Alien versus Predator? Why not King Kong and Godzilla or...? Fran: I did think maybe King Kong versus Godzilla. Godzilla as LLMs and King Kong as knowledge graphs. Certainly, the idea is that it's a fun analogy. It's a fun way to start thinking about the differences between knowledge graphs and LLMs. It's not supposed to be a serious study of science fiction characters, but you certainly could pick your own pair of monsters and do the analogy there. I did consider, as I say, I did consider Godzilla versus King Kong. That would work, but personally, I happen to really like the Alien franchise. I'm a bit more familiar with the Alien franchise. I thought it was really, really fun to have the Alien versus Predator crossover and that's actually become quite a successful franchise in its own right. So yeah, so you could use many monsters, pick your own monsters and run your own analogy. But as a starting point for talking about LLMs, they can seem very Alien. They can seem very scary. So that was my starting point. Larry: Interesting. Yeah, and so I'm just trying to, I'm getting it as you talk about it. Well, tell me a little bit more about, because in ontology work, we figure out what the entities in the domain are and ascribe properties to them. What are the properties of an Alien that make you think they're like an LLM? Fran: So Aliens are very different from humans and one of the reasons why I like the Alien versus Predator analogy is that these characters, their societies, the way they operate, and what we know of them are very different. So Aliens, we don't really know much about Alien society. They're not at all like humans. They have acid for blood and what an Alien does is basically, just goes around killing everything in its path and making more Aliens. We can't really communicate with them, we don't know much about them. They're very, very different. Their approach to the universe is very, very different to ours. 
Fran: Whereas Predators, are still big, scary monsters, still very powerful, but they're much more humanoid. They kind of look humanoid. They move in a more humanoid way and Predators actually have a much more human-like society. So they have some kind of moral principles, maybe not that many. They're basically, they're mercenaries for hire, but they do have social structures, social hierarchies, complex societies. You can't go and hire an Alien to work for you in the way that you can hire a Predator. Fran: So Predators are starting to get into those kind of complexities, hierarchies. You talk about the way that we build knowledge graphs, they're usually very specific. So Predators will have a specific target. They're out to assassinate their designated target that they're doing for money. Knowledge graphs are very specific and targeted and precise. Whereas Aliens, they're not really involved in that at all. They're just going off to do their Alien thing in their own Alien way. So that was my starting point. You've got these two contrasting characters, one that's very, very strange and different. The acid for blood of Aliens, I compared to LLMs having maths for blood. Fran: The way that LLMs work in a kind of, and I'm not an expert and machine learning engineers and experts, I don't know whether they'd completely agree with my overview, but the way I look at how LLMs work is that, you take a big corpus of something - texts, documents, images - chop it up into lots of little pieces, and then you layer on lots and lots of algorithms and calculations and machine learning to figure out what the probability of one piece appearing next to another piece is. And that essentially is what LLMs are doing. They're very, very complicated probability engines. Whereas knowledge graphs are built up using structures and hierarchies and conceptual models that come from humans and come from people. 
Fran: Now I don't know anyone who learns by chopping things up and calculating probabilities of bits of text. Humans don't read books and learn from them like that and humans don't discuss and describe concepts and approach the world like that. But a human way of looking at the world is in thinking of things with structures and hierarchies that we're used to. Our taxonomies that are kind of like the backbone of knowledge graphs, that kind of parent-child broader narrower concept relationship is something very, very familiar to us. We talked last time about supermarkets being organized with the dairy section and milk within the dairy section and types of milk. That's very natural to us and we approach the world with a mental map or a mind map. Fran: It's very much like an ontology. So when you build a knowledge graph, you tend to start with subject matter specifics like the Predator having a specific target and a specific reason for going out after its targets, a specific motivation and you'll build up, you start to put your lists, your labels, your taxonomies, your ontologies. You're building them up from a very human perspective with your subject matter experts or your business drivers to come up with a knowledge base that answers specific questions in a focused and targeted way. So that's kind of where I started with the analogy. Larry: Yeah and especially the way you punctuated it at the end there. But when you were talking earlier about the LLMs and instead of acid, they have math for blood, maths for blood. And that's so Alien and I think what's interesting to me, there's a weird contrast there between, I agree with you that that's like a metaphor and an analogy that really works for me in feeling LLMs, but I think because of their conversational interface that they typically use, I think a lot of people perceive them as more, ascribe more humanity to them than they deserve. Does that make sense? Fran: Yeah, I think that makes sense. 
I think it's really interesting. I think it's one of the dangers actually and there are probably other sci-fi monsters....
-
10
Andreas Blumauer: The Elements of the Enterprise Semantic Layer – Episode 14
Andreas Blumauer Every enterprise nowadays is awash in data, content, and knowledge, the understanding of which is all too often available only in employees' heads. Forward-thinking businesses are moving to knowledge graphs to capture that tacit knowledge so that they can better understand and use it. Andreas Blumauer shows how those graphs work best when they're accompanied by a domain knowledge model, creating a "semantic layer" that provides a vivid map of your business knowledge. We talked about: the recent merger of his company, Semantic Web Company, with Ontotext to form a new company, Graphwise, which aims to help enterprises build their semantic layer the 20-year-old origin story behind his definition and description of a semantic layer the emerging trend he sees of data, content, and knowledge people coming together, often around explorations of language and content structure how getting domain knowledge out of people's heads and into multimodal AI architectures can streamline business research how a semantic layer provides a map of your business knowledge the importance of a domain knowledge model in a semantic layer architecture the inadequacy of "desktop data integration," the practice of calling colleagues, consulting varied sources, and otherwise searching through ad hoc enterprise knowledge sources, and the stress it can cause how a domain knowledge model can connect and reveal knowledge across an organization the surprisingly small footprint of the domain knowledge model in an enterprise knowledge graph, typically just 1% or so of the semantic layer how LLMs can help in the discovery of data and the construction of knowledge graphs the two main elements of the semantic layer: domain knowledge models (taxonomies, ontologies, etc.) 
and an automatically generated enterprise knowledge graph the different perceptions of the value of the semantic layer across data and content professionals, and how the arrival of gen AI has resulted in them talking together more to each other how the semantic layer can facilitate the alignment of vocabularies and the understanding of data across business divisions the benefits of a hybrid centralized-decentralized/global-local "glocalization" semantic layer strategy Andreas' bio Andreas Blumauer is SVP Growth at Graphwise, and CEO and founder of the Semantic Web Company (SWC), provider and developer of the PoolParty Semantic Platform and leading solution provider in the field of semantic AI and RAG. For more than 20 years, he has worked with more than 200 organizations worldwide to deliver AI and semantic search solutions, knowledge platforms, content hubs and related data modeling and integration services. Most recently, Andreas has been involved in the development of AI-powered ESG solutions for companies and investors. A globally recognized thought leader and author in the field of semantic AI and graph technologies, Andreas has helped define and implement knowledge and AI strategies for various industries and domains. He is the author of "The Knowledge Graph Cookbook: Recipes that Work", a practical guide to building and deploying knowledge graphs in organizations. He is passionate about enabling clients to harness the power of semantic AI and graph technologies to achieve their business goals and make a social contribution. Since 2022, Andreas has focused primarily on developing AI-based solutions that help organizations implement ESG and sustainable systems. 
Connect with Andreas online LinkedIn Graphwise Resources mentioned in this episode Knowledge Graphs, LLMs and Semantic AI LinkedIn group From Data to Trust: Leveraging Knowledge Graphs for Enterprise AI Solutions webinar Enterprise architecture model that includes a semantic layer Video Here’s the video version of our conversation: https://youtu.be/dEQh6m6zoDE Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 14. Data, content, and other business professionals have long known the benefits of sharing company lore in enterprise knowledge systems. Many of those systems now include a knowledge graph. Andreas Blumauer says that the thing that really lets you leverage your knowledge graph is a semantic layer that includes alongside your graph a human-crafted domain knowledge model that captures, and represents in a computer-readable way, the tacit knowledge in your enterprise. Interview transcript Larry: Hi everyone. Welcome to episode number 14 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Andreas Blumauer. Andreas, you may know him as the CEO and the founder of the Semantic Web Company, one of the venerable companies in the Semantic space. But more recently, his company has merged with Ontotext to form Graphwise, where he's now the SVP, the Senior Vice President of Growth and Marketing. So welcome Andreas, tell the folks a little bit more about what's going on these days. Andreas: Yeah, sure. Thanks Larry. Thanks for the invitation to this great podcast series. Yeah. So yeah, exciting days. Currently experiencing the merger between Ontotext and Semantic Web Company. And actually it has come quite naturally. So we have been working together for many years and inside of our Semantic suite, PoolParty Semantic Suite, we always have been using, at least for many years, GraphDB, the core product of Ontotext. 
So we got closer and closer, and finally the time has come to execute on this bigger vision, which is all about helping organizations create semantic layers. And this is where we currently stand, and it's definitely a very, very exciting time for all of us and looking forward to the next steps. Larry: Yeah, it's easily the biggest news in the industry lately, but yeah, the whole point of it is to support the semantic layer, and you're one of the first people to talk about that. I mean, if you Google it, it shows up in a lot of different contexts. But in the context of these enterprise architectures that support knowledge representation tech, you're one of the first people to talk about that. I don't know how familiar people are or whether they need to have your notion of a semantic layer disambiguated from what they've got, but tell me a little bit about what you think the semantic layer is. Andreas: Yeah, no, this has been a topic since the beginning of my career: quite a lot of discussions were around knowledge management and how to bring together digital assets with the actual knowledge people have in their heads, so to speak. And there was this, I would say, first wave of knowledge management discussions where it turned out to be quite clear that on the one side, at that time it was around 2005, 2006 or so, there was a rather, let's say, new digital community for whom documents contained data and information, and the other side of the community said, it's not knowledge. So in a document, you will never find knowledge, it just can be translated or transformed into knowledge as soon as it has been recognized by human beings as such. And so I was always feeling like I was sitting between those two communities. Andreas: So on the one side, I was excited since day one. Okay, where does AI bring us? And on the other side, I was totally understanding, okay, we should not substitute human intelligence at all. So I always kind of tried to bridge those two communities. 
And so I was thinking to myself, looking at the contemporary enterprise data architectures in those days, there was a missing layer. There was something, there was the data layer, there were the document repositories, and on top there were the applications trying to represent some kind of business logic and nothing in between. So there was really a big, big hole. So I thought to myself, at that time topic maps were still around, and then RDF came up, semantic web standards, more and more semantic web standards got developed. So I was trying to introduce this new layer, and I called it the semantic layer at that time already, to the community. Andreas: And everybody stared at me as if I was an alien, really. So I felt like, okay, maybe I said something wrong, it doesn't make sense to the others. And I was a little bit puzzled by that, I thought, but this is still, I mean, and that really kept me going into this direction. I always thought to myself, there is something very important. And now I think we have arrived at a new era where the data people, the content people and the knowledge people come together and now they discuss what a semantic layer should look like. And I totally understand each of those have different angles and different ideas how it should be set up. But let's take a closer look at it because it really depends on the use cases on top and the strategies of the AI, the overall AI strategy we all now want to develop driven by Gen AI. Obviously it has become even more important than ever before to have a semantic layer in place in our siloed world, in our siloed environments. Larry: Yeah. 
And that notion of the, as you were saying that you're making me realize that the LLMs are kind of at that, to my mind, there's emerging at the highest level that architecture, they're really a great interface to a lot of this knowledge that we have, but they're not the best keepers or understanders of that knowledge, whereas the RDF stack of knowledge representation stuff is. Can you maybe talk a little bit about the relationship between LLMs and other AI tech and knowledge graphs and how the semantic layer facilitates their interaction? Andreas: Right. I would start maybe like that. First of all, I think language is something which allows us to develop knowledge. Without our ability to talk and to use language, we're not able to develop anything more complex than, let's say, what we need for our survival mode typically....
-
9
Jessica Talisman: Using SKOS to Build Better Knowledge Systems – Episode 12
Jessica Talisman Jessica Talisman is a seasoned information architect with decades of experience across a variety of domains. She's done a lot of education and outreach around her semantic and information architecture practices. One of the most important lessons she's learned is the crucial role of standards like the W3C SKOS model to bring structure and semantics to information and knowledge systems. Since there are never enough information architects in any organization, she supports the democratization of IA practices, but she's also quick to highlight the unique skills that you can only get with deep study. We talked about: her work as a senior information architect at Adobe and previously in GLAM (galleries, libraries, art, and museums) and other domains how her work in GLAM showed her the importance of the concept of lineage and attribution and the benefits of the FRBR (Functional Requirements for Bibliographic Records) framework how standards and rules bring discipline and structure to information and data ecosystems how capturing knowledge via the SKOS standard can provide on its own the structure, semantics, and disambiguation your data needs, as well as set you up for future successes the importance of focusing on semantic fundamentals and how the ensuing understanding of your data assets can improve activities like a graph RAG implementation the importance of collaborating and sharing ideas across domains democratization, evangelism, and other kinds of information architecture outreach the "Golden Spike" railroad metaphor she uses to illustrate cross-functional collaboration challenges how linked data can help span organizational silos and align stakeholders on language and terminology the importance of understanding your unique organizational fingerprint how applying the library science concept of "scholarly communications" can move organizations forward and promote innovation Jessica's bio Jessica Talisman is a Senior Information Architect at Adobe. 
She has been building information systems to support human and machine information retrieval for more than 25 years. Jessica has worked in a variety of domains such as e-commerce, government, AdTech, EdTech and GLAM. Jessica holds a Master's in Library and Information Science with a concentration in Informatics. She lives in Santa Cruz, California with her partner Dave, and two dogs. Connect with Jessica online LinkedIn - Jessica is working on a new book about information architecture and is looking for anecdotes and other input. If you're an IA practitioner with good stories to share, she'd love to connect. Video Here’s the video version of our conversation: https://youtu.be/1tlrZTJ52Vs Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 12. Anyone who has tried to discern how people in a domain talk about the concepts in it, and then tried to align stakeholders in an organization around those concepts and the words that describe them, and then share that information with computers so that you can scale the impact of your work, knows that you need a good system to manage your taxonomies and other terminology. Jessica Talisman argues that the W3C SKOS standard is your best friend in such endeavors. Interview transcript Larry: Okay. Hi everyone. Welcome to episode number 12 of the Knowledge Graph Insights Podcast. I am really delighted today to welcome to the show Jessica Talisman. Jessica's currently a senior information architect at Adobe, but she is extremely experienced in information architecture and knowledge graph stuff, so welcome Jessica, tell the folks a little bit more about what you're up to these days. Jessica: Thanks, Larry. So I'm currently, as Larry said, a senior information architect at Adobe, and before this, I was information architect over at Amazon. 
I've worked in many different domain spaces, but my original foundational experience is in information and library science and I worked as an academic librarian in the past with books, museum galleries, library, art, and that really gave me the foundational experience and information and knowledge necessary to do my job. Larry: Nice. We were talking before we went on the air about how, especially your art and gallery and that kind of curation, tell me a little bit about how that ... It seems like that was really foundational to your current practice or it's been influential. Tell me why. Is it there's something about the nature of those collections or what's going on there? Jessica: One of the most critical things when working in that space in galleries, libraries, art. It's called GLAM, which is actually a great acronym for galleries, library, art and museum, and within the GLAM space, provenance and lineage is something that cannot be ignored because you're dealing with works, whether it be print books, art obviously, an attribution to the artist or writer and that's instrumental to building information in library spaces within that domain space. And so that taught me a lot about linking records. There's a super interesting framework that from the library and information science background, it's a framework called FRBR, which is Functional Requirements for Bibliographic Records. And the whole idea is to maintain the manifestation and expression of works, which is the lineage of a piece of art or a book and being able to support proper provenance and attribution of works while maintaining lineage for the benefit of people and machines. Larry: Nice. And it's so clear when you think about how LLMs work and the current state of AI, it's clear for the need for that kind of thing. Well, and that's kind of jumping ahead a little bit too to what you do with this stuff after you've got it all organized. 
I think in almost every case, whether you're like a data scientist, an information architect, content strategist, a museum curator, whatever, you start with this pile of concepts and words and then people like you turn them in to something useful to both humans and computers. How does that look? Walk us through the top level overview of that. Jessica: So normally no matter what, when I come into an organization or any sort of information environment or data-rich environment, there's usually problems, identified problems. And those usually are not the only problems within an ecosystem, an information or data ecosystem. And so the idea of nomenclature or vocabularies is instrumental. It's the foundation of how we discover and find things within that ecosystem. And so having to look at the current vocabulary and the current environment of how words are implemented, that starting point in understanding culturally and otherwise, how words are used and vocabularies used to not only define a domain space but to help support information discovery and retrieval within a space. Larry: Yeah, because that's the classic ... I mean I think the way most people would think of, they don't know anything else about information architecture, it's about discovery and discoverability and findability. But there's sort of levels to that. I work more in the UX world and there you're like, here's how we're going to talk about things. With content designers and UX writers, that's a common thing. So you just end up with a controlled vocabulary or something like that where you talk about it, but there's sort of a progression from that level all the way out to a full-blown ontology. How does that escalation happen I guess from when does it become clear that you need, "Well we really need to define these things," so I guess that's a glossary right now? Jessica: Well, in looking at that environment, so you have a series or collections and I like to use the word collections of vocabularies. 
Often they occur or exist as flat lists and there are usually internal belief systems that determine how these vocabularies are structured and implemented. Some are closely guarded. I will give the forewarning that within these spaces sometimes branding and marketing will be also pretty protective over the vocabularies used, but not understanding that these are meant to work on the back end of systems to help facilitate information retrieval and discovery. And so when building for back end systems, the lowest common denominator aside from a flat list or controlled vocabulary is to use something like an ontology like SKOS, which is Simple Knowledge Organization System. It's a very simple ontology that helps to structure a hierarchy and simple relationships that are machine-readable. And as one standard and lightweight or upper ontology, it has characteristics that are included with that, but it's super machine-readable and translatable, but it's also human-readable, which is what's really critical about that structure. Jessica: And so you can not only define parent-child relationships, alt labels or aliases and encode those with ontological labels, but there's rule bases, there's standards included with that which line up with several other standards that exist for information retrieval on the World Wide Web. So having to structure and include those rule bases that are standards-based as well helps to enforce a certain type of discipline and structure within that information ecosystem and data ecosystem, which helps to guide people towards best practices and not only why hierarchies or taxonomies are important. But the introduction of concepts like thesauri or thesaurus, you can actually build a thesaurus using SKOS. So whether people realize it or not, that actually helps to shape a thesaurus. And then you can also have a very sort of primitive knowledge graph but still a knowledge graph using SKOS. 
So it's like a nice little primer and entry to the world of structuring data and information to be disambiguated. So you naturally go through a disambiguation process and a struc
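To make the SKOS ideas Jessica describes a little more concrete (parent-child relationships, alt labels or aliases, and disambiguation), here is a minimal sketch in plain Python. It is not an RDF implementation: the `Concept` class, the `VOCAB` dictionary, the `resolve` and `ancestors` helpers, and the furniture terms are all hypothetical, chosen only for illustration. The comments show which SKOS property each field loosely corresponds to.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One vocabulary term, loosely mirroring a skos:Concept (illustrative only)."""
    pref_label: str                                       # skos:prefLabel
    alt_labels: list[str] = field(default_factory=list)   # skos:altLabel (aliases)
    broader: Optional[str] = None                         # skos:broader (parent's label)


# Hypothetical three-level vocabulary, keyed by preferred label.
VOCAB = {
    c.pref_label: c
    for c in [
        Concept("Furniture"),
        Concept("Seating", broader="Furniture"),
        Concept("Sofa", alt_labels=["Couch", "Settee"], broader="Seating"),
    ]
}


def resolve(term: str) -> Optional[Concept]:
    """Disambiguate: map a preferred label or an alias to its single concept."""
    wanted = term.casefold()
    for concept in VOCAB.values():
        labels = [concept.pref_label, *concept.alt_labels]
        if any(label.casefold() == wanted for label in labels):
            return concept
    return None


def ancestors(concept: Concept) -> list[str]:
    """Follow the broader links from a concept up to the top of the hierarchy."""
    path = []
    while concept.broader is not None:
        path.append(concept.broader)
        concept = VOCAB[concept.broader]
    return path
```

With this toy vocabulary, a search for "couch" resolves to the "Sofa" concept, and walking its broader links gives "Seating" and then "Furniture", which is the kind of retrieval support a flat list cannot offer.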
-
8
Tony Seale: The Knowledge Graph Guy – Episode 11
Tony Seale With ten years of semantic data experience and an endless stream of insightful posts on LinkedIn, Tony Seale has earned the moniker "The Knowledge Graph Guy." Tony says there's precious little time for enterprises to prepare their data with the interconnectedness and semantic meaning that it needs to be ready for the coming wave of more powerful AI technology. We talked about: his 10-year history of applying academic knowledge graph insights to commercial work, mostly in the finance industry the yin-yang relationship in his "neuro-symbolic loop" concept that connects creative, generative LLMs and the reliable, structured knowledge provided by knowledge graphs the contrast in reasoning capabilities between LLMs and knowledge graphs how neither formal logic nor probabilistic systems are usually the right answer on their own, hence the yin-yang analogy the crucial role of understanding and consolidating data, the gold mine on which every enterprise is sitting that describes any organization's unique value the power of understanding the "ontological core" of your business and then projecting it, selectively and strategically, to the world the urgent threat posed by snake oil salesmen and other opportunists coming into the graph world and derailing enterprises' chances to properly exploit their unique data advantage the two crucial characteristics of AI-ready data: connectedness and semantic meaning his work chairing the Data Product Ontology (DPROD) working group, an effort to provide a semantic definition of what a data product is Tony's bio For over a decade, Tony has been passionate about linking data. His creative vision for integrating Large Language Models (LLMs) and Knowledge Graphs in large organisations has gained widespread attention, particularly through his popular weekly LinkedIn posts, earning him the reputation of 'The Knowledge Graph Guy.' 
Tony’s journey into AI and knowledge graphs began as a secret side project, working from a computer under his desk while employed at an investment bank. What started as a personal passion quickly evolved into an area of deep expertise. Over the past decade, Tony has successfully delivered several mission-critical Knowledge Graphs into production for Tier 1 investment banks, helping these institutions better organise and leverage their data. Now, Tony has just founded The Knowledge Graph Guys, a brand-new consultancy dedicated to making knowledge graphs accessible to organisations of all sizes. Through this venture, he aims to empower businesses with the tools and strategies needed to harness this powerful technology. Connect with Tony online LinkedIn The Knowledge Graph Guys Resources mentioned in this podcast Connected Data London conference Knowledge Graph Conference GraphGeeks podcast DPROD working group Video Here’s the video version of our conversation: https://youtu.be/lkNvCzwhTRY Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 11. Whether they realize it or not, every business on the planet is sitting on a gold mine, the precious data that uniquely positions them in their industry and market. With ten years of AI practice and an endless stream of insightful social media posts, Tony Seale has earned the moniker "The Knowledge Graph Guy." Tony argues that enterprises that fail to grasp the urgent need to consolidate and understand their data will not survive the coming wave of more powerful AI. Interview transcript Larry: Hi, everyone. Welcome to episode number 11 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Tony Seale. Tony is the Knowledge Graph Guy. That's how everybody knows him on LinkedIn, and I think he's earned that moniker. He also does a lot of consulting and work for big investment banks and things in the financial world. But welcome, Tony. 
Tell the folks a little bit more about what you're up to these days. Tony: Hi, Larry. Thanks for having me on here. Yeah, so as I was saying, I've basically been doing knowledge graphs now for the last 10 years, largely within investment bank, investment, large investment banks. So I'm kind of at the rubber meets the road. Well, that's where I have been. So taking the technology that's been largely in the academic space and actually applying that into production settings. And then I've started trying to share that knowledge and I've become obsessed with the idea of what it would be to connect an organization together. So what would be possible if most of the information within a given organization was connected together? And I'm on a mission to push that forward. As you say, I've just started doing a consultancy to try and accelerate that effort. The last really, I guess, two and a bit years now have been focused really on the space between large language models and knowledge graphs. And hopefully we can talk a bit about that. Larry: Yeah, because I attended the Semantics Conference last week and that's pretty much all anybody was talking about. I mean, there was other conversations of course, but you have a take, and I think one of the things that is becoming clear in my mind is this sort of evolution from just math and LLMs to graph to RAG to graph RAG. And then you have this concept of the neuro-symbolic loop which, is that the evolution of the integration of these technologies? Tell us some more about that. Tony: Yeah, so I guess maybe to frame it at a conceptual level, you can think of the large language models existing in this continuous space. So they're slightly fuzzy, they're probabilistic, they're generative. So they're always guessing at what the next right thing to do should be. They're not structured. And that has within it a huge amount of power because they can explore different pathways. 
They can be in at least some limited sense of the word creative and imaginative. They can write poetry and do everything that we are familiar with them doing. But you could sort of contrast that type of intelligence with I guess what you would call the existing and what people will rather derogatorily call good old-fashioned AI. The existing approach, which is sort of formal deductive logic and formal reasoning. And really that's what knowledge graphs represent from a data side, like the most flexible structure for doing reasoning over your data. Tony: So what then becomes interesting, it's like, well, can large language models actually do that formal reasoning? And obviously the big AI houses are trying really hard in order to make that happen. They're chucking a lot of money into making that happen. But I think to a certain extent, maybe one day they will kind of get so close that we won't know the difference between it. But to a certain extent it's just a kind of different paradigm, if you like. There's an uncertainty there within a generative model, which is just very different from what deductive reasoning is going to be. Tony: So the idea of the neuro-symbolic loop is to try to bring these two systems like a yin and yang, bring the two systems in together closely. So that, as closely as possible and at as fine-grained a level as possible, you are looping between this system one and system two. So system one being the large language model, being a bit creative, being generative, very, very quick. And then the system two being the formal representation of the system in which you're working through steps, being able to do structured querying, always getting back reliable results. Larry: I love the visual of the yin and yang symbol as the representation of that tightening loop, because that's a perfect way to look at that. But back to, I want to revisit the notion of reasoning a little bit because you mentioned that and that seems really important. 
And a lot of the AI fans on LinkedIn these days and the big companies themselves of course have been talking a lot about the reasoning that they can do. Can you contrast that? I heard it referred to at Semantics as pseudo-reasoning, versus the real reasoning that you just mentioned, that deductive logic based systems can bring. Can you talk a little bit more about that, the relative reasoning capabilities and whether... Are they just faking it till they make it or what's going on there? Tony: Yeah, so I mean you're always working in analogies with this stuff because at the end of the day, nobody, not even the people who are very close to this truly understand what's going on inside of a large language model. It's a bit of a mysterious thing what's happening in there. But here's one way of conceptualizing it, that effectively it's kind of doing not a database lookup, but almost like that. This very sophisticated lookup of its training data. So it's seen a huge number of samples of training data. It's able to a certain extent, it's kind of mapped all of that kind of training information as you're doing the different layers within the large language model. It's compressed some of the concepts in there, not in a way that we would necessarily be able to understand in this kind of vector representation of it. But so it's able to use that to go and retrieve these kind of patterns, and to a certain extent sort of merge them together. Tony: So what recently has been done with the kind of Q-star/Strawberry/o1, and really all of the others are doing the same thing as well, is that you will take some domain where you have right and wrong answers, for instance coding or mathematics. And then what you will do is you'll get the large language model to simulate loads of different answers to that and then have its kind of reasoning steps of how it got to that particular answer. 
And then you'll go out to an external verifier, which in the coding thing it's like, "Well, okay, run this unit test. Does the unit test pass or with the maths? Okay well,...
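The generate-then-verify pattern Tony describes here, where many candidate answers are produced and an external verifier such as a unit test decides which ones are right, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CANDIDATES` list is a hard-coded, hypothetical stand-in for answers sampled from a language model, and `passes_unit_test` and `first_verified` are made-up helper names, not part of any real training pipeline.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stand-ins for several answers sampled from a generative model.
# Each is Python source that is supposed to define add(a, b).
CANDIDATES = [
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a * b",
    "def add(a, b):\n    return a + b",
]


def passes_unit_test(source: str) -> bool:
    """External verifier: run the candidate and check it against known answers."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # acceptable only because the inputs are our own toy strings
        add = namespace["add"]
        return add(2, 3) == 5 and add(-1, 1) == 0
    except Exception:
        return False


def first_verified(
    candidates: Iterable[str], verifier: Callable[[str], bool]
) -> Optional[str]:
    """Return the first candidate the verifier accepts, or None if all fail."""
    for source in candidates:
        if verifier(source):
            return source
    return None


best = first_verified(CANDIDATES, passes_unit_test)
```

The point of the sketch is the division of labour: the generator may guess freely, while the verifier gives a deterministic right-or-wrong signal, which is why domains like coding and mathematics suit this approach.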
-
7
Paco Nathan: Graph Thinking to Better Understand Graph RAG – Episode 10
Paco Nathan Graph RAG is all the rage right now in the AI world. Paco Nathan is uniquely positioned to help the industry understand and contextualize this new technology. Paco currently leads a knowledge graph practice at an AI startup, and he has been immersed in the AI community for more than 40 years. His broad and deep understanding of the tech and business terrain, along with his "graph thinking" approach, provides executives and other decision makers a clear view of terrain that is often obfuscated by less experienced and knowledgeable advisors. We talked about: his work building out the knowledge graph practice at Senzing, and their focus on entity resolution the importance of entity resolution in knowledge graph use cases like fraud detection the high percentage of knowledge graph projects that we never hear about because of their sensitive or proprietary nature his take on the concept of "graph thinking" and how he and colleagues illustrate it with a simple graph model of a medieval village how graphs add structure and context to our understanding of the world the importance of embracing complexity and the Cynefin framework in which he grounds various types of business challenges: simple, complicated, complex, and chaotic how to apply insights discerned from a Cynefin framing in management how knowledge graphs can help organizations understand the complex environments in which they operate the wide range of industries and government entities that are applying knowledge graphs to concerns like supply chains, ESG, etc. 
his overview of RAG - retrieval augmented generation and graph RAG the wide variety of uses of the term "graph" in the current technology landscape Microsoft's graph RAG which uses NetworkX inside their graph RAG library, not a graph database Neo4j's approach which creates a "lexical graph" based on an NLP analysis of text "embedding graphs" ontology-based graphs Google's approach to RAG, using graph neural networks graphs that do reasoning over LLM-created fact assertions "graph of thought" graphs based on chain-of-prompt thinking "causal graphs" that permit causal reasoning "graph analytics" graphs that re-rank possible answers the evolution of graph RAG libraries and the variety of design patterns they employ the shift in discovery dominance from search to recommender systems, most of which use knowledge graphs examples of graph RAG from LlamaIndex and LangChain, in addition to Microsoft's graph RAG his prediction that we'll see more reinforcement learning, graph tech, and advanced math capabilities like causality in addition to LLMs in AI systems his reflection on his efforts to advance graph thinking over the past 4 years and the current state of LLMs, graphs, graph RAG, and the open-source software community the need for a shift in thinking in the industry, in particular the need for cross-pollination across tech proficiencies and enterprise teams the "10:1 ratio for the number of graph RAG experts versus the number of people we've actually worked with a library" Paco's bio Paco Nathan leads DevRel for the Entity Resolved Knowledge Graph practice area at Senzing.com and is a computer scientist with +40 years of tech industry experience and core expertise in data science, natural language, graph technologies, and cloud computing. He's the author of numerous books, videos, and tutorials about these topics. Paco advises Kurve.ai, EmergentMethods.ai, KungFu.ai, DataSpartan, and Argilla.io (acq. 
Hugging Face), and is lead committer for the pytextrank and kglab open source projects. Formerly: Director of Learning Group at O'Reilly Media; and Director of Community Evangelism at Databricks. Connect with Paco online LinkedIn Sessionize Derwen.ai Senzing.com Resources mentioned in this interview Connected Data London conference Knowledge Graph Conference GraphGeeks community REALM: Retrieval-Augmented Language Model Pre-Training, Guu, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis, et al. NebulaGraph Launches Industry-First Graph RAG: Retrieval-Augmented Generation with LLM Based on Knowledge Graphs Graph Retrieval-Augmented Generation: A Survey, Peng, et al. Video Here’s the video version of our conversation: https://youtu.be/4pmV6BUSKmY Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 10. As enterprises and tech companies have looked to ground in factual knowledge the answers that their LLMs deliver, graph RAG architectures and products have sprung to the fore. With his deep background in Silicon Valley culture, the open-source software community, artificial intelligence practice, and knowledge graphs and semantic technology, Paco Nathan is one of the best-positioned people in the industry to help us understand the current state of graph RAG. Interview transcript Larry: Hi, everyone. Welcome to episode number 10 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Paco Nathan. Paco is the principal DevRel engineer for knowledge graphs at Senzing, the big company that does entity resolution for large-scale mission-critical applications, really fancy high-end graph stuff. So, welcome, Paco. Tell the folks a little bit more about what you're up to these days. Paco: Thank you very kindly, Larry. I appreciate it. Yeah, I'm over at Senzing. 
Actually, I was presenting a master class about Senzing integrations at the Knowledge Graph conference last time we saw each other in Manhattan and then joined the company shortly thereafter. Paco: I'm building out the knowledge graph practice area because we do... I'm with this team that has been doing work for many years in entity resolution and most people have probably never heard of it, but most people have probably used it. So, the idea is, say you have a bunch of different tables or data sets and you want to try to find what are the consistent entities inside these tables. Paco: So, you might have Bob R. Smith and Bob Smith Jr. and they're both at 101 Main, but one of them is spelled 101 Main Street and the other is maybe spelled a different way or abbreviated a different way. And if you can think about that kind of problem, but spanning across billions of records in a lot of different data sources, how can you pull out the consistent entities? Paco: And it sounds like a trivial data science problem. We could just use string distance, Levenshtein distance, which is a typical thing. But when you take into account the fact of, well, what if you've got Bob R. Smith at 101 Main, Bob R. Smith Jr., but then you get Bob R. Smith Sr. at 101 Main, and they've both got voter registration. Is that the same person? Paco: Because your Levenshtein distance will tell you it is. If you set a threshold on string distance, they'll tell you they're the same person. So, when they try to register to vote, one of them will be denied voting rights. And so, this problem becomes very much complicated when you're working in a world where there are companies that have offshore subsidiaries and maybe you don't know the actual owners. 
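Paco's point about string distance is easy to reproduce. The sketch below implements the standard dynamic-programming Levenshtein distance and a deliberately naive matcher; the function name `naive_same_entity` and the threshold of 2 are assumptions made for illustration, not anything Senzing uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def naive_same_entity(name_a: str, name_b: str, max_distance: int = 2) -> bool:
    """Hypothetical threshold matcher: treats near-identical strings as one entity."""
    return levenshtein(name_a, name_b) <= max_distance


junior = "Bob R. Smith Jr."
senior = "Bob R. Smith Sr."
distance = levenshtein(junior, senior)      # a single substitution: J -> S
merged = naive_same_entity(junior, senior)  # True: father and son get merged
```

Father and son differ by one character, so any usable threshold merges them, which is exactly the failure Paco describes and the reason entity resolution has to weigh other evidence besides name similarity.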
Paco: You might know some of the directors and you get a very tangled web of some very bad people who are moving a lot of money around to do very bad things offshore, sorry, illegal fishing, illegal lumber, overthrowing democracies in Asia or in North America, for that matter. Basically, when you get the problem of trying to understand who's who and what's what, and a lot of different people or companies or ships that might have a registry somewhere, but you don't know exactly in a given business context who they are, how can you triangulate on them? Paco: And so, it's typically not a matter of just a string distance, it's a matter of, well, I have enough elements of their address that are in common even though there are five different ways to represent this address in Singapore. I can tell the difference between a company at the same address or a hundred companies that are in the same shopping mall, which actually in Singapore is really a hard problem to understand. Paco: And same thing for tax records or passport control. There's an area called UBO, which is ultimate beneficial owner, has a lot to do with sanctions compliance and catching oligarchs and understanding who is trying to do money laundering in an offshore tax haven, who is funneling billions of dollars out of Kremlin assets to try to influence a campaign somewhere. These are the kind of problems we work with. Paco: And so, the long and short is that these are... If you look at any episode of Homeland or The Wire or NCIS, any crime drama, inevitably, the protagonist goes up to a wall and they've got pincushion, they've got all these clippings and photos and notes, and they take yarn and draw a graph between them. And the thing is, the people who do that real work, if you're in the US, you're talking about three-letter agencies. If you're in the UK, you're talking about four-letter agencies. Paco: But the people who really do that work 24/7, they actually use knowledge graphs. 
They use collaborative knowledge graph tools like Aptitude Global, SiReN, GraphAware, Linkurious, Esri, ArcGIS Knowledge, Kineviz. There's a bunch of different tools that allow people to collaborate on building knowledge graphs to catch bad guys. Paco: In finance, we have acronyms like AML, anti-money laundering, or UBO, ultimate beneficial owner, or PEP, politically exposed persons. All of these things have to do with the fact that somebody has committed very large-scale crimes and governments have reacted by saying, "Okay, regulatory, we will not allow this to happen again." So, you end up having data sets like LIFE, was a multi-government response to the problems of 2009 global financial crisis. ...
-
6
Katariina Kari: Building Knowledge Graphs for E-Commerce Giants – Episode 9
Katariina Kari For the past eight years, Katariina Kari has built knowledge graph teams at giant e-commerce companies like IKEA and Zalando. This practical, real-world experience puts her in an elite group of ontology and knowledge graph experts. Knowledge graphs offer unique benefits to e-commerce merchants. From better product recommendations to more useful search results, the semantic capabilities that knowledge graphs provide routinely result in seven-figure sales increases. The knowledge graphs that Katariina builds provide a semantic layer in the enterprise architecture that lets companies capture, use, and re-use the organization's unique domain knowledge in any number of applications. Because knowledge graph isn't one application that does one thing. It's a paradigm shift in the way we work with data. It's a paradigm shift in the way we code, because now you don't need to put business logic into your code. We talked about: her work over the past eight years building knowledge graphs at companies like IKEA and Zalando how using knowledge graphs to improve recommendation and search routinely brings seven-figure business benefits the set of skills and talents it takes to implement a knowledge graph project, most of which already exist in most companies how LLMs and other AI tools can help transform structured or unstructured data into semantic data, a computable resource that captures business domain knowledge some of the specific skills needed for KG work: ontology experts, back-end developers who understand the semantic web stack, data scientists and engineers, knowledge practitioners to capture domain knowledge, and product management the need in each organization for a unique knowledge graph team tailored to the needs of the company and the talent available the importance of user-centricity and use-case understanding in any knowledge graph project the benefits of capturing business logic in a semantic layer which can be used and re-used in multiple 
applications an interesting search-improvement use case that resulted in seven-figure sales increases, as well as experience-improving recommendation and info-box use cases how capturing subject matter expertise in a knowledge graph can dramatically improve recommendation systems and deliver unexpected benefits to other the importance of showing the benefits of knowledge graphs to organically advance enterprise adoption her take on the difference between RDF-based knowledge graphs and labeled property graphs (LPGs) like Neo4j the compelling case for knowledge graphs in e-commerce, which she has discovered in her eight years of practice Katariina's bio Katariina Kari is a leading expert in semantic web technologies, specializing in the development of ontologies and knowledge graphs. Over the past eight years, she has worked with prominent brands like IKEA and Zalando, building knowledge graphs that significantly enhance customer experiences by improving search functionalities and recommendations. Her extensive hands-on experience in creating enterprise knowledge graphs has established her as one of the global top talents in the field. Katariina is frequently invited to speak at international events on the semantic web and knowledge graphs, sharing her insights and practical expertise with industry professionals. Her deep knowledge and passion for the semantic web have made her a sought-after keynote speaker and thought leader in the field. Balancing a dual enthusiasm for technology and the arts, Katariina holds both a Master of Science degree and a Master of Music degree. From 2012 to 2016, she ran her own consultancy, where she worked closely with classical music organizations and artists, helping them navigate digital outreach. An art-loving and art-serving nerd, she seamlessly blends her love for music and technology in all her work, constantly pushing the boundaries of what’s possible in her field. 
Connect with Katariina online LinkedIn Video Here’s the video version of our conversation: https://youtu.be/gMFAHlY7VL0 Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number nine. One of the main benefits of semantic technology is the ability to sort out business logic independent of data and data from the applications in which it's used. Katariina Kari has captured in knowledge graphs the business expertise of e-commerce giants like IKEA and Zalando to power better search and recommendation systems and to generally provide a better experience for both internal users and external customers, resulting in millions of dollars in new sales. Interview transcript Larry: Hi, everyone. Welcome to episode number nine of the Knowledge Graph Insights Podcast. I am really delighted today to welcome to the show Katariina Kari. Katariina is a long-time, deeply embedded in the community knowledge graph professional. She's done a lot of e-commerce work at places like IKEA and Zalando. She's currently the head of data at a stealth internet, a stealth startup that we can't talk too much about. But welcome, Katariina. Tell the folks a little bit more about what you're up to these days. Katariina: Yeah, thank you Larry, and thank you for having me in your podcast. Yeah, I could say that I've been really lucky to have worked in the industries and especially in the lifestyle sector, e-commerce sector for the past eight years. So very early on, before even graph databases were really commercialized, I had the opportunity to start building knowledge graphs. And so now, when there's someone out there was like, "Oh, I need a knowledge graph," I can confidently say, "Well, I've done it a few times so I can tell you, I can advise you or I can even run a team for you that can build the knowledge graph." 
And it's just shown me a lot of practical things, it's given me a really good perspective on what works from theory and from research and what actually doesn't, or doesn't yet work, or isn't mature enough yet for an applied setting in commercial use. Larry: That point you just brought up, you have a PhD in something, right? Katariina: No, no, I don't have a PhD. I never really went into research. I have a few published articles, scientific articles that I did towards the end of my master's, but I actually just have two master's. I have a master's in technology and then I have a master's in music arts management, because I've always carried this love for both art as well as technology, and I always wanted to combine them. Larry: I didn't plan it this way, but the episode right before this one, number eight, was Vera Brozzoni, who's a metadata strategist at the BBC, and she comes out of classical music. So if you don't know Vera, you two have to meet and talk music and data and stuff. But one of the things you mentioned there is one of the things in this community is, and the reason I assumed, it's usually safe to assume in this world that somebody has a PhD, but you've always been more focused on the practice side than the research side, which is awesome, because I'm all about sharing practice, so thank you for being focused that way. And you have all this experience coming up on, what, eight years of experience at IKEA and Zalando, two of the biggest retail brands in the country, or in the world. Larry: And one of the things, and I want to gently take you to task for something in a talk I saw you do recently, this hour-long talk, brilliant stuff about a lot of this stuff we'll talk about today, but right in the middle of that talk, you just kind of matter-of-factly mentioned like, "Yeah, and we're realizing seven-figure business benefits across this." I'm like, "Wait, what?" And in my journalism training, we would call that burying the lede. 
But that's kind of at a top level... There's real obvious business benefits to adopting knowledge graph technology. Can you talk about what is it that's unique about the work you've done and this technology that permits these massive revenue gains? Katariina: I would say that one part of it is the work, but the other part of it is working with big brands like Zalando, Europe's biggest e-commerce fashion, and then IKEA, one of the most known trademarks or brands in the world. So their volumes in e-commerce are huge. So if you add a positive change to the customer experience, like giving quality recommendations or just improving a few of the worst-performing search terms, you get a lot of... The volume is so big, the benefit is really big, and it's already in that category of seven-figure sums. So that's why I think maybe not every little e-commerce site can invest in this technology first. It's great that these big brands are actually investing in so we can figure out exactly how to do it, and then that can be brought to maybe a smaller-volume e-commerce setting or smaller-volume industry so that they can then make sense of these best practices. That's at least the way I see it. But yeah, I mean just being able to improve a big website's performance, just doing little optimization is already moving the needle quite a lot. Larry: Right. And once you've articulated those best practices, you can picture it, smaller businesses benefiting from it. But right now, it takes quite a team to put this together. I've heard you talk a lot about the skills that it takes, the roles that you need to execute on those skills, and then the human element, the thing that our friend Ashleigh Faith calls the data therapy part of this. Can you talk a little bit, I guess, first, about what does it take, what are the skills that you need, the knowledge, the wherewithal in your organization to actually make a knowledge graph project like the ones you've worked on happen? 
Katariina: Well, when you work with these industries in e-commerce, they'll probably have already very brilliant backend developers,...
-
5
Vera Brozzoni: Managing Classical Music Metadata at the BBC – Episode 8
Vera Brozzoni When you manage millions of digital assets, as the BBC does, you need robust metadata practices to organize and discover them. Vera Brozzoni is a metadata manager at the BBC who focuses on classical music. She combines her academic background in music, philosophy, and the humanities with a rigorous metadata mind to help BBC systems - and ultimately viewers and listeners - discover and appreciate the music she loves so much. We talked about: her work as a metadata manager at the BBC and her distinctive background in classical music and philosophy and the humanities how the complex history of music complicates her work the role of taxonomy in her work the meaning behind the famous quote, "Metadata is a love letter to the future" how "music does whatever it wants" just as biological organisms don't always follow predictable rules the importance of not being present-bound and imposing current biases on prior generations of music her thoughts on the need for more practitioners with artistic cultural backgrounds to enter the field of metadata management the diverse variety of intellectual talent at the BBC how she sees her role as "bridging two completely different universes" her thoughts on how AI could benefit her metadata work her metadata outreach work into the music community how to measure the effectiveness of a metadata program her belief that the phantom of the French philosopher Blaise Pascal hovers over all cultural metadata work Vera's bio Vera Brozzoni was born in Italy where she studied Philosophy. She then moved to the UK where she studied History of Music and obtained a PhD in Composition at Newcastle University. She has worked in the music industry for many years, specialising in classical music metadata, devising innovative methods of schematising the history of music in all its complexities. Her aim is to evangelise metadata to classical music companies so that they can future-proof data coming from the past. 
Her other interests include AI, Machine Learning, cinema, literature. Connect with Vera online LinkedIn Video Here’s the video version of our conversation: https://www.youtube.com/watch?v=0MKWKw0x7uc Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 8. Adjacent to the engineering and information practices that build the semantic infrastructure we operate in, is the crucial field of metadata management. Vera Brozzoni agrees with the internet archivist Jason Scott that "metadata is a love letter to the future." In her work at the BBC, Vera combines her deep academic background in music and the humanities with her metadata expertise to help listeners and viewers discover and appreciate classical music. Interview transcript Larry: Hi everyone. Welcome to episode number eight of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show, Vera Brozzoni. Vera is a metadata manager at the BBC and based in the company in the UK. Welcome, Vera. Tell the folks a little bit more about your work at BBC and ... Vera: Hello Larry. So, yeah, as you rightly introduced me, Larry, I'm a metadata manager. My particularity is that I'm a classical music specialist and I come from a staunchly humanities background and not a tech or library science background like most of my colleagues basically. Vera: The fact of being an anomaly in this field, I'm trying to use it for the good of culture, for the good of the arts, and trying to convince the classical music world of the importance of metadata, which is great fun as you can imagine. Larry: That's funny. I hadn't really thought about that side of it 'cause I've thought more like you, we met in London at a semantic event and you fit right in. It's like you're clearly not having any trouble integrating into the tech world and the media world, but that's interesting. Tell me more about getting your classical music colleagues on board with metadata. 
Vera: Well, the reason why I didn't feel like a fish out of water is because I have a degree in philosophy. Philosophy is not the abstract art of building universes that most people think. Philosophy is a very rigorous science in itself with a very specific jargon. And when I entered the world of metadata, I found out to my own surprise that a large part of that jargon was in common with philosophy. So I felt, "Oh, I actually understand this." And I didn't expect it, but I immediately felt, well, this is a way to put my analytical brain at use, not in philosophy, but in the real world, which is what philosophers would have liked anyway. Vera: So this was coupled with my passion and my knowledge about classical music, which is something that has accompanied me forever. I actually have a master's degree and a PhD in music. So yeah, I have my credentials. And I started to work for Universal Music some 10 years ago, and it was immediately clear that I had the right knowledge and the right expertise to make an imprint in the field of classical metadata. Vera: Classical music metadata must be said is an incredible mess because the history of music in itself is extremely complex. Musicians and composers have never thought in terms of metadata, like, "Oh, let's make this title really clear," or, "Let's make this data hierarchy very clear so that 300 years down the line, someone will be able to catalog my music very easily." No, this has never happened. Larry: So the whole classical music industry has conspired to make your job as hard as possible. Vera: Exactly, yes, and I enjoyed it. Larry: Nice. Well, tell me a little bit about that. So I sometimes think of some of my work as archeology, and what you just described is going back into classical, and classical music is probably, at least in terms of the western arts, it's as deep and established as anything. So there is a tradition there, like an intellectual and some kind of scholarly tradition. 
But metadata strategy wasn't part of it. So you're kind of putting the metadata on after the fact. Is that kind of how it works? Vera: Yeah, exactly. I liken it to Charles Darwin or Carlo Linneo (Linnaeus) or the old catalogers of living beings, all the taxonomists of the past centuries. And again, I have used the word taxonomy, which is something that we currently use in metadata, but its root is actually in natural sciences. Vera: It's about trying to put a cloak of order onto something that is inherently chaotic. In the well-knowledge that order will never cover the 100% of what you're trying to do. There will always be something that is irreducible and un-catalog-able if you have this word in English. Larry: It's a word now. If it wasn't before, I'll take it. Vera: Yeah, exactly. I coined it. Larry: No, that's super interesting, that history and the interplay between disciplines, because I know enough about taxonomy to know that it arises out of Linnaeus' work in biology and that attempt, and it's sort of like, so it's more about, we think of taxonomy as not imposing order on things, but it's really more about not imposing, but ascribing order, ascribing characteristics to things so that we can organize them better. Is that sort of how you picture taxonomy in your work, is that- Vera: I believe there is a sense of hubris, there is a sense of we try to make things easier for ourselves to read. However, it's not just a matter of us modern people being arrogant and being cultural freaks, let's say. It's also a way to make sure that this wealth of knowledge, this wealth of culture and art will be readable in the future. I think you must be familiar with the famous quote, "Metadata is a love letter to the future," by Jason Scott I think. And I fully believe in that. I am quite an idealist in this sense. I do believe that my work in metadata, no matter how gray or boring it might look to someone who is not in the field, is actually extremely important. 
Vera: Now, the problem with classical music in particular is that it's a world that it's firmly steeped in the past and it's very difficult for them to think about the present, let alone the future. So I feel that very often my work is about bridging these three time dimensions, past, present, and future and try to make people understand, look, what I'm doing here is really, really important because obviously it's about cataloging, it's about archiving, it's about making sure that the people who will come will find a legacy of music that is discoverable. Larry: That notion of discoverability, that's sort of the whole point of, or one of the main points of metadata. I know there's a lot of other uses for it, but that notion. You mentioned a minute ago, you used the word chaotic I think to describe the heritage that you're working with. But if you go to the BBC website, it's anything but chaotic. It looks like, "Oh, this is very tidy and organized," and that's just the result of your work. I guess part of your work is too, because you love classical music and you want to portray it accurately. Does anything get lost in the tidying up of things? Vera: Well, of course, of course it does. But even when you say that the BBC website is all nice and neat, you don't see behind the scenes. Larry: Okay. Well, I haven't been invited in yet to look around. Vera: But yeah, in the field of music, yes, obviously there are things that don't follow. Just like in biological evolution, there are species of animals, there are creatures that don't follow any of the superimposed rules because evolution does whatever it wants. And music does whatever it wants as well, which is why when I was working at Apple Music a few years ago, I made a schema of a completely new type of hierarchy to schematize classical music. Obviously when we talk about a hierarchy, we immediately imagine a series of vertical layers, one under the other,...
-
4
Teodora Petkova: Dialogic Communication for the Semantic Web – Episode 7
Teodora Petkova Teodora Petkova is a scholar and content marketer whose PhD dissertation explored semantic technologies and dialogical theory and how they apply in the field of digital marketing communication. She thoughtfully combines her rigorous academic thinking with pragmatic data- and knowledge-management practices in her content-marketing work. She's currently building a knowledge graph with marketing content at Ontotext, the RDF database and data-management platform company. We talked about: her work building a knowledge graph of marketing content at Ontotext her book "Being Dialogic" and the concept of dialogic communication the importance of metadata in dialogic communication architectures how creating a controlled vocabulary or an ontology can support a shared understanding of the concepts stakeholders and users work with her overview of the semantic web her current study with the 90-year-old marketing-education legend Philip Kotler the connections she makes between corporate knowledge management and corporate marketing the utility of enterprise-wide controlled vocabularies how your content efforts can help you curate your knowledge graph the urgent need to cultivate the social practices to help us generate and curate metadata Teodora's bio Teodora Petkova is a philologist fascinated by the metamorphoses of text on the Web and curious about the ways the Semantic Web unfolds. She holds a PhD in Marketing Communication, an MS in Creative Writing and a Bachelor of Science in Classics. Teodora is the author of the books The Brave New Text and Being Dialogic. Following her genuine commitment to creating dialogic moments through semantic annotations, from 2022, Teodora is part of the Ontotext Knowledge Graph team. The Ontotext Knowledge Graph is where Teodora strives to harness the potential of the Semantic Web to foster dialogic marketing communications. 
Driven by the fascination with the ever-evolving nature of text on the Web, Teodora is also teaching web writing to students at the Content Strategy Master's program at FH Joanneum. Connect with Teodora online TeodoraPetkova.com Teodora's newsletter LinkedIn Teodora's books The Brave New Text Being Dialogic Resources mentioned in this interview How to Do Things with Data, Klaus Bruhn Jensen The Semantic Web, Tim Berners-Lee, James Hendler, and Ora Lassila Marketing 4.0: Moving from Traditional to Digital, Philip Kotler Video Here’s the video version of our conversation: https://www.youtube.com/watch?v=vF5xKzXxoWM Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 7. When you combine deep academic curiosity, a powerful concept like the semantic web, and an intellectually innovative approach to understanding how humans, computers, and corporate knowledge interact, you get Teodora Petkova's concept of "dialogic communication." Combine this powerful idea with a nuanced appreciation for modern marketing and carefully curated enterprise metadata and you can powerfully convey your organization's value to the world. Interview transcript Larry: Hi everyone. Welcome to episode number seven of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show, Teodora Petkova. Teodora is just an ordinary content writer who stumbled into the knowledge graph world. So welcome to the show, Teodora. Tell the folks a little bit more about what you are doing these days. Teodora: Hi, Larry. Thanks for having me. I'm doing what I have been dreaming about. I'm building a knowledge graph with Ontotext, out of marketing content, and I'm doing a lot of fights, a lot of things that I imagined and envisioned, but they need to be translated into specifications and requirements. So I now have a lot of time to write, do content writing per se; however, I write in the book of life. 
Larry: Yeah, because your writing is; it's not like my original career was in book publishing, and you just got a manuscript from somebody and turned it into a thing and put it out in the world, and that was that. A little different world these days. But the foundation of this new world you've articulated this one aspect of modern communication I think is really important, this notion of dialogic communication. Can you talk a little? I know this, and just for background for folks, this was Teodora's whole PhD dissertation, so that's a big subject, but I'm hoping to get just sort of an overview for folks who want to dip into the waters of the dialogic communication. Teodora: And that's my book; that's called Being Dialogic, and it steps on the shoulders of being digital. So to transition to that idea of the world being new, we all think that being digital and digitalization and everything is the core of that new communication, and yet the web is the core. And Cyberia, the so-called Cyberia that cyberscape is the core. Where we communicate in a many-to-many scenario and there are different audiences, algorithmic ones included, but to get all of that academic and maybe too theoretical cloud, we are now in the position to say, "Why would I read you? Why would I buy from you? Why would my fridge automatically order potatoes from your website or from your API?" Teodora: And that communication is exciting. Where does dialogue come in? It has never been out of this. The thing is that we are now in the position to, I wouldn't say, design dialogue because that's not dialogic; you don't design a dialogue. However, you can design for and build systems for dialogic orientation, meaning you can listen, embed feedback into your systems and into your loop, and you are not. We as marketers are now not in the position to shout and to publish relentlessly. We're in the position to understand what exactly the user needs and what are their data needs. 
Teodora: Final sentence: why do I say data needs? Because I might need to compare products, my app, my personal agent, my personal knowledge graph. I might want to hook it to your enterprise knowledge graph and see what happens. But that's, of course, too visionary. It's not reality yet. However, that new way of, as we said in the preliminary conversation, of allowing the user to pull content, read content with all the metadata artifacts in it. Is the new reality, at least for me. Larry: Yeah, and I think that thing you just said about building a system that supports dialogic communication, you can't really design any one communicative situation, but you can design the setting in a way that, like, "Here's some content, here's some metadata that goes with it, here's the other things you need to have this dialogic communication, and whether you're a human being or a computer," am I hearing that right? Teodora: Yes, you are. And because I'm living that every day now, and I'm thinking how easily we slip into saying, "Here's some content, here's some metadata, here's the situation." However, if we get our hands dirty with that, think about that. Here's content. What content? What shall we write about? Here's metadata. Okay, what metadata? What are the controlled vocabularies we would use? What are our systems? How metadata terribly overlaps across different systems? How departments speak with different metadata, how they talk about one and the same thing with different tags and with different naming in the different systems, and how that juggles or whatever. I'm not sure what the word is, but the internal flow is hampered by such inconsistencies. We're terrible at metadata; let's face it. Maybe not we, but at least from my experience, I've seen a lot of misunderstanding at the level of metadata, which is misunderstanding at the level of dialogue. Do we have the systems to hear each other and to agree upon shared meanings? 
Larry: What you just said, that seems key to this whole thing. Because I think about, I'm deep into the technical structured content stuff, and I think about metadata as just this conceptual thing that helps you stitch back together disarticulated or intelligently unarticulated chunks of content. But when you're putting them back together, you have to have the metadata. And similarly, in a dialogic situation where you're trying to, like you just said, agree on meaning. Tell me how metadata can help people and/or machines arrive at a common understanding of what they're talking about and what they're trying to accomplish. Teodora: Semantically, the answer is in your question. For me, when you're creating a controlled vocabulary or when you're creating an ontology, you are going through that process of reaching shared understanding. I'm trying to ground this talk into tangible things. What tags do we use for certain content types? For example, in WordPress, how do we measure the performance of these tags? How are these tags and their values related to our HubSpot measuring system? Can we say what content brings what people? Can we design journeys by embedding business logic in our metadata? Larry: Interesting. And one of the things you just said is reaching that shared understanding. Then you immediately got me back into my comfort zone, which is talking to both the internal stakeholders and users, all the people involved with the system, and just coming to a basic agreement about what do you mean by the word "customer" or "product", or things like that. And that's sort of like, it's not always called a controlled vocabulary, but once you've come to that agreement, it can serve as such. And what you just said, you just kind of touched on a whole bunch of different things like marketing metrics and customer journey mapping. Teodora: Yes. I have a troubled life because I'm trying to... 
No, by the way, it's good that you're saying it because I'm talking like this as if it's a mesh, and it can be a mesh when we talk at that abstract level,...
-
3
Dean Allemang: Semantic Web for the Working Ontologist – Episode 6
Dean Allemang Dean Allemang literally wrote the book on the semantic web. "Semantic Web for the Working Ontologist" is now in its third edition. In the book, Dean and his co-authors, James Hendler and Fabien Gandon, show how to apply web standards to build a meaningful web of global, connected knowledge. More recently, Dean has conducted research with his colleagues at data.world that shows how using knowledge graphs can triple the accuracy of LLM-based question-answering systems. We talked about: his role as a principal solutions architect at data.world the meaning of the "semantic web" and its intent of sharing meaning across the web the long history of knowledge representation and how the connectedness of the semantic web adds to it the crucial difference between documents about things and the strings that describe them the contrast between the persistent nature of enterprise data and the ephemerality of the applications that use the data the power of the simple structure of RDF, its mathematical affordances, and the ease of distribution it permits the impact of newer AI tech on knowledge graph building and querying the research that he and Juan Sequeda have conducted that shows how using knowledge graphs can triple the accuracy of LLM-based question-answering systems his thoughts on the yet-to-be-resolved one-way or two-way ontology question the crucial role of trust in AI and how replacing LLMs with knowledge graphs as the point of contact in AI systems could build more trust Dean's bio Dean Allemang has been active in the field of Artificial Intelligence (AI) since the 1980s. With a notable emphasis on Semantic Web, he is the author of the book "Semantic Web for the Working Ontologist." His passion for understanding and implementing knowledge graphs led to a significant publication about using LLMs to answer queries over structured data, which introduced a new benchmark for evaluation. 
In his current role as a Principal Solutions Architect at data.world, he contributes extensively to the development of the AI Context Engine product, which is inspired by his recent research (with Juan Sequeda and Bryon Jacob), and underscores his commitment to practical application of theoretical principles. For a span of about a decade, Dean operated as an independent consultant, utilizing knowledge graph solutions to address challenges in industries such as Media, Finance, and Life Sciences. This diverse experience has cultivated a broad perspective on applying AI and Semantic Web principles. Influenced by Sir Tim Berners-Lee's concept of linked data and data sharing, Dean Allemang's work reflects a consistent focus on these principles. His contributions have advanced the field of AI and his current interest lies in how knowledge graphs can make generative AI more effective. Connect with Dean online LinkedIn Medium Resources mentioned in this interview Semantic Web for the Working Ontologist A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases, Juan Sequeda, Dean Allemang, Bryon Jacob The Semantic Web, Tim Berners-Lee, James Hendler, and Ora Lassila Video Here’s the video version of our conversation: https://youtu.be/29kmAc6tobU Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 6. Long before the introduction of the semantic web - the innovation that added meaning and metadata to documents on the web - AI pioneers like Dean Allemang had been thinking about how knowledge could be formalized to help people do their work. The web itself, along with the W3C standards that power its semantic capabilities, gave Dean and his peers the ability to scale and connect existing practices and technologies to build a more meaningful web. Interview transcript Larry: Hi everyone. Welcome to episode number six of the Knowledge Graph Insights podcast. 
I am really delighted today to welcome to the show, Dean Allemang. Dean is a principal solutions architect at data.world, a company in this space. He's also the author of The Semantic Web for the Working Ontologist, kind of the original operating manual and textbook for this field. So welcome, Dean. Tell the folks a little bit more about what you're up to these days. Dean: Hi, Larry. It's great to be here. So I've been at data.world for about three years now. I was an independent consultant for a bit before that. What I have been up to lately, not surprisingly, is figuring out how all of the new AI fits in with the sort of knowledge-intensive stuff that we've been doing with ontologies and knowledge graphs and things like that. Larry: Cool. Yeah, and I'd love to talk more about that research, time permitting, but I'll certainly link to it. I know you have a lot of cool new stuff coming up too, so I'll be sure to keep it. I'll try to keep the web page updated too with that. Larry: But hey, I want to back up just a little bit, like the dawn of this technology and this whole ecosystem we're operating in the semantic web. That's the first two words in your title of your book. Tell us folks a little bit more about the semantic web, its origins, what it is. Dean: Yeah, so one of the things I like to say about the semantic web is that the emphasis is on the final syllable; it's about web. That's the important insight that the semantic web brings to. Well, at the time, knowledge representation was the big thing, and the real key to the semantic web is in fact the web nature of it. So what you're doing in the semantic web, and this is a vision that came from Tim Berners-Lee in the mid, that sort of got more popular in the late 90s when the standards started to come through. 
But the idea is that we have this notion of the web; cast your mind back to the nineties when the web was a new thing, and instead of just having pages linking off to each other, could you actually have bits of knowledge that refer to each other in a deep, meaningful, dare I say it, semantic way. Dean: So Tim Berners-Lee came up with the name semantic web, and the point of it was that we want to be sharing not just documents on the web but meaning on the web. And that was the whole idea from Tim Berners-Lee, basically in the 90s. In some sense, he often says that this is the web he always had in mind, but the document web that we know and love was the first, the crawl of the crawl walk, run of the semantic web. That's how Tim often refers to it. But for us, this is really looking at knowledge representation, which has been around for a long time, and bringing it forward to the new information age, which is what we now call the web. Larry: Yeah, and one thing I want to point out is that one of the co-authors of your book is James Hendler, who famously co-wrote the paper with Tim Berners-Lee about introducing the idea that of... Dean: That's right. The Scientific American article back in 2001, I think it was? Larry: I think it was March or May; it was one of those M months in 2001. Yes. Dean: Yeah, early in '01. And that really sort of put the name on the idea of the semantic web. And so that was sort of how it all began, and the point of my book, which, as Jim and I wrote the first edition, I guess about seven years later, we were doing a course, a little corporate training with our partners, TopQuadrant, at the time, about semantic web. And we found that after four days of intense lectures and exercises and things, that we still found that a lot of very smart people were pretty tentative about answering basic questions about the semantic web. Well, we really need something to give them at the end of this course that they can take home and read. 
And so, one day over a beer, after doing this course, Jim and I conceived the idea of Semantic Web for the Working Ontologist and started to work on it. And of course, as I know you're aware, Larry, projects like this always take much longer than you expect. And I think it was actually three years later that we finally had the manuscript ready for the publisher. Larry: I don't know what you're talking about. When I was a book editor, everything that always came in early and no... I know how it goes, but hey, I want to talk... I love the origin story because it comes right out of what you want this book to be doing. It's like, "Okay, these people who are learning..." Because there's new technology, well, there's kind of two things. I guess you mentioned that knowledge representation has been around as a discipline for a while and then, but I'm going to guess it wasn't as technical as it is now or the specific technical implementation of it changed with the advent of the semantic web. Dean: Yes, that's certainly the case. Well, back in the really old days, things like KL-ONE, which was a Lisp-based knowledge representation language, and LOOM and all these things. They were actually very technical, indeed. What they weren't was distributed. That's why I say the emphasis on the final word web, and this is the thing that actually, if there's one thing about the semantic web that I find doesn't get through is that what we're doing here is sharing knowledge. So if you think about the EDM Council, one of my former clients, they published an ontology called FIBO, the Financial Industry Business Ontology. What are they doing there? They're writing down a data model. Well, anybody could write a data model, lots of people do, and they are publishing it, and people do that as well. The OMG does a lot of that stuff. But why did the EDM Council decide to use the semantic web? Dean: They want people to be able to refer to parts of that independent of some document. 
They want to have a machine-readable way of bringing this into their system so that every last part of this great, big behemoth can be referenced on its own....
-
2
Alan Morrison: Pragmatic Knowledge Graph Insights from an Industry Analyst – Episode 5
Alan Morrison After 20-plus years of industry analysis, Alan Morrison has developed a keen sense for how knowledge graphs can help enterprises. Even though he has focused on advanced tech and emerging IT practices and is deeply immersed and invested in current tech developments, much of his advice for enterprises looking to develop their data maturity involves pragmatic baby steps and basic mindset shifts. We talked about: his work in the consulting world and his organizing work around the knowledge graph community to improve awareness of the technology the need to find "foxes instead of the hedgehogs" in enterprises when you're trying to promote adoption of new tech the relationships between different AI tech, like LLMs and knowledge graphs, and the common connection they share: data the importance of having mature data practices in any enterprise how even simple metadata practices in common tools like spreadsheets can support better enterprise data practices how sidestepping the formal org chart and forming guerrilla teams can advance data practice the benefits of starting small in any knowledge graph project how representing organization knowledge at a high level in a knowledge graph can help solve big enterprise problems how a knowledge graph gives you a multidimensional Tinker Toys set to model and understand your org's data the benefits of moving from tabular thinking to graph thinking his frustration with the current framing of AI as being solely about machine learning his observation that practices across any org - content, knowledge management, data management, business people - could benefit from long-standing standards and proven technologies (that might not be as sexy and topical as LLMs) Alan's bio Alan Morrison is a longtime analyst, writer, advisor and podcaster on advanced data technologies and emerging IT. 
For 20 years at PwC's R&D and innovation think tanks, Alan identified emerging technologies on the cusp of adoption, assessed their business impacts, and advised PwC's clients on innovation strategy. Before PwC, he was a semiconductor industry market analyst and forecaster, a retail site location analyst, and a US Navy intelligence analyst, Russian linguist and aircrewman. For the last five years, Alan has been a contributor on knowledge graph and related topics for Data Science Central. His writings over the years have covered dozens of different technologies. Connect with Alan online LinkedIn Video Here’s the video version of our conversation: https://youtu.be/TXOWWjM-DBc Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 5. You might think that the lofty perch of multiple decades in industry-analyst roles would inspire grand visions of tech transformation with leading-edge technology. Quite the opposite in the case of Alan Morrison. He shows how enterprises can advance their data maturity by cultivating basic graph thinking in their organizations and by taking small, pragmatic steps like adopting established standards for interoperability or simply adding metadata to a spreadsheet. Interview transcript Larry: Hi, everyone. Welcome to episode number five of the Knowledge Graph Insights podcast. I am really happy today to welcome to the show Alan Morrison. Currently, he's a contributor at Data Science Central, a well-known publication in the field. He's a freelancer and consultant around knowledge graphs and a lot of other areas as well. His background, he comes out of the consulting world. Most recently before his current role as a freelancer and consultant, he worked for many years at PricewaterhouseCoopers, the big consultancy as a senior research fellow. So welcome, Alan. Tell the folks a little bit more about what you're up to these days. Alan: Hey, Larry. 
Great to talk with you and folks should know that you and I have some history together as a part of the Data Worthy Collective, which is like an informal meetup group, collaborative thinking going on every week, and it's been great to know you over the years. What I'm doing currently is trying to help the knowledge graph community, in particular, gain more visibility, gain more traction in the enterprise, and it's a long haul. Enterprises are slow to change. It's like turning an oil tanker on a dime. It's very, very hard kind of thing to do, and there's so much legacy involved. And so when I was at PwC, we worked with a lot of large companies and you'd look for pockets of innovation and you'd look for the foxes instead of the hedgehogs, because the foxes were the ones that were curious about doing things different ways. The hedgehogs were the ones who were expert in doing things in one way. Alan: So we had this kind of guerrilla approach to innovation, and we also worked with the centralized innovation group inside the firm to try to help the firm itself modernize. And so in my last five years at the firm, I was plugged into the AI efforts that were emerging because machine learning was becoming much more feasible. And so I've taken this knowledge that I have of AI and the semantic web so-called, an old term, but it still has utility, together, and I'm just trying to help enterprises see the advantages of these things and adopt them to the extent they can be. Larry: You just said how hard, notoriously difficult it is to get enterprises to think and act differently, but they're all jumping all over AI like it's the best thing since sliced bread. But there's a lot of opportunities there, it seems like, to not ride the coattails, but to enter the conversation around these new technologies. And there's a lot of interplay between generative AI and machine learning and LLMs and all that world and the knowledge graph world. 
Can you kind of stitch those worlds together for us a little bit? Alan: Yeah. I think it's good to do that. It's good to step back and say, "What are we trying to do here? How are we trying to do it?" We've got some piece parts that are talked about in the media ad infinitum. Generative AI is just all the time in the conversation because it's a powerful interface technology, as it stands, and some enterprise providers are using GAI as basically a front end, and then they will connect their own backend. And so I think you have to think about generative AI and other kinds of AI in the context of this bigger picture, and it's all driven by data. And when we say data, we don't just mean binary bits, I think we mean ideally contextualized information, knowledge, getting wisdom and decision-making capability to the right point where it's actionable at the right time for the right purpose. Alan: And so it's a distribution problem of knowledge, essentially the right kinds of knowledge. And so you really have to think about data, and this is where I get passionate about it, because the information, the heart of it is in this contextualized environment that should be being built, and it should be an organic kind of effort that involves both humans and machines. And I'm a woodworker, so I think about machine learning as a kind of a table saw. And so the knowledge is the wood that you're working with, it's an organic thing. And so you're using all these different kinds of tooling. I got a lot of different kinds of tooling in my workshop, and I'm not a great woodworker, but I think that the source of the wood is really important, what kind of wood you're working with. And we could do all sorts of things if we had the right resources inside of enterprises. Alan: Enterprises are really starved for good data. They just are not in the habit of collecting it very well. I was in intelligence in the Navy when I started, just collecting voice traffic and analyzing it. 
And it was so systematic about how the data collection happened. There was this whole data lifecycle environment that we were a part of, and everybody was managing according to the needs of that. And I just think that that was a very effective way that enterprises could take advantage of to really collect what they need to and just understand if you're going to digitize things, you have to have this continual process of collecting and analyzing and managing this information. And it has to be organically constructed so that it's scalable and it does what you need it to do. So that's basically where I've been focused over the past years. Larry: Yeah. The way you described that is so evocative of, I love the woodworking analogy, but also your military experience. I think any enterprise would argue that they would claim to attribute the same importance or similar level of importance as naval intelligence data about whatever you were researching. And yet, they have these sloppy data practices, or if not sloppy, at least not thought-through and sort of suboptimal. Can you talk about two things? One, how could they be better at that data hygiene and that data practice that you just described that was so well entrenched in your Navy days? And then in particular, how knowledge graphs and the whole world of semantic tech can help you do more with that data, and is it a prerequisite to have that good data hygiene before you can do the cool stuff with knowledge graphs? Alan: Let me start with the last question first. It is a prerequisite to have a certain amount of data maturity. When I was at PWC, we had a data maturity curve, and it was frustrating to see that most of our audit clients were not terribly high on that maturity curve. I think the tooling gets in the way. There's so much siloing that has gone on over the decades, and so many folks, including me, are in the habit of just using certain tools. And so the learning curve for learning something new is a bit steep. 
And so what's happened is that we've just proliferated these data silos that have limited utility and a short lifetime when you could be just contributing to a much larger ecosys
-
Ellie Young: Grounding Knowledge Graphs in the Humanities – Episode 4
Ellie Young Ellie Young effortlessly connects the human and technical elements that go into ontologies and knowledge graph building. Ellie came to the world of knowledge graphs with backgrounds in both literature and sustainability. "If the world wasn't on fire," she says, "I would probably be writing novels." That sense of urgency drives her work at Common Action, a platform she is creating to address climate change and advance sustainability. She also applies her knowledge graph expertise in projects like HelioWeb at NASA, which connects scientists in the field of heliophysics. We talked about: her work at Common Action, a platform for climate and for sustainability that uses knowledge graph technology her work at NASA to facilitate collaboration and expose knowledge across the domain of heliophysics (the study of the sun) how personal knowledge graphs can connect individuals and collectives of people how her background in design, art, literature, and the humanities manifests in her knowledge graph work her desire to leverage metadata and capabilities like her language knowledge to facilitate topical discovery the interplay between the efficiencies that AI tech like LLMs offer and the uniquely imaginative variations that human beings create the importance of the practice of design in advancing the productive use of information how user experience design connects to ontologies and back-end tech how she applies, and imagines how others might apply, a literary mindset to ontology practice how she applied ethnographic methods from anthropology to a paper she co-authored on the NASA HelioWeb ontology her ongoing call for volunteers to help with her Common Action program, specifically a current need for creating a "phenomena ontology" Ellie's bio Ellie Young brings knowledge to communities to catalyze successful, local actions to address climate/sustainability problems. 
She is the founder of Common Action, an innovation network facilitating climate and sustainability action through the development of community and knowledge graph technology. Previously she served as Head of Community and Director of Conference Operations at The Knowledge Graph Conference. Connect with Ellie online LinkedIn ellie at common-action dot org Resources The cultural-social nucleus of an open community: A multi-level community knowledge graph and NASA application, Applied Computing and Geosciences Common Action vision Video Here’s the video version of our conversation: https://youtu.be/CL9HWocoh7I Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 4. In the domains of ontology engineering and the semantic web, there are plenty of people with advanced technical skills. Practitioners with social-science skills, well-developed literary instincts, and a design mindset are harder to find. Ellie Young smoothly navigates the technical and linguistic worlds that intersect in knowledge graphs, applying her humanities mindset to projects that connect scientists at NASA and address climate change and sustainability. Interview transcript Larry: Okay. Hi everyone. Welcome to episode number four of the Knowledge Graph Insights podcast. I am super delighted today to welcome to the show Ellie Young. Ellie is the founder of Common Action, a sustainability and climate change activist organization. She's also ... The way I first met her, she's the former head of community for the Knowledge Graph Conference, and one of the people who really ushered a lot of people into this community. So welcome Ellie. Tell the folks a little bit more about what you're up to these days. Ellie: Thanks, Larry, and thank you for inviting me to be a special guest with you on this very nice community visit. So yeah. Now I've gone from KGC. I think I left about three years ago to starting Common Action, which is not really an activist organization. 
It's more of a design and deep operations platform. And so what we're trying to do is bring technology to the climate problem. Because if we think about it, there's many, many things that make climate challenging to solve. But one of the core components is that it's a coordination challenge. It's a communication challenge. It's a really big project. And so if we think about really big projects, those tend to happen on Gmail or on Slack or maybe on Teams if you're really unlucky and it's a lot of interaction between moving parts, many people, and there's a lot of confusion in that. And it's always dependent on those people knowing each other, especially when you think about project management software like Asana or something like that. Ellie: So in the climate case where we literally cannot know everybody who needs to do something about climate because it's the globe, how would we think about organizing those communications and supporting specific people to identify routes to action pathways that help them become part of this mission set in a productive way? So anybody who wants to do something about climate, where do they go? What do they do? We are trying to be that middle space, both for just equipping people with strategies and also for looking at the unfolding changes globally in the physical layer and anticipating what kind of effects we might have and then supporting people to respond and prepare to those as well. So there's a whole lot of things that we want to facilitate, but at the end of the day, it's a technology-based firm that is bringing to the market complexity-based software, and that is always backed by knowledge graph technology. Larry: Because how else would you ever organize that scope that you just described? That's a really ambitious domain you described. But your current work, you're working in a slightly more constrained domain. At NASA, you're working on the HelioWeb project. That's at least ... What? 
Science is the boundary of that one? Ellie: Yeah. Well, even better, it's one division of science. So we're just cutting our teeth on a small project for NASA. And what this is called ... And as you said, it's called HelioWeb. What this is about is basically solving some of the same communication and discovery problems for scientists who work both within NASA and beyond NASA within this investigation space of the heliophysics. And so you probably haven't heard of heliophysics. It's not as well socialized as astrophysics, but it is actually another form of astrophysics and it's all about the study of the sun. So there's a lot of people that participate in studying the sun, not only inside NASA, as I mentioned, but also in universities and also abroad in the European Space Agency and other related space agencies. Ellie: As I mentioned, what are our collaboration opportunities? Well, we can work on Slack, we can go to Teams and we can meet in a conference. And so what we're trying to do is expose the knowledge that is latent in a conference space. And we've talked about this too at KGC. So that we can see who is there beyond a simple interaction in the hallway. Because let's say you go to a conference, there's a thousand people there, you can't talk to every one of them. And even if you did, you wouldn't necessarily know everything that would be relevant for them. For them to tell you or for you to share with them. We just don't have the surface area for that in our interactions in person. So what we're doing is supplementing that with a software system that allows individuals who are part of the HelioWeb and heliophysics community to report items of interest about themselves. That could be activities that they're doing, roles that they have, projects that they're organizing, softwares that they built or are maintaining. 
Any of the contributions that they make to science they can record it in a catalog, and then they can search using an ontology for different kinds of parameters, which are not connected necessarily to an individual name. Ellie: So I may not know who I would like to work with in this group of thousands of scientists because I haven't met them yet, or because I don't know how someone's research interests have transformed or morphed over time. But I know that I'm interested in solar flares or coronal mass eruptions. And so I can search for those topics and see what other people have contributed to the knowledge base on those topics, like I said, software systems or data sets or papers or whatever you have that people have entered. And then so in that way, discover the actual network. So it's this user interface navigation notebook for supporting the discourse in the field alongside making it accessible to find materials, which is also a challenge in the NASA environment. As you can imagine, they have millions, trillions of data objects. Larry: No kidding. As you're saying that, you're reminding me, have you ever talked to Ashleigh Faith about personal knowledge graphs? Ellie: A little bit. Larry: Yeah. Because as you're saying that, one of the things ... Because I was talking to her on another podcast years ago about this notion of personal knowledge graphs as a way to just know and understand people, but we were talking about them in the context of enterprise content strategy as a way to know people who know stuff and to get permission to share stuff. But it was all about sharing the kinds of things you just mentioned, like personal interests, things you've done, activities you've done. So the way you described that, you can totally see how you just associate in the ontology, all those interests and activities with the people. Is it as simple as that, or is there more to it? Were there any challenges I guess, in developing the ontology that drives that? 
Ellie: Yeah. There's multiple stages of HelioWeb, and in the current moment we're in the first stage, which is this catalog. And you're right that it mirrors the idea or the many i
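The catalog search Ellie describes, finding contributions by topic rather than by a person's name, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Entry` fields, the `NARROWER` topic hierarchy, and the sample entries are hypothetical and do not reflect the actual HelioWeb ontology or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One self-reported contribution in a hypothetical community catalog."""
    contributor: str
    kind: str            # e.g. "software", "dataset", "paper"
    title: str
    topics: frozenset    # topic labels attached to the contribution


# Hypothetical broader -> narrower topic links standing in for an ontology.
NARROWER = {
    "solar activity": {"solar flares", "sunspots"},
}

# Made-up sample entries; contributor names are placeholders.
CATALOG = [
    Entry("Scientist A", "software", "Flare detection toolkit", frozenset({"solar flares"})),
    Entry("Scientist B", "dataset", "Sunspot counts", frozenset({"sunspots"})),
    Entry("Scientist C", "paper", "Magnetosphere survey", frozenset({"magnetosphere"})),
]


def expand(topic, narrower=NARROWER):
    """Return the topic together with all of its narrower topics, transitively."""
    seen, stack = set(), [topic]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(narrower.get(current, ()))
    return seen


def search(catalog, topic):
    """Find entries tagged with the topic or any narrower topic, regardless of who entered them."""
    wanted = expand(topic)
    return [entry for entry in catalog if entry.topics & wanted]


if __name__ == "__main__":
    for entry in search(CATALOG, "solar activity"):
        print(entry.contributor, "-", entry.kind, "-", entry.title)
```

The point of the sketch is that the searcher never needs to know a contributor's name: a topic query surfaces the people through what they have recorded.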
-
George Anadiotis: Connecting the Dots in the Knowledge Graph World – Episode 3
George Anadiotis Every profession has its connectors, sharers, and community organizers. In the knowledge graph world, George Anadiotis fills all of these roles. Through his industry analysis and reporting, his conference organizing, and his writing and podcasting, George connects ideas and people across the semantic-tech landscape. We talked about: his work at Linked Data Orchestration and as a consultant and analyst in the knowledge graph and linked-data world his diverse background in computing and his studies at the intersection of knowledge management, the semantic web, and distributed systems his extensive writing experience and consulting background his definition of a knowledge graph the differences between RDF-based knowledge graphs and labeled property graphs (LPG) the focus in the RDF community on standards and interoperability versus the focus in the LPG community on implementation the variety of query languages in the LPG world and recent efforts like GQL to create a standard way of querying LPGs, as well as efforts to query across both RDF and LPG graphs the origins of his annual Year of the Graph report some of the reasons that knowledge graphs are positioned in the bullseye of Gartner's Impact Radar this year where knowledge graphs fit in the AI landscape the role of knowledge graphs in RAG architectures the conference he organizes, Connected Data London, coming up December 11-13 George's bio George Anadiotis has got tech, data, AI and media, and he's not afraid to use them. He helps organizations map and understand complex domains to make better decisions; design, implement and monitor models, processes and systems to achieve goals; and craft communication strategies and outreach initiatives to grow awareness and market share. He enjoys researching, developing, applying, writing and talking about cutting edge concepts and technology, and their implications on society and business. 
Connect with George online LinkedIn Twitter TikTok Instagram George's publications, podcasts, and conference Connected Data London (conference roundtable recording) The Year of the Graph Orchestrate All the Things Video Here’s the video version of our conversation: https://youtu.be/lEHj5_9y-30 Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 3. In any domain, there are people who seem to do it all - practice and consultation, industry analysis and reporting, and community building and event organizing. In the world of knowledge graphs and the semantic web, George Anadiotis has filled all of these roles. Whether he's publishing his Year of the Graph newsletter, organizing the annual Connected Data conference, or producing the latest Orchestrate All the Things podcast, George is always connecting the dots. Interview transcript Larry: Hi, everyone. Welcome to episode number three of the Knowledge Graph Insights Contest. Sorry, I'm going to redo that again. I have too many podcasts. I need a new intro for this one. Okay. Hi, everyone. Welcome to episode number three of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show George Anadiotis. George is really well known in the knowledge graph world and the graph world in general and the tech world in general, as an analyst, a consultant, a really well-developed engineer. He runs a big conference around knowledge graph and graph technology, and he is the principal at his organization called Linked Data Orchestration. So welcome, George, tell the folks a little bit more about what you're up to these days. George: Great. Thanks for the intro, Larry, and good to be here. Actually one of the opening, I guess, guests for this new podcast series of yours. Well, the truth is I have a long and kind of convoluted story, but I've kind of honed my skills of telling it in as simple way as possible. 
So basically in terms of background, I have a very hardcore computer science background. I was one of those kids that I saw my first computer when I was like 12, and immediately I kind of snapped and I realized, "Okay, so this is what I'm going to do in life." So went to college, studied computer science, graduated, started working as a software engineer and architect and all of that stuff, consultant, all of that. And then at some point, about a decade in basically, I realized that it's been fun, but I wanted to try something new. George: And that's the point where graphs sort of entered my life because the thing... I was interested in research and my topic was somewhere around the intersection of knowledge management, semantic web and distributed systems, and there was a specific group that I wanted to join that was working precisely on the intersection of those things, the Knowledge Representation and Reasoning group based in Amsterdam, led by one of my mentors, Frank van Harmelen. So I was lucky enough to spend a few years there, did some really cool stuff in that group up until the point where I left, I repatriated. So I should also mention I'm from Greece originally, so I spent a few years in Amsterdam, then moved back to Greece, kept working on the intersection of those technologies actually. But for a few years I did that leading the R&D of a company that was developing projects and products around those up until 2012. George: And that's the point where I started doing my solopreneur thing. So ever since I've been juggling a few things. So I work as an analyst, I collaborate with GigaOm, I work as a writer. I've contributed to a few publications such as VentureBeat and ZDNET. I have my own newsletter and blog called the Orchestrate all the Things podcast and newsletter. What else? Let's see. As you mentioned, I organize an event, it's called Connected Data. 
I think we can elaborate a little bit on that later because it's actually very much relevant for the knowledge graph theme. I also curate a newsletter called The Year of the Graph, and there's a graph database report also going by the same name. I'm also consulting with a number of companies on... pretty much everything, ranging from go-to-market strategy and marketing to technical implementation. So I juggle many balls, as I said. Larry: I'm exhausted just listening to that and I feel very grateful that you found the time to talk to me with all that you have going on. So thank you, George. Hey, one thing I like to start each episode with is I would love to get your definition of a knowledge graph just for folks... The idea is to hopefully come up with something like a canonical definition somewhere in the next one to five years. But anyhow, I'd love to get your take on what a knowledge graph is. George: Yeah, that's a good one. And somehow it never gets old. I'm sure you're probably familiar with the fact that I believe a couple of years ago when I last checked, I think there were over 100 definitions for what constitutes a knowledge graph. So they've probably grown to, I don't know, maybe 200 by now. So 200, 201, who's counting? I'm going to give you mine as well. And by the way, if you ask me next year, I'm probably going to tell you something slightly different, but here's the current definition. So if we're talking about the graph, then it basically means that we're talking about the data model in which the key elements are nodes and edges. George: I'm going to add the directed adjective to edges because well, if you only have edges without direction, it may lead to some ambiguity, let's say. So that's the graph part, which it's not very original, it's kind of the textbook definition. What may be a bit more original is the knowledge part. So I think in order to be able to qualify a graph as being a knowledge graph, I think there are certain conditions that need to be met. 
So basically I think that both nodes and edges should enable users to define their properties and they should adhere to a schema. That's as lightweight as I could possibly keep it without getting too technical. Larry: Interesting. And that notion of a schema and both having properties on the edges. I guess maybe want to diverge just a little bit and talk about the difference between an RDF-based, triple-based knowledge graph and a labeled property graph. Can you talk a little bit... That might be another thing to really get... Because I think a lot of people, when they hear graph technology, they're thinking most... I think the most common databases and tools that are out there are often around labeled property graphs. So can you help us tease out between an RDF-based knowledge graph and a labeled property graph? George: Okay, well, we could spend at least four or five podcast episodes talking just about that. And by the way, there recently was another episode by a good friend actually, and also very knowledgeable person in the graph world, Amy Hodler. So she spent an entire episode with her guests dissecting this exact topic. So what's an RDF-knowledge graph? What's a labeled property graph? How are they different? When should I use what and so on. So I'm not going to even try and be as extensive as they were, but let me just say that for most people, if they're not familiar with graphs at all, maybe just the general idea of the graph data model, let's say, they don't even know... They can't actually imagine, I'm guessing, that in the graph world you do have this kind of schism. George: So there's two ways of modeling graphs, because if you think about it, that's not the case for relational data. As far as I can remember, that's not the case for the document data model either. So in those worlds, things are pretty straightforward. Okay, so I want to build a relational database. I have tables, I have SQL. There's one way to go, basically. 
Yes, you may have slight variations on the query languages, but for the most part it's all pretty standard and pretty straightforward....
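George's working definition earlier in this conversation (nodes and directed edges, both able to carry properties, both adhering to a schema) can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the `SCHEMA` layout, the labels, and the sample data are all hypothetical, and the sketch follows neither the RDF nor the labeled property graph standards specifically.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str   # edges are directed: source -> target
    target: str
    label: str
    properties: dict = field(default_factory=dict)


# Hypothetical schema: which properties each label may carry, and which
# node labels a directed edge of a given label may connect.
SCHEMA = {
    "node_properties": {"Person": {"name"}, "Event": {"name", "city"}},
    "edge_properties": {"ORGANIZES": {"since"}},
    "edge_endpoints": {"ORGANIZES": ("Person", "Event")},
}


def validate(nodes, edges, schema=SCHEMA):
    """Return a list of schema violations; an empty list means the graph conforms."""
    errors = []
    by_id = {node.id: node for node in nodes}
    for node in nodes:
        allowed = schema["node_properties"].get(node.label)
        if allowed is None:
            errors.append(f"unknown node label: {node.label}")
        elif not set(node.properties) <= allowed:
            errors.append(f"unexpected properties on node {node.id}")
    for edge in edges:
        endpoints = schema["edge_endpoints"].get(edge.label)
        src, tgt = by_id.get(edge.source), by_id.get(edge.target)
        if endpoints is None:
            errors.append(f"unknown edge label: {edge.label}")
        elif src is None or tgt is None:
            errors.append(f"edge {edge.label} points at a missing node")
        elif (src.label, tgt.label) != endpoints:
            errors.append(f"edge {edge.label} has the wrong direction or endpoint types")
        elif not set(edge.properties) <= schema["edge_properties"].get(edge.label, set()):
            errors.append(f"unexpected properties on edge {edge.label}")
    return errors


# Illustrative sample data.
nodes = [
    Node("george", "Person", {"name": "George Anadiotis"}),
    Node("cdl", "Event", {"name": "Connected Data London"}),
]
edges = [Edge("george", "cdl", "ORGANIZES")]

if __name__ == "__main__":
    print(validate(nodes, edges))                               # conforming graph
    print(validate(nodes, [Edge("cdl", "george", "ORGANIZES")]))  # reversed edge
```

Reversing the edge shows why direction matters in the definition: the same two nodes and the same label no longer satisfy the schema once source and target are swapped.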
-
Mike Dillinger: Knowledge Graphs as “Jet Fuel” for Generative AI – Episode 2
Mike Dillinger Knowledge graphs provide the digital foundation for some of the most visible companies on the web. Mike Dillinger built LinkedIn's Economic Graph, the knowledge graph that powers the social media giant's recommendation systems. Mike now helps people understand knowledge graph technology and how it can complement and improve generative AI, whether by acting as "jet fuel" to better train LLMs or by providing "adult supervision" for their unruly, adolescent behavior. We talked about: how he describes knowledge graphs how the richness of information in a knowledge graph helps computers better understand the things in a system the differences between knowledge graphs and LLMs how LinkedIn's Economic Graph, which Mike's team built, works how LLMs can help build knowledge graphs, and how knowledge graphs can act as "jet fuel" to train LLMs the RDF "triples" that are at the foundation of knowledge graphs the importance of distinguishing between unique concepts in a knowledge graph and how practitioners do this the two main crafts needed to build knowledge graphs: linguistic expertise and software engineering the job opportunities for language professionals in the LLM and knowledge graph worlds the propensity of tech companies to staff knowledge graph efforts with engineers while there is actually a need for a variety of talent, as well as better collaboration skills his assertion that "language professionals aren't janitors," put on teams only to clean up data for software engineers how knowledge graphs provide "adult supervision" for unruly, adolescent LLMs his hypothesis that using KGs as a separate modality of data rather than as training data for LLMs will advance AI Mike's bio Mike Dillinger, PhD is a technical advisor, consultant, and thought leader who champions the importance of capturing and leveraging reusable, explicit human knowledge to enable more reliable machine intelligence. 
He was Technical Lead for Knowledge Graphs in the AI Division at LinkedIn and for LinkedIn’s and eBay’s first machine translation systems. He was also an independent consultant specialized in deploying translation technologies for Fortune 500 companies, and Director of Linguistics at two machine translation software companies where he led development of the first commercial MT-TM integration. He was President of the Association for Machine Translation in the Americas and has two MT-related patents. Dr. Dillinger has also taught at more than a dozen universities in several countries, has been a visiting researcher on four continents, and has a weekly blog on Knowledge Architecture. Connect with Mike online LinkedIn Video Here’s the video version of our conversation: https://youtu.be/wX2C3DwiWG4 Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 2. If you've ever looked for a job or recruited talent on LinkedIn, you've seen Mike Dillinger's work. His team built LinkedIn's Economic Graph, the knowledge graph that powers the social media platform's recommendation system. These days, Mike thinks a lot about how knowledge graph technology can work with generative AI, seeing opportunities for the technologies to help the other, like the ability of knowledge graphs to act as "jet fuel" to train large language models. Interview transcript Larry: Hi everyone. Welcome to Episode Number 2 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the program Mike Dillinger. Mike is a cage-free consultant based in San Jose. He's been doing knowledge graph and other technical things for many years. So welcome, Mike. Tell the folks a little bit more about what you're up to these days. Mike: Thanks a lot, Larry. What am I up to? I'm trying to help people wrap their heads around knowledge graphs and how we can make AI, transform GenAI into Next-GenAI by leveraging more explicit content and human knowledge. 
Larry: Ooh, I love that. Next-GenAI. I hope you've copyrighted that. Mike: No. Larry: Yeah, no, but that's ... because right now, it seems like most of the oxygen in the room has been sucked up by OpenAI and LLMs and chatbots and GPTs, but knowledge graphs have something to offer as well. I guess the first thing I'd like to ask is, can you describe for folks, because I think a lot of folks listening to this podcast might not be as familiar as some of us with what a knowledge graph is and what it does. Can you sort of set out for folks what a knowledge graph is? Mike: Sure. The usual way I describe knowledge graphs are as collections of densely interconnected facts about individual things and categories of things, based on a range of different relations. So one thing that people get caught on is, oh, so there is a taxonomy? Not really, but taxonomies are a part of the knowledge graph. Oh, so it's an ontology? No, but ontologies are a part of the knowledge graph. So there are a range of different kinds of facts in a knowledge graph, so it's broader than a taxonomy or an ontology. Larry: And I think a lot of people come ... like, I work in the content world mostly, and information architecture. Most people in that world, I think the first time they go to organize stuff, they start thinking taxonomically, which I guess makes sense. But tell me the benefits of going beyond a simple taxonomy or just an ontology. How does it come together? How does it help people do more interesting and better stuff? Mike: Well, the question that we're talking about here is, you might call it the richness or the depth of the knowledge representation. With a taxonomy, you only have relations like this is a subcategory of that, or this is an instance of that, and you don't have information about what is this for or what are its attributes or what are its components? 
So when you move from a taxonomy to a knowledge graph, we're talking about giving algorithms more information in more detail about the things that we want them to think about. Larry: Interesting. And I think a lot of people right now, a lot of the curiosity and interest around these kinds of technologies is around LLMs and OpenAI and ChatGPT and all those things. Can you contrast a knowledge graph and what it can do with what those kinds of systems are doing? Mike: Oh, sure. So language models focus very much on strings, and sequences of strings, and knowledge graphs focus more on facts or concepts. So concepts built into facts, as it were. So they're really focusing on very different things: sequences of words or strings, or graphs of concepts. So the notion of meaning is really different. They're both in terms of similarity, but an LLM computes similar meaning in terms of context. Mike: So if two words have similar words around them, then those two words are considered in an LLM to have similar meanings. But in a knowledge graph, you compare two concepts by saying, oh, do they have similar components and similar characteristics? If so, then they're related in meaning. Mike: So they're very different ways of getting at a similar problem. Larry: Interesting. Mike: You might phrase it, for linguistic people, you might say LLMs focus on syntax and knowledge graphs focus on semantics, or LLMs focus on data and knowledge graphs focus on knowledge. There are a lot of different ways of describing it. So they're very, very much complementary technologies for getting at some of the same problems. Larry: You know what I'd love to do now is I'd love to ground what you just said in some examples. Like, what are your ... the first accomplishment of yours that I learned about was your work on the Economic Graph at LinkedIn. Can you talk a little bit about what the aim of that is and how a knowledge graph helped LinkedIn do better stuff with their data? Mike: Oh, sure. Okay. 
So the Economic Graph at LinkedIn is a model of the entities in the economy, focusing on schools that produce talent, people with talent, and companies that absorb that talent, okay, and then companies produce products. So we have things like companies, products, workers, schools, these are main entities, and there are a wide range of relationships between them. So when we want to find a worker who fits in a company in a particular position, we need to have a detailed and reliable description of both the position and the worker. Mike: This is what LinkedIn's technology is all about, is matching workers to openings, or now increasingly doing other things, like, in their ads business, matching products to people, kind of thing. So knowledge graphs are all about making matching work more systematically and in a more understandable way. Mike: So this is what we did at LinkedIn. We built up a kind of vocabulary for describing people or workers and for describing jobs, but we used the same vocabulary for both, so that we could translate, as it were, your worker profile and this company's job profile into a same meta-language. And it made it much easier and much more accurate to compare one with the other. And that meta-language is what we call a knowledge graph. Larry: And some of the mechanisms that permit that ... I work mostly in the content world, and we are famous for being bad at our core competency, which is naming and labeling things. And so there's a lot of people doing content jobs that they're doing the same thing, but they have a different job title. I know there's techniques in the knowledge graph world for resolving that kind of discrepancy. Can you talk a little bit about ... and I assume that must have happened at scale in the economic graph. Mike: Yes. Oh yeah. Yeah. So when I built a team there, we faced a little problem of having 150 million distinct job titles to navigate. 
So this is way bigger than anything that normal taxonomists usually deal with. So we had to cut that problem down to size, and then,...
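The idea Mike describes above, translating both a worker profile and a job posting into the same vocabulary before comparing them, can be shown with a minimal sketch. This is not LinkedIn's implementation: the `TITLE_TO_CONCEPT` mapping, the concept IDs, and the choice of Jaccard overlap as the comparison are all assumptions made for illustration.

```python
# Hypothetical shared vocabulary: many surface strings map to one concept ID.
TITLE_TO_CONCEPT = {
    "content strategist": "concept:content_strategy",
    "content designer": "concept:content_strategy",
    "ux writer": "concept:content_strategy",
    "software engineer": "concept:software_engineering",
    "software developer": "concept:software_engineering",
}


def to_concepts(strings, vocabulary=TITLE_TO_CONCEPT):
    """Translate free-text strings into concept IDs, dropping unknown strings."""
    normalized = (s.strip().lower() for s in strings)
    return {vocabulary[s] for s in normalized if s in vocabulary}


def jaccard(a, b):
    """Overlap between two sets: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up sample data: same kinds of work, described with different strings.
worker_titles = ["UX Writer", "Software Developer"]
job_titles = ["Content Designer", "Software Engineer"]

if __name__ == "__main__":
    raw = jaccard({t.lower() for t in worker_titles}, {t.lower() for t in job_titles})
    shared = jaccard(to_concepts(worker_titles), to_concepts(job_titles))
    print("string overlap:", raw)
    print("concept overlap:", shared)
```

Compared as raw strings, the two lists share nothing; once both sides are expressed in the same concept vocabulary, they line up, which is the effect Mike attributes to the shared meta-language.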
-
-2
François Scharffe and Thomas Deely: The Knowledge Graph Conference – Episode 1
François Scharffe and Thomas Deely The Knowledge Graph Conference is one of the premier events in the semantic technology space. François Scharffe and Thomas Deeley started the conference to bridge the gap between academic researchers and industry practitioners. The community they have built around the conference and the conference programming - a mix of workshops, classes, presentations, and demos - reflect this purpose. They also intend to democratize knowledge graph use, and toward that end are participating in a number of efforts to develop education programs - both professional certifications and academic curricula. We talked about: the origins of the Knowledge Graph Conference, and its balanced inclusion of both academic and industry entities how generative AI is propelling interest in knowledge graphs their mission to broaden awareness of knowledge graph technology and practice the importance of education and the need for a practical approach the origins of the Open Knowledge Network, part of the National Science Foundation's Proto-Open Knowledge Network program, and KGC's role in it their ambition to build an educational institute around KG technology how to help enterprise executives understand the benefits of knowledge graphs how modeling an enterprise's knowledge and capturing it in a knowledge graph can help organizations address complex challenges the advantages of knowledge graphs over LLMs and GenAI, which have yet to prove their reliability how LLMs can assist in the construction and use of knowledge graphs how study of the human brain illustrates how GenAI and KGs can work together the community that has arisen around the Knowledge Graph Conference François' bio François Scharffe is a hands-on technology executive with a track record of improving decision making in complex data environments. His career as a technical leader and entrepreneur has led him to perform engineering, product management and leadership roles in various organizations. 
François has also worked as a lecturer and researcher, most recently at Columbia University (New York) and at the University of Montpellier (France). François is the founder and chief executive of The Data Chefs, a data management consulting firm, and the founder of the Knowledge Graph Conference, the leading event on knowledge-centric AI technologies. Thomas' bio Thomas Deely is co-founder of The Knowledge Graph Conference and Community, a global community bridging research and industry on Knowledge Graphs, AI, and related technologies. Thomas is also the Customer Community Manager at Box, the leading cloud content management platform. Thomas started his career as an engineer at JPMorgan in London, before joining Goldman Sachs where he advanced to become a senior engineer in the NY office. Thomas launched an Applied Analytics program and executive education initiatives at Columbia University, before venturing into the customer experience, product, and community domain at companies such as Unqork, where he launched the Community, and Stack Overflow, where he helped grow and develop the Stack Overflow for Teams product and business, as part of the customer success team, before joining Box. Thomas has an electronic engineering undergraduate degree from University College, Dublin, and a Masters in Science in Technology Management from Columbia University, and lives in NY where he is married with two children. Connect with François and Thomas online François at LinkedIn Thomas at LinkedIn Video Here’s the video version of our conversation: https://youtu.be/gjtoPXY3Ka8 Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 1. I'm really happy to launch this new podcast with a conversation with François Scharffe and Thomas Deely, the founders of The Knowledge Graph Conference. 
Their annual gathering in New York City attracts knowledge graph practitioners, researchers, and vendors from around the world for a full week of workshops, presentations, and tech demos. To further advance their democratization of knowledge graphs, they're also launching new educational offerings. Interview transcript Larry: Hi, everyone. Welcome to episode number one of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show François Scharffe and Thomas Deely. They are co-founders of The Knowledge Graph Conference. François is an independent consultant and an entrepreneur. He's also got an academic position he's on leave from right now. Thomas manages... A community manager for Box, a content sharing cloud application. So welcome, folks. Larry: François, do you want to tell the folks a little bit more about what you're up to these days? François: Sure. Hey, Larry. Thank you for the invitation. Very glad to be here. Such an honor. First episode of what will be another eight years long-running podcast hopefully. So very cool. Thanks. François: Yeah, I'm up to a number of projects. I'm kind of emerging off parenting season with two small kids. And I'm looking at different opportunities always around AI and knowledge engineering, neuro-symbolic AI. And in that context, I'm looking at different topics and domains. François: I won't go into the details into the projects, but I look at things like agriculture, climate. I look at education very strongly, and I think we are going to talk about this more today. And I also look at personal knowledge graphs. We published a book last year on that topic, and I think that's an interesting space there. There's a lot that we could do with that. Larry: Cool. Thanks. And Thomas, what are you up to these days? Thomas: Yeah. So also thanks Larry for hosting us for your first in this series. I expect it'll be the first of many. So my day job, I work at Box, where I'm community manager. 
Box is a SaaS application which makes it easy for people to securely share content internally and externally. And it's interesting times right now with AI, generative AI, and Box has some interesting propositions in that space. Thomas: And then my passion project is obviously KGC. And this year we were excited to get some NSF funding to build the community around knowledge graphs around education and community. So that's also something that I'm really excited about. Yeah. Larry: Well, let's talk, there's a nice bundle of stuff to talk about there. I want to talk first about the founding of KGC because that's kind of where all this fun stuff starts. But very quickly, after this comes the Open Knowledge Network and the educational stuff. I want to talk about that whole bundle of activities. Larry: So how did you... First of all, if I recall correctly, KGC came up when you were both at Columbia. Is that how it happened... Or tell me the start. Thomas: Yeah. So we were both working at Columbia University. And I was building out an executive education program and looking for faculty to come up with ideas. And I connected with François and he had the idea of taking, he felt there had been a lot of academic conferences on this domain, and doing something very industry focused. Thomas: So that was the genesis. We got some budget from Columbia to get it off the ground. And the first conference, May 2019, went really well, and that was really the genesis of KGC. Larry: Cool. And François, can you tell me more about that connection. Because this is something that's always struck me about the conference, is the connection between the academy and industry. Were you the mastermind behind that part of it? François: Well, so mastermind is a big term. But basically for me, it was building my ideal conference, the one I missed and I would love to attend. I had worked for half of my career in academia, and there were many events in the space on that topic in academia. 
But after that, I devoted myself mostly to industry. And there were no events that were about this topic and relevant. François: And so being at Columbia University in New York, having tons of contacts in the field, it sounded like, well, maybe we should just start it, start that event. And then naturally, evolve from the contacts. Having contacts both in industry and in academia, we could reach and invite speakers, and there's that flair. So over time, we've evolved it more towards the industry side. But indeed, we have a lot of... Our community also has a lot of academics in it. François: I think bridging the gap is very important. There are a bunch of things that are important and that were the goals, like the original goals. One is really... Originally the real one was to say, "Hey, knowledge graphs are not an academic topic." There's tons of people using it for solving real problems in the industry, knowledge graphs in production. Let's give them a voice. Let's hear them. Let's have a forum where we can share our experience, our issues. François: But also let's bring academia to tell us about state of the art, but also to hear about what the problems actually working with this technology in the real life, what their problems are. So that can influence researchers to say, "Maybe we should give priority to these problems in our research because they're more important than others," to give you an example of the kind of interactions that this is making possible. Yeah. I'll stop there. Larry: Thomas, did you have anything to add to that? Because I know you're... I perceived of you as more the business... I mean, you're both academics, of course, but... Thomas: Yeah. I think this is a fascinating space. It's arguably one of the most interesting spaces in technology. And particularly generative AI is propelling this field forward. And I think the goal was to bring an industry lens to a really interesting space, democratize it, raise awareness. 
Thomas: So our mission, we had a tagline at the conference this year is,...