Generative AI in the Real World

PODCAST · technology

In 2023, ChatGPT put AI on everyone’s agenda. Now, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

  1. Episode 41

    Aishwarya Naresh Reganti on Making AI Work in Production

    As the founder and CEO of LevelUp Labs, Aishwarya Naresh Reganti helps organizations “really grapple with AI,” and through her teaching, she guides individuals who are doing the same. Aishwarya joined Ben to share her experience as a forward-deployed expert supporting companies that are putting AI into production. Listen in to learn the value all roles—from data folks and developers to SMEs like marketers—bring to the table when launching products; how AI flips the 80-20 rule on its head; the problem with evals (or at least, the term “evals”); enterprise versus consumer use cases; and when humans need to be part of the loop. “LLMs are super powerful,” Aishwarya explains. “So I think you need to really identify where to use that power versus where humans should be making decisions.” Watch now.

  2. Episode 40

    Sharon Zhou on Post-Training

    Post-training gets your model to behave the way you want it to. As AMD VP of AI Sharon Zhou explains to Ben on this episode, the frontier labs are convinced, but the average developer is still figuring out how post-training works under the hood and why they should care. In their focused discussion, Sharon and Ben get into the process and trade-offs, techniques like supervised fine-tuning, reinforcement learning, in-context learning, and RAG, and why we still need post-training in the age of agents. (It’s how to get the agent to actually work.) Check it out.

  3. Episode 39

    Fabiana Clemente on Synthetic Data for AI and Agentic Systems

    Synthetic data has been around for a long time, decades even. But as KPMG’s Fabiana Clemente points out, “That doesn’t mean there aren’t a lot of misconceptions.” Fabiana sat down with Ben to clarify some of the current applications of synthetic data and new directions the field is taking—working with offshore teams when privacy controls just don’t allow you to share actual datasets, improving fraud detection, building simulation models of the physical world, enabling multi-agent architectures. The takeaway? Whether your data’s synthetic or from the real world, success often comes down to the processes you’ve established to build data solutions. Watch now.

  4. Episode 38

    Aurimas Griciūnas on AI Teams and Reliable AI Systems

    SwirlAI founder Aurimas Griciūnas helps tech professionals transition into AI roles and works with organizations to create AI strategy and develop AI systems. Aurimas joins Ben to discuss the changes he’s seen over the past couple years with the rise of generative AI and where we’re headed with agents. Aurimas and Ben dive into some of the differences between ML-focused workloads and those implemented by AI engineers—particularly around LLMOps and agentic workflows—and explore some of the concerns animating agent systems and multi-agent systems. Along the way, they share some advice for keeping your talent pipeline moving and your skills sharp. Here’s a tip: Don’t dismiss junior engineers.

  5. Episode 37

    The Year in AI with Ksenia Se

    As the founder, editor, and lead writer of Turing Post, Ksenia Se spends her days peering into the emerging future of artificial intelligence. She joined Ben to discuss the current state of adoption: what people are actually doing right now, the big topics that got the most traction this year, and the trends to look for in 2026. Find out why Ksenia thinks the real action next year will be in areas like robotics and embodied AI, spatial intelligence, AI for science, and education.

  6. Episode 36

    The LLMOps Shift with Abi Aryan

    MLOps is dead. Well, not really, but for many the job is evolving into LLMOps. In this episode, Abide AI founder and LLMOps author Abi Aryan joins Ben to discuss what LLMOps is and why it’s needed, particularly for agentic AI systems. Listen in to hear why LLMOps requires a new way of thinking about observability, why we should spend more time understanding human workflows before mimicking them with agents, how to do FinOps in the age of generative AI, and more.

  7. Episode 35

    Laurence Moroney on AI at the Edge

    In this episode, Laurence Moroney, director of AI at Arm, joins Ben Lorica to chat about the state of deep learning frameworks—and why you may be better off thinking a step higher, on the solution level. Listen in for Laurence’s thoughts about post-training; the evolution of on-device AI (and how tools like ExecuTorch and LiteRT are helping make it possible); why culturally specific models will only grow in importance; what Hollywood can teach us about LLM privacy; and more.

  8. Episode 34

    Chris Butler on GenAI in Product Management

    In this episode, Ben Lorica and Chris Butler, director of product operations for GitHub's Synapse team, chat about the experimentation Chris is doing to incorporate generative AI into the product development process—particularly with the goal of reducing toil for cross-functional teams. It isn’t just automating busywork (although there’s some of that). He and his team have created agents that expose the right information at the right time, use feedback in meetings to develop “straw man” prototypes for the team to react to, and even offer critiques from specific perspectives (a CPO agent?). Very interesting stuff.

  9. Episode 33

    Context Engineering with Drew Breunig

    In this episode, Ben Lorica and Drew Breunig, a strategist at the Overture Maps Foundation, talk all things context engineering: what’s working, where things are breaking down, and what comes next. Listen in to hear why huge context windows aren’t solving the problems we hoped they might, why companies shouldn’t discount evals and testing, and why we’re doing the field a disservice by leaning into marketing and buzzwords rather than trying to leverage what the current crop of LLMs are actually capable of.

  10. Episode 32

    Emmanuel Ameisen on LLM Interpretability

    In this episode, Ben Lorica and Anthropic interpretability researcher Emmanuel Ameisen get into the work Emmanuel’s team has been doing to better understand how LLMs like Claude work. Listen in to find out what they’ve uncovered by taking a microscopic look at how LLMs function—and just how far the analogy to the human brain holds.

  11. Episode 31

    Understanding A2A with Heiko Hotz and Sokratis Kartakis

    Everyone is talking about agents: single agents and, increasingly, multi-agent systems. What kind of applications will we build with agents, and how will we build with them? How will agents communicate with each other effectively? Why do we need a protocol like A2A to specify how they communicate? Join Ben Lorica as he talks with Heiko Hotz and Sokratis Kartakis about A2A and our agentic future.

  12. Episode 30

    Faye Zhang on Using AI to Improve Discovery

    In this episode, Ben Lorica and AI Engineer Faye Zhang talk about discoverability: how to use AI to build search and recommendation engines that actually find what you want. Listen in to learn how AI goes way beyond simple collaborative filtering—pulling in many different kinds of data and metadata, including images and voice, to get a much better picture of what any object is and whether or not it’s something the user would want.

  13. Episode 29

    Luke Wroblewski on When Databases Talk Agent-Speak

    Join Luke Wroblewski and Ben Lorica as they talk about the future of software development. What happens when we have databases that are designed to interact with agents and language models rather than humans? We’re starting to see what that world will look like. It’s an exciting time to be a software developer.

  14. Episode 28

    Jay Alammar on Building AI for the Enterprise

    Jay Alammar, director and Engineering Fellow at Cohere, joins Ben Lorica to talk about building AI applications for the enterprise, using RAG effectively, and the evolution of RAG into agents. Listen in to find out what kinds of metadata you need when you’re onboarding a new model or agent; discover how an emphasis on evaluation helps an organization improve its processes; and learn how to take advantage of the latest code-generation tools.

    Timestamps

    0:00: Introduction to Jay Alammar, director at Cohere. He’s also the author of Hands-On Large Language Models.
    0:30: What has changed in how you think about teaching and building with LLMs?
    0:45: This is my fourth year with Cohere. I really love the opportunity because it was a chance to join the team early (around the time of GPT-3). Aidan Gomez, one of the cofounders, was one of the coauthors of the transformers paper. I’m a student of how this technology went out of the lab and into practice. Being able to work in a company that’s doing that has been very educational for me. That’s a little of what I use to teach. I use my writing to learn in public.
    2:20: I assume there’s a big difference between learning in public and teaching teams within companies. What’s the big difference?
    2:36: If you’re learning on your own, you have to run through so much content and news, and you have to mute a lot of it as well. This industry moves extremely fast. Everyone is overwhelmed by the pace. For adoption, the important thing is to filter a lot of that and see what actually works, what patterns work across use cases and industries, and write about those.
    3:25: That’s why something like RAG proved itself as one application paradigm for how people should be able to use language models. A lot of it is helping people cut through the hype and get to what’s actually useful, and raise AI awareness. There’s a level of AI literacy that people need to come to grips with.
    4:10: People in companies want to learn things that are contextually relevant. For example, if you’re in finance, you want material that will help deal with Bloomberg and those types of data sources, and material aware of the regulatory environment.
    4:38: When people started being able to understand what this kind of technology was capable of doing, there were multiple lessons the industry needed to understand. Don’t think of chat as the first thing you should deploy. Think of simpler use cases, like summarization or extraction. Think about these as building blocks for an application.
    5:28: It’s unfortunate that the name “generative AI” came to be used because the most important things AI can do aren’t generative: they’re the representation with embeddings that enable better categorization, better clustering, and enabling companies to make sense of large amounts of data. The next lesson was to not rely on a model’s information. In the beginning of 2023, there were so many news stories about the models being a search engine. People expected the model to be truthful, and they were surprised when it wasn’t. One of the first solutions was RAG. RAG tries to retrieve the context that will hopefully contain the answer. The next question was data security and data privacy: They didn’t want data to leave their network. That’s where private deployment of models becomes a priority, where the model comes to the data. With that, they started to deploy their initial use cases.
    8:04: Then that system can answer questions to a specific level of difficulty—but with more difficulty, the system needs to be more advanced. Maybe it needs to search for multiple queries or do things over multiple steps.
    8:31: One thing we learned about RAG was that just because something is in the context window doesn’t mean the machine won’t hallucinate. And people have developed more appreciation of applying even more context: GraphRAG, context engineering.
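    Jay’s point that the representation side of AI (embeddings used for categorization and clustering) is often more useful than generation can be illustrated with a small sketch. This is a toy, not Cohere’s API: the bag-of-words `embed` function below is a hypothetical stand-in for a real embedding model.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def categorize(doc: str, labeled: dict[str, str]) -> str:
    """Assign doc to the label whose example text is most similar in embedding space."""
    return max(labeled, key=lambda label: cosine(embed(doc), embed(labeled[label])))

examples = {
    "billing": "invoice payment refund charge",
    "support": "error crash bug not working help",
}
print(categorize("my payment was charged twice, need a refund", examples))  # billing
```

    Swapping `embed` for a real embedding model leaves the nearest-category logic unchanged, which is why the representation is the part doing the work.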

  15. Episode 27

    Phillip Carter on Where Generative AI Meets Observability

    Phillip Carter, formerly of Honeycomb, and Ben Lorica talk about observability and AI—what observability means, how generative AI causes problems for observability, and how generative AI can be used as a tool to help SREs analyze telemetry data. There’s tremendous potential because AI is great at finding patterns in massive datasets, but it’s still a work in progress.

    Timestamps

    0:00: Introduction to Phillip Carter, a product manager at Salesforce. We’ll focus on observability, which he worked on at Honeycomb.
    0:35: Let’s have the elevator definition of observability first, then we’ll go into observability in the age of AI.
    0:44: If you google “What is observability?” you’re going to get 10 million answers. It’s an industry buzzword. There are a lot of tools in the same space.
    1:12: At a high level, I like to think of it in two pieces. The first is that this is an acknowledgement that you have a system of some kind, and you do not have the capability to pull that system onto your local machine and inspect what is happening at a moment in time. When something gets large and complex enough, it’s impossible to keep in your head. The product I worked on at Honeycomb is actually a very sophisticated querying engine that’s tied to a lot of AWS services in a way that makes it impossible to debug on my laptop.
    2:40: So what can I do? I can have data, called telemetry, that I can aggregate and analyze. I can aggregate trillions of data points to say that this user was going through the system in this way under these conditions. I can pull from these different dimensions and hold something constant.
    3:20: Let’s look at how the values differ when I hold one thing constant. Let’s hold another thing constant. That gives me an overall picture of what is happening in the real world.
    3:37: That is the crux of observability. I’m debugging, but not by stepping through something on my local machine. I click a button, and I can see that it manifests in a database call. But there are potentially millions of users, and things go wrong somewhere else in the system. And I need to try to understand what paths lead to that, and what commonalities exist in those paths.
    4:14: This is my very high-level definition. It’s many operations, many tasks, almost a workflow as well, and a set of tools.
    4:32: Based on your description, observability people are sort of like security people. With AI, there are two aspects: observability problems introduced by AI, and the use of AI to help with observability. Let’s tackle each separately. Before AI, we had machine learning. Observability people had a handle on traditional machine learning. What specific challenges did generative AI introduce?
    5:36: In some respects, the problems have been constrained to big tech. LLMs are the first time that we got truly world-class machine learning support available behind an API call. Prior to that, it was in the hands of Google and Facebook and Netflix. They helped develop a lot of this stuff. They’ve been solving problems related to what everyone else has to solve now. They’re building recommendation systems that take in many signals. For a long time, Google has had natural language answers for search queries, prior to the AI overview stuff. That stuff would be sourced from web documents. They had a box for follow-up questions. They developed this before Gemini. It’s kind of the same tech. They had to apply observability to make this stuff available at large.
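    Phillip’s workflow of aggregating telemetry and holding one dimension constant at a time amounts to a group-by over events. A minimal sketch, with field names (`region`, `endpoint`, `error`) that are illustrative assumptions rather than any real vendor’s schema:

```python
from collections import defaultdict

# Each event is one request's telemetry: dimensions plus an outcome.
events = [
    {"region": "us-east", "endpoint": "/checkout", "error": True},
    {"region": "us-east", "endpoint": "/checkout", "error": True},
    {"region": "us-east", "endpoint": "/search",   "error": False},
    {"region": "eu-west", "endpoint": "/checkout", "error": False},
    {"region": "eu-west", "endpoint": "/search",   "error": False},
]

def error_rate_by(dimension: str, events: list) -> dict:
    """Hold one dimension constant at a time: error rate per value of that dimension."""
    totals, errors = defaultdict(int), defaultdict(int)
    for e in events:
        totals[e[dimension]] += 1
        errors[e[dimension]] += e["error"]
    return {value: errors[value] / totals[value] for value in totals}

print(error_rate_by("region", events))    # compare regions
print(error_rate_by("endpoint", events))  # compare endpoints
```

    Slicing the same events by different dimensions is what surfaces where the failures cluster; real observability tools do this over trillions of points instead of five.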

  16. Episode 26

    Raiza Martin on Building AI Applications for Audio

    Audio is being added to AI everywhere: both in multimodal models that can understand and generate audio and in applications that use audio for input. Now that we can work with spoken language, what does that mean for the applications that we can develop? How do we think about audio interfaces—how will people use them, and what will they want to do? Raiza Martin, who worked on Google’s groundbreaking NotebookLM, joins Ben Lorica to discuss how she thinks about audio and what you can build with it.

    Timestamps

    0:00: Introduction to Raiza Martin, who cofounded Huxe and formerly led Google’s NotebookLM team. What made you think this was the time to trade the comforts of big tech for a garage startup?
    1:01: It was a personal decision for all of us. It was a pleasure to take NotebookLM from an idea to something that resonated so widely. We realized that AI was really blowing up. We didn’t know what it would be like at a startup, but we wanted to try. Seven months down the road, we’re having a great time.
    1:54: For the 1% who aren’t familiar with NotebookLM, give a short description.
    2:06: It’s basically contextualized intelligence, where you give NotebookLM the sources you care about and NotebookLM stays grounded to those sources. One of our most common use cases was that students would create notebooks and upload their class materials, and it became an expert that you could talk with.
    2:43: Here’s a use case for homeowners: put all your user manuals in there.
    3:14: We have had a lot of people tell us that they use NotebookLM for Airbnbs. They put all the manuals and instructions in there, and users can talk to it.
    3:41: Why do people need a personal daily podcast?
    3:57: There are a lot of different ways that I think about building new products. On one hand, there are acute pain points. But Huxe comes from a different angle: What if we could try to build very delightful things? The inputs are a little different. We tried to imagine what the average person’s daily life is like. You wake up, you check your phone, you travel to work; we thought about opportunities to make something more delightful. I think a lot about TikTok. When do I use it? When I’m standing in line. We landed on transit time or commute time. We wanted to do something novel and interesting with that space in time. So one of the first things was creating really personalized audio content. That was the provocation: What do people want to listen to? Even in this short time, we’ve learned a lot about the amount of opportunity.
    6:04: Huxe is mobile first, audio first, right? Why audio?
    6:45: Coming from our learnings from NotebookLM, you learn fundamentally different things when you change the modality of something. When I go on walks with ChatGPT, I just talk about my day. I noticed that was a very different interaction from when I type things out to ChatGPT. The flip side is less about interaction and more about consumption. Something about the audio format made the types of sources different as well. The sources we uploaded to NotebookLM were different as a result of wanting audio output. By focusing on audio, I think we’ll learn different use cases than the chat use cases. Voice is still largely untapped.
    8:24: Even in text, people started exploring other form factors: long articles, bullet points. What kinds of things are available for voice?
    8:49: I think of two formats: one passive and one interactive. With passive formats, there are a lot of different things you can create for the user. The things you end up playing with are (1) what is the content about and (2) how flexible is the content? Is it short, long, malleable to user feedback? With interactive content, maybe I’m listening to audio, but I want to interact with it. Maybe I want to join in. Maybe I want my friends to join in. Both of those contexts are new. I think this is what’s going to emerge in the next few years.

  17. Episode 25

    Stefania Druga on Designing for the Next Generation

    How do you teach kids to use and build with AI? That’s what Stefania Druga works on. It’s important to be sensitive to their creativity, sense of fun, and desire to learn. When designing for kids, it’s important to design with them, not just for them. That’s a lesson that has important implications for adults, too. Join Stefania Druga and Ben Lorica to hear about AI for kids and what that has to say about AI for adults.

    Timestamps

    0:27: You’ve built AI education tools for young people, and after that, worked on multimodal AI at DeepMind. What have kids taught you about AI design?
    0:48: It’s been quite a journey. I started working on AI education in 2015. I was on the Scratch team in the MIT Media Lab. I worked on Cognimates so kids could train custom models with images and texts. Kids would do things I would have never thought of, like build a model to identify weird hairlines or to recognize and give you backhanded compliments. They did things that are weird and quirky and fun and not necessarily utilitarian.
    2:05: For young people, driving a car is fun. Having a self-driving car is not fun. They have lots of insights that could inspire adults.
    2:25: You’ve noticed that a lot of the users of AI are Gen Z, but most tools aren’t designed with them in mind. What is the biggest disconnect?
    2:47: We don’t have a knob for agency to control how much we delegate to the tools. Most of Gen Z use off-the-shelf AI products like ChatGPT, Gemini, and Claude. These tools have a baked-in assumption that they need to do the work rather than asking questions to help you do the work. I like a much more Socratic approach. A big part of learning is asking and being asked good questions. A huge role for generative AI is to use it as a tool that can teach you things, ask you questions; [it’s] something to brainstorm with, not a tool that you delegate work to.
    4:25: There’s this big elephant in the room where we don’t have conversations or best practices for how to use AI.
    4:42: You mentioned the Socratic approach. How do you implement the Socratic approach in the world of text interfaces?
    4:57: In Cognimates, I created a copilot for kids coding. This copilot doesn’t do the coding. It asks them questions. If a kid asks, “How do I make the dude move?” the copilot will ask questions rather than saying, “Use this block and then that block.”
    6:40: When I designed this, we started with a person behind the scenes, like the Wizard of Oz. Then we built the tool and realized that kids really want a system that can help them clarify their thinking. How do you break down a complex event into steps that are good computational units?
    8:06: The third discovery was affirmations—whenever they did something that was cool, the copilot says something like “That’s awesome.” The kids would spend double the time coding because they had an infinitely patient copilot that would ask them questions, help them debug, and give them affirmations that would reinforce their creative identity.
    8:46: With those design directions, I built the tool. I’m presenting a paper at the ACM IDC (Interaction Design and Children) conference that presents this work in more detail. I hope this example gets replicated.
    9:26: Because these interactions and interfaces are evolving very fast, it’s important to understand what young people want, how they work and how they think, and design with them, not just for them.
    9:44: The typical developer now, when they interact with these things, overspecifies the prompt. They describe so precisely. But what you’re describing is interesting because you’re learning, you’re building incrementally. We’ve gotten away from that as grown-ups.
    10:28: It’s all about tinkerability and having the right level of abstraction. What are the right Lego blocks? A prompt is not tinkerable enough. It doesn’t allow for enough expressivity. It needs to be composable and allow the user to be in control.

  18. Episode 24

    Douwe Kiela on Why RAG Isn’t Dead

    Join our host Ben Lorica and Douwe Kiela, cofounder of Contextual AI and author of the first paper on RAG, to find out why RAG remains as relevant as ever. Regardless of what you call it, retrieval is at the heart of generative AI. Find out why—and how to build effective RAG-based systems.

    Points of Interest

    0:25: Today’s topic is RAG. With frontier models advertising massive context windows, many developers wonder if RAG is becoming obsolete. What’s your take?
    1:03: We now have a blog post: isragdeadyet.com. If something keeps getting pronounced dead, it will never die. These long context models solve a similar problem to RAG: how to get the relevant information into the language model. But it’s wasteful to use the full context all the time. If you want to know who the headmaster is in Harry Potter, do you have to read all the books?
    2:04: What will probably work best is RAG plus long context models. The real solution is to use RAG, find as much relevant information as you can, and put it into the language model. The dichotomy between RAG and long context isn’t a real thing.
    2:48: One of the main issues may be that RAG systems are annoying to build, and long context systems are easy. But if you can make RAG easy too, it’s much more efficient.
    3:07: The reasoning models make it even worse in terms of cost and latency. And if you’re talking about something with a lot of usage, high repetition, it doesn’t make sense.
    3:39: You’ve been talking about RAG 2.0, which seems natural: emphasize systems over models. I’ve long warned people that RAG is a complicated system to build because there are so many knobs to turn. Few developers have the skills to systematically turn those knobs. Can you unpack what RAG 2.0 means for teams building AI applications?
    4:22: The language model is only a small part of a much bigger system. If the system doesn’t work, you can have an amazing language model and it’s not going to get the right answer. If you start from that observation, you can think of RAG as a system where all the model components can be optimized together.
    5:40: What you’re describing is similar to what other parts of AI are trying to do: an end-to-end system. How early in the pipeline does your vision start?
    6:07: We have two core concepts. One is a data store—that’s really extraction, where we do layout segmentation. We collate all of that information and chunk it, store it in the data store, and then the agents sit on top of the data store. The agents do a mixture of retrievers, followed by a reranker and a grounded language model.
    7:02: What about embeddings? Are they automatically chosen? If you go to Hugging Face, there are, like, 10,000 embeddings.
    7:15: We save you a lot of that effort. Opinionated orchestration is a way to think about it.
    7:31: Two years ago, when RAG started becoming mainstream, a lot of developers focused on chunking. We had rules of thumb and shared stories. This eliminates a lot of that trial and error.
    8:06: We basically have two APIs: one for ingestion and one for querying. Querying is contextualized on your data, which we’ve ingested.
    8:25: One thing that’s underestimated is document parsing. A lot of people overfocus on embedding and chunking. Try to find a PDF extraction library for Python. There are so many of them, and you can’t tell which ones are good. They’re all terrible.
    8:54: We have our stand-alone component APIs. Our document parser is available separately. Some areas, like finance, have extremely complex layouts. Nothing off the shelf works, so we had to roll our own solution. Since we know this will be used for RAG, we process the document to make it maximally useful. We don’t just extract raw information. We also extract the document hierarchy. That is extremely relevant as metadata when you’re doing retrieval.
    10:11: There are open source libraries—what drove you to build your own, which I assume also encompasses OCR?
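    The pipeline Douwe outlines (ingest and chunk into a data store, retrieve, rerank, then answer grounded in the retrieved context) can be sketched end to end. Everything here is a hypothetical stand-in, not Contextual AI’s system: word-overlap scoring plays the role of real retrievers and rerankers, and `grounded_answer` stands in for a grounded language model.

```python
class DataStore:
    """Toy data store: chunked documents with word-overlap retrieval."""
    def __init__(self):
        self.chunks = []

    def ingest(self, doc: str, chunk_size: int = 8):
        """Split a document into fixed-size word chunks (real systems use layout-aware chunking)."""
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            self.chunks.append(" ".join(words[i:i + chunk_size]))

    def retrieve(self, query: str, k: int = 3):
        """First-stage retrieval: rank chunks by word overlap with the query."""
        q = set(query.lower().split())
        scored = sorted(self.chunks,
                        key=lambda c: len(q & set(c.lower().split())),
                        reverse=True)
        return scored[:k]

def rerank(query: str, chunks, top: int = 1):
    # A real reranker is a model; here we reuse overlap but prefer shorter chunks on ties.
    q = set(query.lower().split())
    return sorted(chunks,
                  key=lambda c: (len(q & set(c.lower().split())), -len(c)),
                  reverse=True)[:top]

def grounded_answer(query: str, context):
    # Stand-in for a grounded language model: answer only from retrieved context.
    return f"Based on the context: {context[0]}"

store = DataStore()
store.ingest("The headmaster of Hogwarts is Albus Dumbledore. "
             "Hogwarts is a school of witchcraft and wizardry in Scotland.")
query = "who is the headmaster of Hogwarts"
ctx = rerank(query, store.retrieve(query))
print(grounded_answer(query, ctx))  # quotes the chunk naming Dumbledore
```

    The point of the sketch is the division of labor: retrieval narrows the corpus so the model never needs the full context window, which is the RAG-plus-long-context combination described above.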

  19. Episode 23

    Danielle Belgrave on Generative AI in Pharma and Medicine

    Join Danielle Belgrave and Ben Lorica for a discussion of AI in healthcare. Danielle is VP of AI and machine learning at GSK (formerly GlaxoSmithKline). She and Ben discuss using AI and machine learning to get better diagnoses that reflect the differences between patients. Listen in to learn about the challenges of working with health data—a field where there’s both too much data and too little, and where hallucinations have serious consequences. And if you’re excited about healthcare, you’ll also find out how AI developers can get into the field.

    Points of Interest

    0:00: Introduction to Danielle Belgrave, VP of AI and machine learning at GSK. Danielle is our first guest representing Big Pharma. It will be interesting to see how people in pharma are using AI technologies.
    0:49: My interest in machine learning for healthcare began 15 years ago. My PhD was on understanding patient heterogeneity in asthma-related disease. This was before electronic healthcare records. By leveraging different kinds of data, genomics data and biomarkers from children, and seeing how they developed asthma and allergic diseases, I developed causal modeling frameworks and graphical models to see if we could identify who would respond to what treatments. This was quite novel at the time. We identified five different types of asthma. If we can understand heterogeneity in asthma, a bigger challenge is understanding heterogeneity in mental health. The idea was trying to understand heterogeneity over time in patients with anxiety.
    4:12: When I went to DeepMind, I worked on the healthcare portfolio. I became very curious about how to understand things like MIMIC, which had electronic healthcare records, and image data. The idea was to leverage tools like active learning to minimize the amount of data you take from patients. We also published work on improving the diversity of datasets.
    5:19: When I came to GSK, it was an exciting opportunity to do both tech and health. Health is one of the most challenging landscapes we can work on. Human biology is very complicated. There is so much random variation. To understand biology, genomics, disease progression, and have an impact on how drugs are given to patients is amazing.
    6:15: My role is leading AI/ML for clinical development. How can we understand heterogeneity in patients to optimize clinical trial recruitment and make sure the right patients have the right treatment?
    6:56: Where does AI create the most value across GSK today? That can be both traditional AI and generative AI.
    7:23: I use everything interchangeably, though there are distinctions. The real important thing is focusing on the problem we are trying to solve, and focusing on the data. How do we generate data that’s meaningful? How do we think about deployment?
    8:07: And all the Q&A and red teaming.
    8:20: It’s hard to put my finger on what’s the most impactful use case. When I think of the problems I care about, I think about oncology, pulmonary disease, hepatitis—these are all very impactful problems, and they’re problems that we actively work on. If I were to highlight one thing, it’s the interplay between when we are looking at whole genome sequencing data and looking at molecular data and trying to translate that into computational pathology. By looking at those data types and understanding heterogeneity at that level, we get a deeper biological representation of different subgroups and understand mechanisms of action for response to drugs.

  20. Episode 22

    The Startup Opportunity with Gabriela de Queiroz

    Ben Lorica and Gabriela de Queiroz, director of AI at Microsoft, talk about startups: specifically, AI startups. How do you get noticed? How do you generate real traction? What are startups doing with agents and with protocols like MCP and A2A? And which security issues should startups watch for, especially if they’re using open weights models?

    Points of Interest

    0:30: You work with a lot of startups and founders. How have the opportunities for startups in generative AI changed? Are the opportunities expanding?
    0:56: Absolutely. The entry barrier for founders and developers is much lower. Startups are exploding—not just the amount but also the interesting things they are doing.
    1:19: You catch startups when they’re still exploring, trying to build their MVP. So startups need to be more persistent in trying to find differentiation. If anyone can build an MVP, how do you distinguish yourself?
    1:46: At Microsoft, I drive several strategic initiatives to help growth-stage startups. I also guide them in solving real pain points using our stacks. I’ve designed programs to spotlight founders.
    3:08: I do a lot of engagement where I help startups go from the prototype or MVP to impact. An MVP is not enough. I need to see a real use case and I need to see some traction. When they have real customers, we see whether their MVP is working.
    3:49: Are you starting to see patterns for gaining traction? Are they focusing on a specific domain? Or do they have a good dataset?
    4:02: If they are solving a real use case in a specific domain or niche, this is where we see them succeed. They are solving a real pain, not building something generic.
    4:27: We’re both in San Francisco, and solving a specific pain or finding a specific domain means something different. Techie founders can build something that’s used by their friends, but there’s no revenue.
    5:03: This happens everywhere, but there’s a bigger culture around that here. I tell founders, “You need to show me traction.” We have several companies that started as open source, then they built a paid layer on top of the open source project.
    5:34: You work with the folks at Azure, so presumably you know what actual enterprises are doing with generative AI. Can you give us an idea of what enterprises are starting to deploy? What is the level of comfort of enterprise with these technologies?
    6:06: Enterprises are a little bit behind startups. Startups are building agents. Enterprises are not there yet. There’s a lot of heavy lifting on the data infrastructure that they need to have in place. And their use cases are complex. It’s similar to Big Data, where the enterprise took longer to optimize their stack.
    7:19: Can you describe why enterprises need to modernize their data stack?
    7:42: Reality isn’t magic. There’s a lot of complexity in data and how data is handled. There is a lot of data security and privacy that startups aren’t aware of but are important to enterprises. Even the kinds of data—the data isn’t well organized, there are different teams using different data sources.
    8:28: Is RAG now a well-established pattern in the enterprise?
    8:44: It is. RAG is part of everybody’s workflow.
    8:51: The common use cases that seem to be further along are customer support, coding—what other buckets can you add?
    9:07: Customer support and tickets are among the main pains and use cases. And they are very expensive. So it’s an easy win for enterprises when they move to GenAI or AI agents.
    9:48: Are you saying that the tool builders are ahead of the tool buyers?
    10:05: You’re right. I talk a lot with startups building agents. We discuss where the industry is heading and what the challenges are. If you think we are close to AGI, try to build an agent and you’ll see how far we are from AGI. When you want to scale, there’s another level of difficulty. When I ask for real examples and customers, the majority are not there yet.

  21. 21

    Securing AI with Steve Wilson

    Join Steve Wilson and Ben Lorica for a discussion of AI security. We all know that AI brings new vulnerabilities into the software landscape. Steve and Ben talk about what makes AI different, what the big risks are, and how you can use AI safely. Find out how agents introduce their own vulnerabilities, and learn about resources such as OWASP that can help you understand them. Is there a light at the end of the tunnel? Can AI help us build secure systems even as it introduces its own vulnerabilities? Listen to find out.

    Points of Interest

    0:49: Now that AI tools are more accessible, what makes LLM and agentic AI security fundamentally different from traditional software security?
    1:20: There are two parts. When you start to build software using AI technologies, there is a new set of things to worry about. When your software gets near human-level smartness, it is subject to the same issues as humans: It can be tricked and deceived. The other part is what the bad guys are doing when they have access to frontier-class AIs.
    2:16: In your work at OWASP, you listed the top 10 vulnerabilities for LLMs. What are the top one or two risks that are causing the most serious problems?
    2:42: I’ll give you the top three. The first one is prompt injection. By feeding data to the LLM, you can trick the LLM into doing something the developers didn’t intend.
    3:03: Next is the AI supply chain. The AI supply chain is much more complicated than the traditional supply chain. It’s not just open source libraries from GitHub. You’re also dealing with gigabytes of model weights and terabytes of training data, and you don’t know where they’re coming from. And sites like Hugging Face have had malicious models uploaded to them.
    3:49: The last one is sensitive information disclosure. Bots are not good at knowing what they should not talk about. When you put them into production and give them access to important information, you run the risk that they will disclose information to the wrong people.
    4:25: For supply chain security, when you install something in Python, you’re also installing a lot of dependencies. And everything is democratized, so people can do a little on their own. What can people do about supply chain security?
    5:18: There are two flavors. One: I’m building software that includes the use of a large language model. If I want to get Llama from Meta as a component, that includes gigabytes of floating point numbers. You need to put some skepticism around what you’re getting.
    6:01: Another hot topic is vibe coding. People who have never programmed, or haven’t programmed in 20 years, are coming back. There are problems like hallucinations. With generated code, models will make up the existence of a software package and write code that imports it. And attackers will create malicious versions of those packages and publish them so that people will install them.
    7:28: Our ability to generate code has gone up 10x to 100x. But our ability to security check and quality check hasn’t. For people starting out: Get some basic awareness of the concepts around application security and what it means to manage the supply chain.
    7:57: We need a different generation of software composition analysis tools that are designed to work with vibe coding and integrate into environments like Cursor.
    8:44: We have good basic guidelines for users: Does a library have a lot of users? A lot of downloads? A lot of stars on GitHub? These are basic indications. But professional developers augment that with tooling. We need to bring those tools into vibe coding.
    9:20: What’s your sense of the maturity of guardrails?
    9:50: The good news is that the ecosystem around guardrails started really soon after ChatGPT came out. Things at the top of the OWASP Top 10, prompt injection and information disclosure, indicated that you needed to police the trust boundaries around your LLM.
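    The hallucinated-package risk described at 6:01 can be reduced with even a crude pre-install check. Here is a minimal sketch, assuming a hand-maintained allowlist of vetted packages; the package names are illustrative, and real tooling (software composition analysis products, registry metadata, vulnerability databases) would go much further:

```python
# Split a requirements list into vetted and suspect packages before installing.
# VETTED is an illustrative allowlist, not a real vetting source.

VETTED = {"requests", "numpy", "pandas"}

def check_requirements(requirements):
    """Return (vetted, suspect); anything not on the allowlist needs review."""
    vetted = [pkg for pkg in requirements if pkg in VETTED]
    suspect = [pkg for pkg in requirements if pkg not in VETTED]
    return vetted, suspect

# A hallucinated package name slips into generated code; flag it for review.
ok, flagged = check_requirements(["requests", "totally-real-ai-utils"])
```

    The point is the workflow, not the data structure: generated imports get reviewed against a trusted source before anything is installed.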

  22. 20

    Shreya Shankar on AI for Corporate Data Processing

    Businesses have a lot of data—but most of that data is unstructured textual data: reports, catalogs, emails, notes, and much more. Without structure, business analysts can’t make sense of the data; there is value in the data, but it can’t be put to use. AI can be a tool for finding and extracting the structure that’s hidden in textual data. In this episode, Ben and Shreya talk about a new generation of tooling that brings AI to enterprise data processing.

    Points of Interest

    0:18: One of the themes of your work is a specific kind of data processing. Before we go into tools, what is the problem you’re trying to address?
    0:52: For decades, organizations have been struggling to make sense of unstructured data. There’s a massive amount of text that people make sense of. We didn’t have the technology to do that until LLMs came around.
    1:38: I’ve spent the last couple of years building a processing framework for people to manipulate unstructured data with LLMs. How can we extract semantic data?
    1:55: The prior art would be using NLP libraries and doing bespoke tasks?
    2:12: We’ve seen two flavors of approach: bespoke code and crowdsourcing. People still do both. But now LLMs can simplify the process.
    2:45: The typical task is “I have a large collection of unstructured text and I want to extract as much structure as possible.” An extreme would be a knowledge graph; in the middle would be the things that NLP people do. Your data pipelines are designed to do this using LLMs.
    3:22: Broadly, the tasks are thematic extraction: I want to extract themes from documents. You can program LLMs to find themes. You want some user steering and guidance for what a theme is, then use the LLM for grouping.
    4:04: One of the tools you built is DocETL. What’s the typical workflow?
    4:19: The idea is to write MapReduce pipelines, where map extracts insights and group does aggregation. Doing this with LLMs means that the map is described by an LLM prompt. Maybe the prompt is “Extract all the pain points and any associated quotes.” Then you can imagine flattening this across all the documents, grouping them by the pain points, and having another LLM do the summary to produce a report. DocETL exposes these data processing primitives and orchestrates them to scale up and across task complexity.
    5:52: What if you want to extract 50 things from a map operation? You shouldn’t ask an LLM to do 50 things at once. You should group them and decompose them into subtasks. DocETL does some optimizations to do this.
    6:18: The user could be a noncoder and might not be working on the entire pipeline.
    7:00: People do that a lot; they might just write a single map operation.
    7:16: But the end user you have in mind doesn’t even know the words “map” and “filter.”
    7:22: That’s the goal. Right now, people still need to learn data processing primitives.
    7:49: These LLMs are probabilistic; do you also set the expectation with the user that you might get different results every time you run the pipeline?
    8:16: There are two different types of tasks. One is where you want the LLM to be accurate and there is an exact ground truth—for example, entity extraction. The other type is where you want to offload a creative process to the LLM—for example, “Tell me what’s interesting in this data.” People run that until there are no new insights to be gleaned. When is nondeterminism a problem? How do you engineer systems around it?
    9:56: You might also have a data engineering team that uses this and turns PDF files into something like a data warehouse that people can query. In this setting, are you familiar with lakehouse architectures and the notion of the medallion architecture?
    10:49: People actually use DocETL to create a table out of PDFs and put it in a relational database. That’s the best way to think about how to move forward in the enterprise setting. I’ve also seen people using these tables in RAG or downstream LLM applications.
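    The map-group-reduce pattern described at 4:19 can be sketched in plain Python. This is not DocETL’s actual API; `call_llm` is a stub standing in for a real model call, and the prompts and documents are illustrative:

```python
# Sketch of an LLM-driven MapReduce pipeline over unstructured documents.
# call_llm is a placeholder; a real version would send prompt + text to a model.

def call_llm(prompt: str, text: str) -> str:
    # Canned answer so the pipeline is runnable without an API key.
    return "slow onboarding"

def map_step(documents, prompt):
    """Map: run an extraction prompt over every document."""
    return [call_llm(prompt, doc) for doc in documents]

def group_step(extracted):
    """Group: bucket identical extracted items together."""
    groups = {}
    for item in extracted:
        groups.setdefault(item, []).append(item)
    return groups

def reduce_step(groups, prompt):
    """Reduce: summarize each group with a second LLM call."""
    return {key: call_llm(prompt, " ".join(items)) for key, items in groups.items()}

docs = ["support ticket A", "support ticket B"]
pain_points = map_step(docs, "Extract all the pain points and any associated quotes.")
report = reduce_step(group_step(pain_points), "Summarize these pain points.")
```

    The same skeleton scales to many documents; the interesting engineering (which DocETL handles) is decomposing overloaded prompts and orchestrating the calls.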

  23. 19

    Vibe Coding with Steve Yegge

    Ever since Andrej Karpathy first tweeted it, “vibe coding” has been on every software developer’s mind. Join Ben Lorica and Steve Yegge to find out what vibe coding means, especially in a professional context. Going beyond the current memes, what will the future of software development look like when we have multiple agents? And how do you prepare for it? Don’t push back against AI now; lean into it.

    Points of Interest

    0:36: Let’s start with CHOP. What do you mean by “chat-oriented programming,” and how does it change the role of a software developer?
    1:02: Andrej Karpathy has come up with a more accessible packaging: “vibe coding.” Gene Kim and I are going with the flow in our book, which is also about agentic programming.
    2:02: The industry has the widest distribution of understanding that I’ve ever seen. We’ve got people saying, “You ought to stop using AI”; we’ve got people refusing to use AI; we’ve got people spread out in what they’re using.
    3:03: Vibe coding started off as “it’s easy.” But people misinterpreted Karpathy’s tweet to mean that the LLM is ready to write all the code. That’s led to production incidents, “no vibe coding” policies, and a debate over whether you can turn your brain off.
    3:35: Google decided to adopt vibe coding because you can do it as a grownup, as an engineer. You don’t have to accept whatever AI gives you. If you’re doing a weekend project or a prototype, you don’t have to look carefully at the output. But if you’re doing production coding, you have to demand excellence of your LLM. You have to demand that it produces code to a professional standard. That’s what Google does now.
    4:38: Vibe coding means using AI. Agents like Claude Code are pretty much the same.
    4:58: There’s traditional AI-assisted coding (completions); with vibe coding, the trust in AI is higher. The developer becomes a high-level orchestrator instead of writing code line by line.
    5:37: Trust is a huge dimension. It’s the number one thing that is keeping the industry from rocketing forward on adoption. With chat programming, even though it’s been eclipsed by agent programming, you get the LLM to do the work—but you have to validate it yourself. You’re nudging it over and over again. Many senior engineers don’t try hard enough. You wouldn’t boot an intern to the curb for failing the first time.
    7:18: AI doesn’t work right the first time. You can’t trust anything. You have to validate and verify. This is what people have to get over.
    7:53: You’re still accountable for the code. You own the code. But people are struggling with the new role, which is being a team lead. This is even more true with coding agents like Claude Code. You’re more productive, but you’re not a programmer anymore.
    8:51: For people to make the transition to vibe coding, what are some of the core skill sets they’ll have to embrace?
    9:07: Prompt engineering is a separate discipline from CHOP or vibe coding. Prompt engineering is static prompting; it’s for embedding AI in an application. Chat programming is dynamic: lots of throwaway prompts that are only used once.
    10:13: Engineers should know all the skills of AI. Chip Huyen’s AI Engineering book covers what engineers need to know. Those are the skills you need to put AI in applications, even if you’re not doing product development.
    11:15: Or put the book into a RAG system.
    12:00: Vibe coding is another skill to learn. Learn it; don’t push back on it. Learn how it works; learn how to push it. Claude Code isn’t even an IDE. The form factor is terrible right now. But if you try it and see how powerful agentic coding is, you’ll be shocked. The agent does all the stuff you used to have to tell it to do.
    13:57: You’ll say, “Here’s a Jira ticket; fix it for me.” First it will find the ticket; it will evaluate your codebase using the same tools you do; then it will come up with an execution plan. It’s nuts what they are doing. We all knew this was coming, but nobody knew it would be here now.

  24. 18

    Interactions Between Humans and AI with Rajeshwari Ganesan

    In this edition of Generative AI in the Real World, Ben Lorica and Rajeshwari Ganesan talk about how to put generative AI in closer touch with human needs and requirements. AI isn’t all about building bigger models and benchmarks. To use it effectively, we need better interfaces; we need contexts that support groups rather than individuals; we need applications that allow people to explore the space they’re working in. Ever since ChatGPT, we’ve assumed that chat is the best interface for AI. We can do better.

    Points of Interest

    0:17: We’re both builders and consumers of AI. How does this dual relationship affect how we design interfaces?
    0:41: A lot of advances happen in the large language models. But when we step back, are these models consumable by users? We lack the kind of user interface we need. With ChatGPT, conversations can go round and round, turn by turn. If you don’t give the right context, you don’t get the right answer. This isn’t good enough.
    1:47: Model providers go out of their way to coach users, telling them how to prompt new models. All the providers have coaching tips. What alternatives should we be exploring?
    2:50: We’ve made certain initial starts. GitHub Copilot and mail applications with typeahead don’t require heavy-duty prompting. The AI coinhabits the same workspace as the user; the context is derived from the workspace. The second part is that generative interfaces are emerging. It’s not the content but the experience that’s generated by the machine.
    5:22: Interfaces are experience. Generate the interface based on what the user needs at any given point. At Infosys, we do a lot of legacy modernization—that’s where you really need good interfaces. We have been able to create interfaces where the user is able to walk into a latent space—an area that gives them an understanding of what they want to explore.
    7:11: A latent space is an area that is meaningful for the user’s interaction—a space that’s relatable and semantically understandable. The user might say, “Tell me all the modules dealing with fraud detection.” Exploring the space that the user wants is possible. Let’s say I describe various aspects of a project I’m launching. The machine looks at my thought process. It looks at my answers, breaks them up part by part, judges the quality of each response, and gets into the pieces that need to be better.
    9:44: One of the things people struggle with is evaluation. Not of a single agent—most tasks require multiple agents because there are different skills and tasks involved. How do we address evaluation and transparency?
    10:42: When it comes to evaluation, I think in terms of trustworthy systems. A lot of focus on evaluation comes from model engineering. But one critical piece of building trustworthy systems is the interface itself. A human has an intent and is requesting a response. There is a shared context—and if the context isn’t shared properly, you won’t get the right response. Prompt engineering is difficult; if you don’t give the right context, you go in a loop.
    12:26: Trustworthiness breaks because you’re dependent on the prompt. The coinhabited workspace that takes the context from the environment plays a big role.
    12:46: Once you give the questions to the machine, the machine gives a response. But if the response isn’t consumable by the user, that’s a problem.
    13:18: Trustworthiness of systems in the context of agent frameworks is much more complex. Humans don’t just have factual knowledge. We have beliefs. Humans have a belief state, and if an agent doesn’t have access to that belief state, it will get into something called reasoning derailment. If the interface can’t bring belief states to life, you will have a problem.

  25. 17

    Getting Beyond the Demo with Hamel Husain

    In this episode, Ben Lorica and Hamel Husain talk about how to take the next steps with artificial intelligence. Developers don’t need to build their own models—but they do need basic data skills. It’s important to look at your data, to discover your model’s weaknesses, and to use that information to develop test suites and evals that show whether your model is behaving well.

    Links to Resources

    Hamel’s upcoming course on evaluating LLMs
    Hamel’s O’Reilly publications: “AI Essentials for Tech Executives” and “What We Learned from a Year of Building with LLMs”
    Hamel’s website

    Points of Interest

    0:39: What inspired you and your coauthors to create a series on practical uses of foundation models? What gaps in existing resources did you aim to address?
    0:56: We’re publishing “AI Essentials for Tech Executives” now; last year, we published “What We Learned from a Year of Building with LLMs.” Coming from the perspective of a machine learning engineer or data scientist, you don’t need to build or train models. You can use an API. But there are skills and practices from data science that are crucial.
    2:16: There are core skills around data analysis, error analysis, and basic data literacy that you need to get beyond a demo.
    2:43: What are some crucial shifts in mindset that you’ve written about on your blog?
    3:24: The phrase we keep repeating is “look at your data.” What does “look at your data” mean?
    3:51: There’s a process that you should use. Machine learning systems have a lot in common with modern AI. How do you test those? Debug them? Improve them? Look at your data; people fail on this. They do vibe checks, but they don’t really know what to do next.
    4:56: Looking at your data helps ground everything. Look at actual logs of user interactions. If you don’t have users, generate interactions synthetically. See how your AI is behaving and write detailed notes about failure modes. Do some analysis on those notes: Categorize them. You’ll start to see patterns and your biggest failure modes. This will give you a sense of what to prioritize.
    6:08: A lot of people are missing that. People aren’t familiar with the rich ecosystem of data tools, so they get stuck. We know that it’s crucial to sample some data and look at it.
    7:08: It’s also important that the domain expert does it with the engineers. On a lot of teams, the domain expert isn’t an engineer.
    7:44: Another thing is focusing on processes, not tools. Tools aren’t the problem—the problem is that your AI isn’t working. The tools won’t take care of it for you. There’s a process: how to debug, look at, and measure AI. Those are the main mind shifts.
    9:32: Most people aren’t building models (pretraining); they might be doing posttraining on a base model. But there are a lot of experiments that you still have to run. There are knobs you have to turn, and without the ability to do it systematically and measure, you’re just mindlessly turning knobs without learning much.
    10:29: I’ve held open office hours for people to ask questions about evals. What people ask most is what to eval. There are many components. You can’t and shouldn’t test everything. You should be grounded in your actual failure modes. Prioritize your tests on that basis.
    11:30: Another topic is what I call prototype purgatory. A lot of people have great demos. The demos work and might even be deployable. But people struggle with pulling the trigger.
    12:15: A lot of people don’t know how to evaluate their AI systems if they don’t have any users. One way to help yourself is to generate synthetic data. Have an LLM generate realistic user inputs, and brainstorm different personas and scenarios. That bootstraps you significantly toward production.
    13:57: There’s a new open source tool that does something like this for agents. It’s called IntelAgent. It generates synthetic data that you might not come up with yourself.
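    The advice at 12:15, bootstrapping evals with synthetic user inputs, looks roughly like this sketch. The personas, scenarios, and the `generate` stub are all illustrative; in a real version an LLM would write each message:

```python
# Cross personas with scenarios to produce synthetic eval inputs.
# generate is a stub for an LLM call that would write a realistic message.
import itertools

PERSONAS = ["new user", "power user"]
SCENARIOS = ["billing question", "bug report"]

def generate(persona: str, scenario: str) -> str:
    # Placeholder: a real LLM would produce a varied, realistic user message.
    return f"As a {persona}, I have a {scenario}."

def synthetic_inputs(personas, scenarios):
    """One input per persona-scenario pair, to cover more failure modes."""
    return [generate(p, s) for p, s in itertools.product(personas, scenarios)]

inputs = synthetic_inputs(PERSONAS, SCENARIOS)
```

    Each synthetic input then gets run through your system and the outputs reviewed for failure modes, exactly as Hamel describes for real user logs.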

  26. 16

    Agents—The Next Step in AI with Shelby Heinecke

    Join Shelby Heinecke, senior research manager at Salesforce, and Ben Lorica as they talk about agents: AI models that can take action on behalf of their users. Are they the future—or at least the hot topic for the coming year? Where are we with smaller models? And what do we need to improve the agent stack? How do you evaluate the performance of models and agents?

    Points of Interest

    0:29: Introduction—our guest is Shelby Heinecke, senior research manager at Salesforce.
    0:43: The hot topic of the year is agents. Agents are increasingly capable of GUI-based interactions. Is this my imagination?
    1:20: The research community has made tremendous progress to make this happen. We’ve made progress on function calling. We’ve trained LLMs to call the correct functions to perform tasks like sending emails. My team has built large action models that, given a task, write a plan and the API calls to execute it. This is one piece. A second piece, for when you don’t know the functions a priori, is giving the agent the ability to reason about images and video.
    3:07: We released multimodal action models. They take an image and text and produce API calls. That makes navigating GUIs a reality.
    3:34: A lot of knowledge work relies on GUI interactions. Is this just robotic process automation rebranded?
    4:05: We’ve been automating forever. What’s special is that the automation is now driven by LLMs, and that combination is particularly powerful.
    4:32: The earlier generation of RPA was very tightly scripted. With multimodal models that can see the screen, agents can really understand what’s happening. Now we’re beginning to see reasoning-enhanced models. Inference scaling will be important.
    5:52: Multimodality and reasoning-enhanced models will make agents even more powerful.
    6:00: I’m very interested in how much reasoning we can pack into a smaller model. Just this week, DeepSeek also released smaller distilled versions.
    7:08: Every month the capability of smaller models gets pushed further. Smaller models right now may not compare to large models. But this year, we can push the boundaries.
    7:38: What’s missing from the agent stack? You have the model and some notion of memory. You have tools that the agent can call. There are agent frameworks. You need monitoring and observability. Everything depends on the model’s capabilities. There’s a lot of fragmentation, and the vocabulary is still unclear. Where do agents usually fall short?
    9:00: There’s a lot of room for improvement with function calling and multistep function calling. Earlier in the year, it was just single step. Now there’s multistep. That expands our horizons.
    9:59: We need to think about deploying agents that solve complex tasks that take multiple steps. We will need to think more about efficiency and latency. With increased reasoning abilities, latency increases.
    10:45: This year, we’ll see small language models and agents come together.
    10:58: At the end of the day, this is an empirical discipline, and you need to come up with your own benchmarks and eval tools. What are you doing in terms of benchmarks and evals?
    11:36: This is the most critical piece of applied research. You’re deploying models for a purpose. You still need an evaluation set for that use case. As we work with a variety of products, we cocreate evaluation sets with our partners.
    12:38: We’ve released the CRM benchmark. It’s open. We’ve created CRM-style datasets with CRM-type tasks. You can see how open source models and small models perform on these leaderboards.
    13:16: How big do these datasets have to be?
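    The function-calling setup described at 1:20, where a model emits a structured tool call and a runtime executes it, can be sketched as follows. `plan_call` is a stub standing in for the action model, and the tool name and arguments are invented for illustration:

```python
# Minimal function-calling loop: model picks a tool + args, runtime executes it.

def send_email(to: str, body: str) -> str:
    """An example tool the agent is allowed to call."""
    return f"sent to {to}"

TOOLS = {"send_email": send_email}

def plan_call(task: str) -> dict:
    # Placeholder: a real action model would emit this structured call.
    return {"name": "send_email", "args": {"to": "team@example.com", "body": task}}

def run_agent(task: str) -> str:
    """One step: ask the model for a call, look up the tool, execute it."""
    call = plan_call(task)
    tool = TOOLS[call["name"]]
    return tool(**call["args"])

result = run_agent("Share the Q3 report")
```

    Multistep function calling, which Shelby notes is where agents still fall short, loops this step and feeds each tool result back into the next planning call.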

  27. 15

    Measuring Skills with Kian Katanforoosh

    How do we measure skills in an age of AI? That question has an effect on everything from hiring to productive teamwork. Join Kian Katanforoosh, founder and CEO of Workera, and Ben Lorica for a discussion of how we can use AI to assess skills more effectively. How do we get beyond pass/fail exams to true measures of a person’s ability?

    Points of Interest

    0:28: Can you give a sense of how big the market for skills verification is?
    0:42: It’s extremely large. Anything that touches skills data is on the rise. When you extrapolate from university admissions to someone’s career, you realize that there are many times when they need to validate their skills.
    1:59: Roughly, what’s the breakdown between B2B and B2C?
    2:04: Workera is exclusively B2B and federal. However, there are also assessments focused on B2C. Workera has free assessments for consumers.
    3:00: Five years ago, there were tech companies working on skill assessment. What were the prior solutions, before the rise of generative AI?
    3:27: Historically, assessments have been used for summative purposes: pass/fail, high stakes, where the goal is to admit or reject you. We introduced the use of assessments for people to know where they stand, compare themselves to the market, and decide what to study next. That takes different technology.
    4:50: Generative AI became much more prominent with the rise of ChatGPT. What changed?
    5:09: Skills change faster than ever. You need to update skills much more frequently. The half-life of skills used to be over 10 years. Today, it’s estimated to be around 2.5 years in the digital area. Writing a quiz is easy. Writing a good assessment is extremely hard. Validity is the concept that what you intend to measure is what you are actually measuring. AI can help.
    6:39: AI can help with modeling the competencies you want to measure.
    6:57: AI can help streamline the creation of an assessment.
    7:22: AI can help test the assessment with synthetic users.
    7:42: AI can help with monitoring postassessment. There are a lot of things that can go wrong.
    8:25: Five years ago in programming, people used tests to filter people out. That has changed; people will use coding assistants on the job. Why shouldn’t I be able to use a coding assistant when I’m doing an assessment?
    9:16: You should be able to use it. The assessment has to change. The previous generation of assessments focused on syntax. Do you care if you forgot a semicolon? Assessments should focus on other cognitive levels, such as analyzing and synthesizing information.
    10:06: Because of generative models, it’s become easier to build an impressive prototype. Evaluation is the hard part. Assessment is all about evaluation, so the bar is much higher for you.
    10:48: Absolutely. We have a study that calculates the number of skills needed to prototype versus deploy AI. You need about 1,000 skills to prototype AI. You need about 10,000 skills for production AI.
    12:39: If I want to do skills assessment on an unfamiliar workflow, say full-stack web development, what’s your process for onboarding?
    13:17: We have one agent that’s responsible for competency modeling. A subject-matter expert (SME) can share a job description, task analysis, or job architecture. We take that information and granularize the tasks worth measuring. At that point, there’s a human in the loop.
    14:27: Where does AI help? What does the AI need? What would you like to see from people using your tool?
    15:04: Language models have been trained on pretty much everything online. You can get a pretty good answer from AI. The SME takes that from 80% to 100%. Now, there are issues with that process. We separate the core catalog of skills from the custom catalog, where customers create custom assessments. A standardized assessment lets you benchmark against other people or companies.
    16:32: If you take a custom assessment, it’s highly relevant to your needs, even though comparisons aren’t possible.
    16:41: It’s obviously anonymized, right?

  28. 14

    Chloé Messdaghi on AI Security, Policy, and Regulation

    Chloé Messdaghi and Ben Lorica discuss AI security—a subject of increasing importance as AI-driven applications roll out into the real world. There’s a knowledge gap: Security workers don’t understand AI, and AI developers don’t understand security. Make sure to bring everyone together to develop AI security policies and playbooks, including AI developers and security experts. Be aware of the resources that are available; we expect to see AI security certifications and training become available in the coming year.

    Points of Interest

    0:24: How does AI security differ from traditional cybersecurity?
    0:44: AI is a black box: We don’t have the transparency to show how AI works or the explainability to show how it makes decisions. Black boxes are hard to secure.
    2:12: There’s a huge knowledge gap. Companies aren’t doing what is needed.
    2:24: When you talk to executives, do you distinguish between traditional AI and ML and the new generative AI models?
    2:43: We talk about older models as well. But security is as much about, What am I supposed to do? We’ve had AI for a while, but for some time, security has not been part of that conversation.
    3:26: Where do security folks go to learn how to secure AI? There are no certifications. We’re playing a massive catch-up game.
    3:53: What’s the state of awareness about incident response strategies for AI?
    4:15: Even in traditional cybersecurity, we’ve always had the issue of making sure incident response plans aren’t ad hoc or expired. A lot of it is being aware of all the technologies and products that the company has been using. It’s hard to protect if you don’t know everything in your environment.
    5:19: The AI Threat Landscape report found that 77% of companies reported breaches in their AI systems.
    5:40: Last year, a statistic came out about the adoption of AI-related cybersecurity measures. For North America, 70% of organizations said they had adopted one or two out of five security measures; 24% had adopted two to four measures.
    6:35: What are some of the first things I should be thinking about to update my incident response playbook?
    6:51: Make sure you have all the right people in the room. We still have issues with department silos. CISOs can be dismissed or not even in the room when it comes to decisions. There are concerns about restricting innovation or product launch dates. You have to have CTOs, data scientists, ML developers, and all the right people to ensure that there is safety and that everyone has taken precautions.
    7:48: For companies with a mature cybersecurity incident playbook that they want to update for AI, what AI brings is that you have to include more people.
    8:17: You have to realize that there’s an AI knowledge gap and that there’s insufficient security training for data scientists. Security folks don’t know where to turn for education. There aren’t a lot of courses or programs out there. We’ll see a lot of that develop this year.
    10:13: You’d think we’d have addressed communication silos by now, but AI has ripped the bandaids off. There are resources out there. I recommend Databricks’ AI Security Framework (DASF); it’s mapped to MITRE ATLAS. Also be familiar with the NIST AI Risk Management Framework and the OWASP AI Exchange.
    11:40: This knowledge gap is on both sides. What are some of the best practices for addressing this two-sided knowledge gap?
    12:20: Be honest about where your company stands. Where are we right now? Are we doing a good job of governance? Am I doing a good enough job as a leader? Is there something I don’t know about the environment? Be the leader who’s a bridge, breaks down silos, knows who owns what, and knows who’s responsible for what.
    13:24: One issue is the notion of shadow AI: Knowledge workers go home and use tools that aren’t sanctioned by their companies. Are there specific things that companies should be doing about shadow AI?

  29. 13

    Tom Smoker on Getting Started with GraphRAG

    Join Ben Lorica and Tom Smoker for a discussion of GraphRAG, one of the hottest topics of the last few months. GraphRAG goes a step beyond RAG to make the output of language models more consistent, accurate, and explainable. But what is a graph? A graph is a way of structuring data. In the end, it’s the structure that’s important, along with the work you do to create that structure.
    Points of Interest
    0:15: GraphRAG is RAG with a knowledge graph. Do you have a stricter definition?
    1:00: A lot of what I do is the R in RAG: retrieve. Retrieval is better if you have structured data. I’ve yet to find a definition for GraphRAG. You want to bring in structured data.
    2:03: At the end of the day, the lesson is structure. Sometimes structure is a SQL database. Don’t lose hope if you don’t have a knowledge graph.
    2:49: A knowledge graph is a knowledge base and a list of axioms (rules). The knowledge base is just a word connected to another word through a third word. Fundamentally, the benefit comes from the list of triples. The value is in having extracted and defined those triples.
    4:01: Knowledge graphs are cool again. What are your two favorite examples of GraphRAG in production?
    4:57: My examples are people who are structuring their data so that it’s consistent. Then you can bring it into a context window and do something with it.
    5:18: LinkedIn and Pinterest are the best examples of existing graph structures that work.
    5:35: A new application is a veterinary radiology example. Without GraphRAG, the LLM kept recommending conditions specific to Labradors, not bulldogs. GraphRAG controlled the problem.
    6:37: The underlying data was almost exclusively text. It’s difficult to build up a consistent dataset for veterinary radiology because animals move.
    7:12: My favorite examples: Google uses its Data Commons to build a Q&A application. Metaphor Data’s starting point is structured data; then they create a second graph from the first graph that maps technical terms to business terms. Then they construct a social graph based on who is using the data.
    9:41: Structured data can be the basis for a graph.
    10:06: Unstructured data is valuable, but you need a way to navigate and categorize unstructured data.
    11:04: Where are we on GraphRAG? Do you still have to explain what GraphRAG is?
    11:28: More people know about it, but I have to explain it more than I did previously. Exactly what are we referring to? Most people want accuracy in the beginning; the value is often that it is more explainable. People may have seen a fantastic example, but what they haven’t seen is the iterative process of schema design. The upfront cost of these systems is nontrivial.
    13:13: What are the key bottlenecks? How do I get a knowledge graph?
    13:23: The biggest question is: Do you need a graph in the first place? There’s a whole spectrum. It’s in most people’s interest to stop before they get to the end.
    14:01: For people who come to us brand-new, we say, “You should try vector RAG first. If that doesn’t work, there’s a lot of good that structuring data can provide.”
    15:01: If the chunks are structured, and a lot of the work is done up front, then it’s possible to navigate through structured information. At that point, you get value out of vector RAG. Academic papers have to follow a certain structure. If you spend time making sure you know what the chunks are, where they’re split and why, and they’re labeled, you can get a lot of value.
    16:43: What are some of your pointers about how to get started?
    16:47: The knowledge base is often a compressed representation. That means fewer tokens, which means better rate limits and lower cost. So some people want a graph to help scale. That’s one start. Another is the desire for a system to be explainable. Getting that information into a structured representation and tracing back through that structured representation can be very useful.
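Tom’s “list of triples” framing is easy to make concrete. The sketch below (plain Python, with invented entity and relation names, not drawn from any particular GraphRAG library) stores a knowledge base as (subject, predicate, object) triples and walks them to assemble compact context for a prompt, echoing the bulldog-versus-Labrador example:

```python
# The knowledge base as a list of (subject, predicate, object) triples.
# All entities and relations below are invented for illustration.
TRIPLES = [
    ("bulldog", "is_a", "brachycephalic breed"),
    ("labrador", "is_a", "retriever breed"),
    ("brachycephalic breed", "prone_to", "airway obstruction"),
    ("retriever breed", "prone_to", "hip dysplasia"),
]

def neighbors(entity):
    """Every triple that mentions the entity as subject or object."""
    return [t for t in TRIPLES if entity in (t[0], t[2])]

def context_for(entity, hops=2):
    """Collect facts within `hops` steps of the entity as prompt-ready lines."""
    seen, frontier, facts = {entity}, {entity}, []
    for _ in range(hops):
        nxt = set()
        for e in frontier:
            for s, p, o in neighbors(e):
                fact = f"{s} {p} {o}"
                if fact not in facts:
                    facts.append(fact)
                nxt.update({s, o} - seen)
        seen |= nxt
        frontier = nxt
    return facts

# A bulldog query retrieves bulldog facts, not Labrador ones.
print(context_for("bulldog"))
```

Because every retrieved fact is a triple you extracted and defined yourself, you can trace exactly why it landed in the context window, which is the explainability benefit Tom describes.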

  30. 12

    Robert Nishihara on AI and the Future of Data

    Robert Nishihara is one of the creators of Ray and cofounder of Anyscale, a platform for high-performance distributed data analysis and artificial intelligence. Ben Lorica and Robert discuss the need for data for the next generation of AI, which will be multimodal. What kinds of data will we need to develop models for video and multimodal data? And what kinds of tools will we use to prepare that data?
    Points of Interest
    1:06: Are we running out of data?
    1:35: There is a paradigm shift in how ML is thinking about AI. The innovation is on the data side: finding data, evaluating sources of data, curating data, creating synthetic data, filtering low-quality data. People are curating and processing data using AI. Filtering out low-quality data or unimportant image data is an AI task.
    5:02: A lot of the tools were aimed at warehouses and lakehouses. Now we increasingly have more unstructured multimodal data. What’s the challenge for tooling?
    5:44: Lots of companies have lots of data. They get value out of data by running SQL queries on structured data, but structured data is limited. The real insight is in unstructured data, which will be analyzed using AI. Data will shift from SQL-centric to AI-centric. And tooling for multimodal data processing is almost nonexistent.
    8:23: In part of the pipeline, you might be able to use CPUs instead of GPUs.
    8:44: Data processing is not just running inference with an LLM. You might want to decompress video, re-encode video, find scene changes, transcribe, or classify. Some stages will be GPU bound, some will be memory bound, some will be CPU bound. You will want to be able to aggregate these different resources.
    10:03: Most likely, with this kind of data, it’s assumed you will have to go distributed and scale out. There is no choice but to scale the computation.
    10:46: In the past, we were only using structured data. Now we have multimodal data. We are only scratching the surface of what we can do with video—so people weren’t collecting it as much. We will now collect more data.
    11:41: We need to enable training on 100 times more data.
    12:43: ML infrastructure teams are now on the critical path.
    13:52: Companies at the cutting edge have been doing this, but nearly every company has its own data about its specific business that it can use to improve its platform. The value is there. The challenge is the tooling and the infrastructure.
    15:15: There’s another interesting angle around data and scale: experimentation. You will have to run experiments. Data processing is part of experimentation.
    16:18: Customization isn’t just at the level of the model. There are decisions to be made at every stage of the pipeline. What to collect, how to chunk, how to embed, how to do retrieval, what model to use, what data to use to fine-tune—there are so many decisions to make. To iterate quickly, you need to try different choices and evaluate how they work. Companies should overinvest in evals early.
    17:29: If you don’t have the right foundation, these experiments will be impossible.
    18:23: What’s the next data type to get popular?
    18:42: Image data will be ubiquitous. People will do a lot with PDFs. Video will be the most challenging. Video combines images and audio; text can be in video too. But the data size is enormous. There are modeling challenges around video understanding. There’s so much information in video that isn’t being mined.
    22:50: Companies aren’t saying that scaling laws are over, but scaling is slowing down. What’s happening?
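Robert’s point at 8:44 about mixed resource requirements can be sketched as a pipeline whose stages declare what they need. This is plain Python with toy stand-in stages (no real video processing, and not the Ray API); a framework like Ray would use the declared resources to schedule each stage onto suitable CPU or GPU workers:

```python
# Toy multimodal pipeline whose stages declare different resource needs.
# Stage bodies are invented stand-ins, not real video processing; a
# scheduler would place each stage on workers matching its declared
# resources. This sketch just runs the stages in order.

def decode(clip):
    """Decompress the clip into frames (CPU-bound in practice)."""
    return {"clip": clip, "frames": 120}

def scene_detect(item):
    """Find scene changes (often CPU- and memory-bound)."""
    item["scenes"] = item["frames"] // 40
    return item

def transcribe(item):
    """Speech-to-text (GPU-bound model inference in practice)."""
    item["transcript"] = f"placeholder transcript for {item['clip']}"
    return item

PIPELINE = [
    (decode, {"cpu": 1}),
    (scene_detect, {"cpu": 2, "memory_gb": 8}),
    (transcribe, {"gpu": 1}),
]

def run(clips):
    items = clips
    for stage, resources in PIPELINE:
        # A real scheduler would dispatch this stage to workers with
        # the resources declared above; here we apply it sequentially.
        items = [stage(x) for x in items]
    return items

print(run(["a.mp4"]))
```

The point of the resource annotations is the aggregation Robert mentions: one job can mix CPU-bound, memory-bound, and GPU-bound stages instead of provisioning everything for the most expensive stage.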

  31. 11

    Getting Ahead of the Curve with Claire Vo

    In this episode, Ben Lorica talks with Claire Vo, chief product officer at LaunchDarkly and founder of ChatPRD. AI gives us a new set of tools that make everyone more productive and efficient. Those tools will allow more experimentation; they will allow more people to participate in product development; and they will create new opportunities for startups. As Claire says, this new tooling lets everyone get more ambitious—and if you start now, you’re on the leading edge. Lean in to the opportunities.
    Points of Interest
    0:25: ChatPRD is an AI copilot for product managers and people who build products. The goal is to make people who need to generate ideas and build out requirements more efficient.
    1:15: It improves the quality of product work: it’s an on-demand coach or colleague.
    2:05: In a hybrid world, there needs to be some kind of artifact describing what we want to build. No matter the culture, you should try to make high-quality documents to improve the thinking.
    3:44: We ingest your product documents for two reasons: to have context for what you’ve built and what matters, and to inform style and quality.
    5:13: To become a 100x PM, you need to embrace tools and accelerate your work. It’s learning how to scale and do your best in a highly efficient way. Getting 2–3 days back in your week.
    7:17: Will the programming language of the future be natural language? You will still have to think and describe things as a software engineer or a product manager.
    7:54: My favorite users are engineers who don’t have product managers, salespeople who get customer requests, and even founders who can’t afford a product manager.
    8:41: In frontier models, I’d like to see up-to-date training data. The killer feature is performance. The models need to support a workflow that requires speed. Models need more control over output mechanisms than they have now, so users don’t have to massage output.
    10:38: There isn’t capability parity between the models, so you have to make trade-offs between performance, features, API support, latency, user experience, and streaming.
    11:05: Always design your application to be model agnostic. LaunchDarkly allows engineers to decouple the configuration and release of their code from deploying it in production.
    12:14: With AI, prompts become feature flags. You can measure things like latency and token count, and make informed decisions about what works best.
    13:21: It’s important to have the ability to experiment in classic software development. That matters even more with nondeterministic software, because the ability to predict output goes down. You need to think about instrumentation from the beginning.
    14:37: I have been through a couple of technology waves, but this one has stopped me in my tracks. The difference between what is possible and what is not possible is unbelievable. I could have built the product from my startup 10 years ago before lunchtime.
    16:01: People need to prepare to be expected to do more, because the ability to do more is powered by these tools and automations. People should educate themselves on how to automate tasks in their current job, and they should add additional skills like the ability to code.
    16:42: The shape of organizations will change. The triad of the product manager, engineering lead, and design lead will collapse into an individual. Individual contributors will become more efficient.
    17:35: Everyone can get more ambitious. There won’t be less to do. More people will be empowered to do more things and have a bigger impact.
    18:44: Everything requires a radical cultural shift inside companies. It can feel scary. You need to set the aspiration and why it matters; you need to organize among motivated individuals and reward the behavior you want to see; new organizations will fall out of the centers of gravity around people who are operating in an AI-native way.
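The “prompts become feature flags” idea at 12:14 can be sketched in a few lines: serve prompt variants behind a flag and record per-variant metrics like latency and a token count. Everything here is illustrative (a hypothetical flag with two invented variants and a stand-in for the model call); a production system would use a flag service such as LaunchDarkly and a real tokenizer:

```python
# "Prompts as feature flags": serve prompt variants behind a flag and
# record per-variant metrics. The flag, variants, and model call are
# all invented stand-ins for illustration.
import random
import time

PROMPT_FLAG = {  # hypothetical variants of one prompt
    "concise-v1": "Summarize the PRD in three bullet points.",
    "detailed-v2": "Summarize the PRD with goals, risks, and open questions.",
}

metrics = {name: {"calls": 0, "latency_s": 0.0, "tokens": 0}
           for name in PROMPT_FLAG}

def fake_llm(prompt):
    """Stand-in for a model call."""
    return "summary " * max(1, len(prompt.split()) // 2)

def handle_request(variant=None):
    # Flag evaluation: a real system would ask the flag service which
    # variant this user should get.
    variant = variant or random.choice(list(PROMPT_FLAG))
    start = time.perf_counter()
    output = fake_llm(PROMPT_FLAG[variant])
    m = metrics[variant]
    m["calls"] += 1
    m["latency_s"] += time.perf_counter() - start
    m["tokens"] += len(output.split())  # crude token proxy
    return variant, output

handle_request("concise-v1")
print(metrics["concise-v1"])
```

Comparing the accumulated metrics per variant is what turns a prompt change into an informed rollout decision rather than a guess.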

  32. 10

    The Future of Programming with Matt Welsh

    Join us for a conversation between Ben Lorica and Matt Welsh, cofounder of Fixie.ai, former engineer at Apple and Google, and one of Mark Zuckerberg’s professors at Harvard. Learn how AI is changing computing. Whether it’s in C or a human language, programming is telling a computer what you want it to do—but AI opens up new classes of things that we can ask it to do. It’s not just simplifying (or replacing) coding; it’s creating new opportunities and new kinds of applications that we couldn’t imagine two or three years ago.
    Points of Interest
    0:00: Introduction.
    2:38: The changing nature of programming. What will replace programming?
    3:07: Ultimately, the idea of writing a program will be replaced by telling a language model what you want to do. The language model will do what you want directly.
    5:03: I can do things I couldn’t imagine doing—for example, summarize a transcript or find bios of speakers and relevant papers.
    7:01: There’s a whole new field of kinds of computation we couldn’t do before.
    7:48: People in fields like medicine used to have to ask computer scientists to do things for them. Now, you don’t have to get a computer scientist to translate an idea into reality.
    11:30: What is missing from the current tooling?
    11:40: It’s way too hard for people without programming ability to integrate language models into their workflows. Ultimately, AI needs to be deeply integrated into products and the OS.
    13:45: Are people in the UX community inventing new ways to interact?
    14:40: We are very embedded in a web/mobile-based way of thinking about interacting. AI changes the ways we interact with computers—for example, voice.
    16:07: There’s a lot of information encoded into voice that you miss when you encode it into text.
    18:15: What about programming itself?
    18:30: Programming is changing radically. At Fixie, we mandated that employees have access to ChatGPT and similar tools.
    20:34: What is the role of testing and QA?
    21:28: People will struggle to find the right trade-offs. We’re not throwing out all of the processes we’ve developed, like testing and code reviews.
    25:25: Every company can train AI to scale their best engineers.
    25:55: We’re being sloppy as an industry. Curation of good code and good documents will be important. We don’t just need more data; we need better data.
    28:23: What is Aryn doing?
    29:17: When people wanted to use AI models to ask questions about their data, they started with simple processes: break text into chunks, store the chunks in a vector database, and at question time, feed them back into the prompt.
    30:10: We need the ability to extract data from unstructured documents. The structure is there, but it’s hidden. The first part of Aryn: How do you extract the structure inherent in documents?
    32:46: The second part of Aryn: a Python framework, Sycamore, lets you build ETL pipelines from these documents. ETL does things like normalize location information.
    35:45: Another part of the Aryn stack is LLM-powered unstructured analytics (LUNA), which allows you to make queries based on the unstructured data in the documents.
    37:34: The future of programming is using language models as computers to perform computation that would be difficult to express in a programming language.
    38:22: People are talking about GraphRAG, which is RAG with knowledge graphs, but how do you get a knowledge graph? Can Aryn help with that?
    39:15: Yes, we’re effectively doing knowledge graph construction. But once you have the right underlying structure, you may not need knowledge graphs at all.
    40:50: Are tools for evaluating AI lagging behind development tools?
    41:16: The meaning of “evaluation” is often not well-defined.
    43:03: Evaluation will come down to establishing trust.
    43:32: We need tools that will allow people to collaborate early on evaluations. You need to give people tools that help them understand what’s happening.
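The simple process described at 29:17 (chunk, store, retrieve, prompt) can be sketched end to end. This toy version uses bag-of-words counts in place of real embeddings and a list in place of a vector database, so it only shows the shape of vector RAG; the document text is invented for illustration:

```python
# Minimal chunk -> embed -> retrieve -> prompt loop. Bag-of-words
# counts stand in for a real embedding model, and a plain list stands
# in for a vector database.
from collections import Counter
from math import sqrt

def chunk(text, size=8):
    """Split text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

doc = ("Sycamore is a Python framework for ETL on unstructured documents. "
       "LUNA runs analytics over the extracted structure.")
chunks = chunk(doc)
top = retrieve("What does the Python framework do?", chunks)
prompt = f"Context: {top[0]}\nQuestion: What does the Python framework do?"
print(prompt)
```

The point Matt makes is that this naive pipeline breaks down when the structure of documents matters, which is where Aryn’s extraction work comes in.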

  33. 9

    Kingsley Ndoh on Improving Cancer Care with AI

    What can AI do to improve healthcare? Kingsley Ndoh, founder of Hurone AI, talks with Ben Lorica about how Hurone is making cancer care more effective for people who are underserved by the medical system. He discusses how AI can streamline the medical process, both helping doctors to treat patients more effectively and making clinical trials more diverse.
    Points of Interest
    0:36: What motivated you to apply AI to cancer care? What problems are you trying to solve?
    1:39: We need environments for training AI models that are effective for all populations.
    2:31: Current oncology solutions serve advanced healthcare systems, leaving community oncology centers and international markets underserved.
    3:31: Lack of diversity in clinical trials means we don’t have full evidence on the efficacy of drugs.
    5:00: What is an oncologist?
    6:10: Cancer is a very complex disease; every cancer is different and has its own solutions.
    6:43: What advantages do you bring as a domain expert?
    7:11: I’ve been a physician taking care of patients. I understand clinical workflows in Nigeria and the US. I’ve also been an entrepreneur since I was in high school. I’ve also worked in the global oncology space with governments and pharma companies. That network is very important.
    9:15: What was the situation before Gukiza [Hurone’s app]? What does Gukiza enable today?
    9:44: Gukiza makes care more accessible to patients and optimizes workflows for oncologists. Patients may have to travel long distances to see an oncologist; they may have side effects or even emergencies that are avoidable; data about events may be lost.
    12:53: Gukiza streamlines the process; it’s a two-way system that can be used standalone. There is a HIPAA-compliant API that can be integrated into major electronic medical records systems. Patients aren’t limited to an app; there is an API for WhatsApp, Telegram, and text messaging.
    14:13: Patients can describe their problems. Clinicians can click a button and generate a response that they can review and send to the patient. Clinicians can also call patients, do clinical summaries, and see how patients are progressing.
    17:08: One should think about this as a copilot. The app makes suggestions; the physician makes the decision.
    17:35: There are definitely risks. We are building our model and fine-tuning it to ensure that hallucination is limited. But there is still a final human review.
    18:40: What if I want to use the system in a completely new country? What does it take to get the system into a viable, usable state?
    19:41: We conform to the country’s guidelines for the management of patients. Cancer care is usually based on established guidelines. In the US, we have NCCN guidelines. To make sure guidelines are responsive to different regions, the NCCN looked at evidence from research done in different countries to harmonize guidelines. That gave birth to the resource-stratified guidelines for regions like Sub-Saharan Africa. We don’t need to customize a lot.
    21:38: We are also building agreements for access to de-identified cancer data. As we scale, it will get better.
    24:02: Health data is the most sensitive data in the world but also the most abundant. Compared to other industries, healthcare is lagging behind. But many regions are looking for disruption and innovation and are willing to be flexible to work with us.
    25:20: Our solution isn’t a magic bullet, but it will move the needle.
    26:12: We are excited about LLMs with text and images. But before LLMs, people were excited about computer vision. What models are you using?
    27:10: We’re relying on LLMs and NLP. There are established startups with computer vision for radiology and pathology; we are partnering with those companies. The major data we collect is genomic data. We are also incorporating wearable device data with things like geolocation, sleep patterns, heart rates, etc.
    28:28: Social determinants of health data are also important: ZIP code, employment status, activities, food.

  34. 8

    Putting AI in the Hands of Farmers with Rikin Gandhi

    Rikin Gandhi, CTO of Digital Green, talks with Ben Lorica about using generative AI to help farmers in developing countries become more productive. Farmer.Chat integrates information from training videos, sources of weather and crop information, and other data sources in a multimodal app that farmers can use in real time.
    Points of Interest
    0:45: Digital Green helps farmers become more productive. Two years ago, Digital Green developed Farmer.Chat, an app that uses generative AI to put local language training videos together with weather data, market information, and other data.
    2:09: Our primary data source is our library of 10,000 videos in 40 languages that have been produced by farmers. We integrate additional sources for weather and market information. More recently, we’ve added information support tools.
    3:38: We have a smartphone app. Users who only have feature phones can call into a number and interact with a bot.
    5:00: Prior to Farmer.Chat, our work was primarily offline: videos shown on mobile projectors to an in-person audience. Sending content to phones flips the paradigm: rather than attending a video, farmers can ask questions relevant to their situation.
    6:40: When did you realize that generative AI opened up new possibilities? It was a gradual transition from offline videos on projectors. COVID didn’t allow us to get groups of farmers together. And more farmers came online in the same period.
    8:17: We had a deterministic bot before Farmer.Chat. But users had to traverse a tree to get the information they wanted. That tree was challenging to create and difficult to use.
    9:33: With GPT-3, we saw that we could move away from the complexity and cost of using a deterministic bot.
    11:15: Did ChatGPT alert you to more possibilities? ChatGPT has scoured open internet knowledge. Farmers are looking for location- and time-specific information. Even in the earliest version of ChatGPT, we saw that it had a lot of this information. Putting this world knowledge together with our video was powerful.
    13:07: Accuracy, precision, and recall are all important. Are you fine-tuning and using RAG to make sure you are accurate? We had problems with hallucinations even within our knowledge base. We implemented reranking and filtering, which reduced hallucinations to less than 1%. We’ve created a golden Q&A set.
    16:01: People are now talking about GraphRAG, the use of knowledge graphs for RAG. Can you create a knowledge graph because you know your data so well? A lot of concepts in agriculture are related—for example, crop calendars for how crops develop. We’re trying to build those relations into the system.
    17:05: We are leveraging agentic orchestration for the overall pipeline. Based on the user’s query, we may be able to answer questions directly rather than go through the RAG pipeline.
    18:44: Your situation is inherently multimodal: video, speech-to-text, voice; is this a challenge? We’re now using tools like GPT Vision to get descriptive metadata about what’s in videos. It becomes part of the database. We began with text queries; we added voice support. And now people can take a photo of a crop or an animal.
    21:04: Foundation models are becoming multimodal. What’s your user interface today? What are you moving towards? We started with messaging apps that the users already use. We’re plugging the bot into that ecosystem. We’re migrating towards a reality that isn’t text first: putting video first so farmers can speak and take a video. For many farmers, this is the first time they’ve interacted with a bot. Autoprompts are important so they know that it has weather and locale-specific information.
    23:57: What are specific challenges around AI—privacy, security, and ethics? Agriculture is often a sensitive subject. There’s a lot of personally identifiable information. We try to mask that information so it’s not used to train models. Farmers need to be able to trust that their information won’t be taken away from them.
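The agentic orchestration described at 17:05 (answer directly versus going through the RAG pipeline) is essentially a router. A minimal sketch, with invented keywords and tool names standing in for a model-based classifier:

```python
# Router that decides whether a query can be answered directly via a
# tool call or should go through the full RAG pipeline. The keywords
# and tool names are invented; a production system would use a
# model-based classifier rather than keyword matching.

DIRECT_TOPICS = {
    "weather": "weather_api",
    "price": "market_api",
    "market": "market_api",
}

def route(query):
    q = query.lower()
    for keyword, tool in DIRECT_TOPICS.items():
        if keyword in q:
            return ("direct", tool)          # skip retrieval entirely
    return ("rag", "video_knowledge_base")   # fall through to RAG

print(route("What is the weather tomorrow?"))
print(route("How do I treat leaf rust on wheat?"))
```

Routing simple lookups around the retrieval pipeline saves tokens and latency, and reserves the expensive RAG path for questions that actually need the knowledge base.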

  35. 7

    Adopting AI in the Enterprise with Timothy Persons

    Timothy Persons of PricewaterhouseCoopers (PwC) talks with Ben Lorica about adoption of AI in the enterprise. They discuss the challenges enterprises experience, including the need to change corporate culture. To succeed, it’s important to focus on solving well-defined problems rather than just doing something cool with AI. Good data strategies and data governance are essential. Persons also highlights the importance of training and education for everyone in the organization and the need to create safe environments where people can experiment.
    Points of Interest
    0:00: Introduction.
    1:00: We are seeing an uptick in adoption of AI in the enterprise. CEOs are planning to adopt AI and pursue business reinvention. Many companies are still kicking the tires. There is more adoption in the backend, where risks are lower.
    3:36: AI budgets are on an upward trend. It is not a small spend, and there’s a tendency to underestimate cost.
    4:54: What are some of the key challenges that enterprises face when they go to deployment?
    5:10: It’s all about trust and culture: getting employees and executives comfortable with the technology. That implies upskilling and internal conversations.
    7:09: What is a data strategy for generative AI?
    7:37: Companies need data governance, which must be more than a well-written policy document. Governance means operationalizing the policy. Once you focus on quality data and abide by governance, you have the foundation for a good future.
    9:26: How do you measure that you’re delivering ROI? How do you evaluate so that you know your LLM-backed application is ready to go?
    10:50: ROI—We need to separate R&D. For R, ROI doesn’t work well. But when you cross from R to D and investments scale, you need to think about ROI.
    12:15: Evaluation—We can measure LLMs today. But what does that mean in the context of the problem you’re solving? AI in autonomous vehicles is different from AI in medical systems.
    13:58: Companies need to invest in educating the workforce. Upskilling is not just for expertise; it is also for interdisciplinarity. Changing organizational culture means changing the way organizations communicate and partner.
    15:38: People underestimate the importance of creating a good user experience. Design thinking is needed. Focus on end-user experience and work back from that.
    16:59: What are some of the most common use cases for AI?
    17:17: In the back office, you often have a corpus of information customized to your situation. You can build fit-for-purpose chatbots for key support functions. The best lawyers can’t read everything possible in the corpus or keep up with all the regulatory changes coming in.
    21:11: AI will increase the value of labor investments. It will expedite the L&D curve for new employees. It will improve users’ lives. And AI is getting much better. We’ve only seen the floor, not the ceiling.
    24:38: Do you have a checklist or a playbook to help companies prioritize use cases?
    24:57: Companies need to think, “What problems do I need to solve?” Think from a problem-centric approach.
    27:32: Are there best practices for sharing learning across different groups?
    28:17: We’ve seen centers of excellence rise. Sharing what didn’t work is important. GenAI is very democratizing—not everyone needs a PhD. When companies reward sharing, including what didn’t work, it really engenders collective learning and great ideas.
    30:15: What have leading companies done to prepare their workforces?
    30:31: PwC made a major investment in MyAI, which was focused on the ability to get AI into the hands of users, down to entry-level interns. It was an intentional L&D process that was focused on AI. We gave people the tools and a safe space to use them.
    32:43: It’s learning by doing, and it’s fun. And it can be customized to a company or a firm.
    33:03: If we didn’t provide a controlled environment, our people would go out into an uncontrolled environment.

  36. 6

    Learning How to Do AI Effectively with Alfred Spector

    Alfred Spector has been a leader in AI and machine learning at Google, IBM, and Two Sigma. He is now a visiting scholar at MIT, an advisor at Blackstone, and coauthor of the textbook Data Science in Context. Alfred talks with Ben Lorica about what people developing with AI need to be successful. Succeeding with AI is about more than just a model. We need to think about the application and its context. We need humanities and social sciences in addition to technology. Alfred also discusses the AI skills gap, resistance to adopting AI, “hybrid intelligence,” and the calls to regulate AI.
    Points of Interest
    0:00: Intro
    0:54: What do we need to do to apply generative AI effectively?
    2:10: Why did you end up writing the book Data Science in Context?
    3:14: Data science is about more than the model. More than “just get some data and hope.”
    8:22: Ethics alone isn’t enough.
    11:08: Students need a good basis in economics, political science, history, and literature. We have to think more broadly than “which ad gets the most clicks.”
    14:20: There’s an AI literacy and skills gap, particularly outside Silicon Valley.
    15:43: Companies should be probing opportunities.
    16:20: Is there resistance to adopting AI? Fear of displacement or distrust?
    18:18: Most people think there is more to do than people to do the work.
    19:21: To what extent are companies trying to come up with an overarching vision for AI?
    19:51: For some companies, GenAI will be formative. Others need to kick the tires and put together a road map.
    21:35: Internal applications can be more fault tolerant. Keep employees in the loop; don’t be lazy.
    23:12: Prior to ChatGPT, the barrier to entry was higher. AI is now very developer friendly.
    24:13: What level of data science or ML knowledge should companies have?
    25:01: There are two categories of expertise; one is a broad perspective on products and services.
    28:25: It may take a long time to evaluate whether an application can be deployed.
    29:07: With agents, the stakes are higher.
    30:07: Hybrid intelligence will be a coalition that includes AI.
    32:38: Even task-specific agents can break. Agents are fragile. Humans aren’t fast but are good at dealing with things we haven’t encountered before.
    33:43: Regulate uses of technology, not technologies.

  37. 5

    Andrew Ng on Where AI Is Headed: It’s About Agents

    Andrew Ng is one of the pioneers of modern AI. He was Google Brain’s founding technical lead, Coursera’s founder, Baidu’s chief scientist, DeepLearning.AI’s founder, a professor at Stanford—and much more. Andrew talks with Ben Lorica about scaling AI, agents, the future of open source AI, and openness among AI researchers. Have you experienced an “agentic moment,” when you’re surprised and thrilled by AI’s ability to generate a plan and then to enact that plan? You will.
    Points of Interest
    0:00: Introduction
    1:00: Advancing AI required scaling up. Better algorithms weren’t the issue.
    2:57: Just as we needed GPUs and other new hardware for training, we may need new hardware for inference.
    3:18: People are pushing data-centric AI forward. Engineering the data is important—maybe even more important than engineering the model.
    4:41: The idea of agents has been around for a while. What’s new here?
    6:00: Agentic workflows let AI work iteratively, which yields a huge improvement in performance.
    8:01: Agents can be used for robotic process automation (RPA), but it’s much bigger than that. We will experience “agentic moments” when we see AI that plans and executes a task without human intervention.
    10:42: Do you anticipate new agentic applications that weren’t possible before?
    12:21: What are the risks of training on copyright-free datasets? Will using copyright-free datasets degrade performance?
    15:05: AI is a tool; I dispatch it to do things for me. I don’t see it as a different “species.”
    16:17: How do we know when an application is ready to release? What are best practices for enterprise use?
    17:18: It’s still very early. We need more work on evaluation. It’s easy to build applications—but when you build an app in a week, it’s hard to spend 10 weeks evaluating it.
    19:14: A lot of people build an application on one LLM but won’t switch because evaluation is hard.
    20:12: Are you concerned that Meta is the only consistent supplier of open source language models?
    22:10: The cost of training is falling. The decrease in the cost of training means that the ability to train large models will become open to more players.
    26:15: The AI community seems less open than it was, and more dominated by commercial interests. Is it possible that the next big innovation won’t get published?
    26:50: We’re starting to see papers about alternatives to transformers. It’s very difficult to keep technical ideas secret for a long time.
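The iterative agentic workflow Andrew describes at 6:00 typically follows a draft-critique-revise loop. In this sketch the draft and critique functions are toy stand-ins for LLM calls; only the control flow is meant to be realistic:

```python
# Draft-critique-revise loop: the skeleton of an iterative agentic
# workflow. draft() and critique() are invented stand-ins for LLM
# calls; the loop structure is the point.

def draft(task):
    return f"Plan for {task}: step 1"

def critique(text, required_steps=3):
    """Return feedback, or None when the draft is good enough."""
    steps = text.count("step")
    return None if steps >= required_steps else f"add step {steps + 1}"

def revise(text, feedback):
    return text + ", " + feedback.removeprefix("add ")

def agentic_loop(task, max_iters=5):
    out = draft(task)
    for _ in range(max_iters):
        feedback = critique(out)
        if feedback is None:  # the critic is satisfied; stop iterating
            break
        out = revise(out, feedback)
    return out

print(agentic_loop("deploy the model"))
```

A single-shot call would return the first draft; the loop keeps revising until the critique step passes, which is why iteration yields the performance improvement Andrew mentions.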

  38. 4

    Democratizing AI with Gwendolyn Stripling

    Gwendolyn Stripling, author of Low-Code AI, talks about the democratization of AI, the primacy of data, the future of data science, and the coming of agents. It’s easy to think that AI is all about algorithms and models, but it’s not; it’s really about understanding the business use case and the data that can be applied to that use case. We’re only beginning to have tools for the rest of the job: collecting, preparing, and exploring the data to find out what’s relevant to your business. Looking ahead, Gwendolyn sees generative AI automating even more of the workload. But focusing on the data—collecting, understanding, and interpreting it—will always be the human part of the job.

    Points of interest
    0:57: What’s the boundary between no-code and low-code?
    3:10: Using the minimum amount of code necessary to achieve your goal.
    4:09: Low-code reduces the heavy lifting. But what if you want to learn about AI and ML?
    6:35: Learning ML isn’t about the tools; it’s about the business case and the data.
    7:55: What made you think about exposing more people to low-code AI?
    11:21: The key to all of this is the use case and then the data.
    14:32: What if I primarily use SQL?
    15:30: Is there an equivalent of AutoML for data collection and preparation?
    16:50: Generative AI looks like it will be able to help prepare data.
    19:22: How did the release of ChatGPT and other LLMs affect your book?
    24:00: Is there a low-code or no-code approach to RAG?
    26:30: The GenAI pipeline is becoming completely automated.
    26:49: The word of 2024 is agents. A lot of what can be automated will be automated.
    28:00: A lot of people are sharing lessons and best practices. That makes this an exciting time.
    29:17: Looking ahead five years, what will data scientists and ML engineers do?

  39. 3

    Competing in a Generative World with Justin Norman

    Justin Norman, author of Product Management for AI and co-founder of Vera, a startup focused on security for generative AI, talks with Ben Lorica about how product management has changed since generative AI came on the scene. He discusses the issues retrieval-augmented generation (RAG) raises for product management; how reliability has become part of a product’s value; how companies that have lagged in their adoption of AI can use generative AI as a way to catch up; and the role of open source AI in helping smaller companies compete with more established companies.

    Points of Interest
    0:00: You wrote Product Management for AI back in 2020 and 2021. How have things changed for product managers since then?
    3:04: Do companies that lead with operations and infrastructure for traditional AI maintain an advantage with generative AI? Or does generative AI allow companies that are just starting to catch up?
    5:09: Can new companies use open source to compete with established companies? Can open source help capture value as well as larger proprietary models?
    6:08: What do product managers struggle with when implementing RAG? What’s the relationship between fine-tuning and RAG?
    10:58: RAG gives you value out of the box, but the key to success is how the data is organized.
    13:57: Are VCs underinvesting in certain parts of the pipeline? There’s lots of investment in AI, but not as much in startups working on necessary technologies like ETL and data engineering.
    16:31: Why is reliability important for generative AI? How is generative AI different from other applications we’re familiar with, and what implications does this have for product management?
    21:03: Are enterprises realizing that efficiency is important for succeeding with generative AI?
    23:44: We’re familiar with dashboards for monitoring and managing traditional software products. What would a dashboard for generative AI models look like? What do you need to be monitoring?
    28:49: Very few developers working in machine learning have also done frontend development or worked on user experience (UX). However, understanding user interaction can help you improve your model.
    30:44: You’re working with the father of digital forensics, Hany Farid. Should we be worried about deepfakes?

  40. 2

    Pete Warden on Running AI on Small Systems

    Pete Warden, founder of Useful Sensors and co-author of TinyML, discusses use cases for artificial intelligence that we rarely think about: how can you run AI on very small systems? How can you put AI on consumer devices in ways that are actually useful and not just buzzword-compliant? AI doesn’t have to rely on massive GPU farms. Pete talks about what happens when you exchange one set of requirements (extreme power, heat, and expense) for another (minimal size, cost, and heat).

    Points of Interest
    0:00: Introductions, including Pete’s introduction to his company.
    2:22: What are some of the challenges and use cases for sensor-driven AI?
    4:11: Is sensor-driven AI relevant to industries other than hardware?
    6:22: Now we’re in the age of foundation models and large language models. Is “large” incompatible with “tiny”? Can you run language models on smaller devices?
    8:00: Will there be developments in tinyML that will benefit the broader LLM community?
    9:30: What’s deployable today in computer vision, speech, and language? What can be done with hardware that’s constrained by cost, size, and power consumption?
    11:15: How will product designers work with sensor-driven AI? Will they simply select from a palette of optional modules?
    12:37: Pete walks us through the development of AI-in-a-Box, from its conception to its reception.
    15:31: Your devices don’t have network connections. Without a network connection, how do you update models? Is it necessary?
    19:00: Do you do retrieval-augmented generation (RAG) on your devices?
    20:35: Our devices have user interfaces that combine voice and presence. A voice interface is central, but visual (and other) channels help to create an awareness of the speaker.
    21:35: What are some of your specific challenges, like power consumption and latency? How do you make tradeoffs?
    22:45: What is the future of large language models for sensor-driven AI?
    26:50: What are some of the security concerns for sensor-driven AI, and what are you doing about them?
    28:22: What is Dark Compute, and why is it important?
    30:48: What are the biggest opportunities for pushing AI into consumer devices? We need to start with problems that users actually care about.
    32:30: How can listeners connect to the broader movement around TinyML?

  41. 1

    Chip Huyen on Finding Business Use Cases for Generative AI

    O’Reilly’s Generative AI in the Enterprise survey reported that people have trouble coming up with appropriate enterprise use cases for AI. Why is it hard to come up with appropriate use cases?

    Chip Huyen, cofounder of Claypot AI and author of Designing Machine Learning Systems, talks about why many companies have trouble coming up with appropriate use cases for AI, how to evaluate possible use cases, and the skills your company will need to put them into practice.

    Points of Interest
    0:00: Introduction
    0:49: Results from O’Reilly’s Generative AI in the Enterprise survey report.
    3:02: Now that generative AI is more accessible, will it be easier to come up with use cases?
    4:29: AI is easy to demo but hard to productize. Consistency, risk, and compliance.
    6:44: Is there a framework or checklist for thinking about applications?
    8:15: What are some of your favorite use cases?
    13:30: RAG is the “hello, world” of AI applications.
    17:24: How do you navigate between the desires and requirements of different stakeholders?
    19:00: When talking to stakeholders, you have to answer questions at the right level.
    21:10: How to think about staffing teams for generative AI.
    22:45: There’s less model development with generative AI and more application development.
    23:12: Frontend engineers and full-stack developers are very successful.
    26:27: What are companies’ concerns about risk?
    27:27: Understanding the data gives a lot of clues about what a model is good at and what it should be used for.
    29:00: The importance of documentation.
    30:25: Are there specific things you can do to ease the integration of AI into an organization?
    32:49: Which companies that have deployed AI in products stand out?


ABOUT THIS SHOW

In 2023, ChatGPT put AI on everyone’s agenda. Now, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

HOSTED BY

O'Reilly
