AI Engineering Podcast - All Episodes

79

Kubernetes, Compliance, and Control: The Operational Backbone of AI Sovereignty

Summary
In this episode of the AI Engineering Podcast, Stephen Watt, leader of the Office of the CTO at Red Hat, discusses practical paths to achieving AI sovereignty for organizations. He shares his two decades of experience in AI, highlighting how governments are building GPU platforms and protected data hubs to maintain control over AI workloads. Steve emphasizes why self-managed infrastructure is becoming a strategic necessity as companies outgrow cloud costs and require tighter control over models, data, and compliance. The conversation explores the operational substrate for AI sovereignty, including Kubernetes as the scale-out backbone for LLM serving, bridging the gap with PyTorch ecosystems, observability and policy for non-deterministic systems, and emerging security needs such as confidential inference and agentic identity. They also discuss model and hardware optionality (GPUs, CPUs, and new accelerators), the growing demand for energy-efficient inference, and the importance of open models and post-training to create durable differentiation. Steve identifies access to GPUs as the biggest gap hindering sovereign AI adoption today, arguing that AI workloads need broad GPU availability to thrive. The conversation also touches on evolving architectures beyond transformers, the interplay between AI and data sovereignty, consolidation pressures from pilot chaos to standardized platforms, and the societal triad of universities, startups, and sovereign infrastructure.
Announcements
Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
Your host is Tobias Macey and today I'm interviewing Stephen Watt about how to adapt your existing infrastructure investments to support your AI workloads and gain "AI Sovereignty"
Interview
Introduction
How did you get involved in machine learning?
Can you describe what you mean by the term "AI sovereignty"?
What are the motivating factors for investing in that as an organizational capability?
What do you see as the scale, sophistication, or regulatory triggers that tip someone from buying off-the-shelf AI services into operating their own AI stacks?
There has been substantial investment in MLOps toolchains and patterns over the past decade, along with corresponding evolution of LLMOps techniques. What do you see as the areas of overlap between those technology patterns and the "traditional" infrastructure capabilities that organizations have matured over the past ~20 years?
What are the aspects that are disjoint and contribute to operational pain for DevOps/platform teams?
How do AI/agentic workloads strain the ability of existing security and governance frameworks that teams are operating for existing cloud-native workloads?
What are the options for extending those frameworks and what are the requirements that force a new approach? (e.g. guardrails, LLM interpretability, etc.)
What are the elements of cloud-native architecture that have left us (as an industry) well situated to absorb the complexity of AI/agentic workloads?
How does the complexity shift as you go along the continuum of model training to finetuning to inference?
Beyond the ability to host and execute inference on a model are the various data stores and tool availability that make generative AI a competitive advantage. How much of that (e.g. agentic memory, vector stores, MCP/A2A tools, etc.) is actually net new vs. a new coat of paint on existing techniques?
What are the most interesting, innovative, or unexpected ways that you have seen teams operationalizing AI workloads on their infrastructure?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on empowering organizations to achieve AI sovereignty?
When is operating your own AI infrastructure the wrong choice?
What are your predictions for the future evolution of operational substrates for AI workloads?
Contact Info
LinkedIn
Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
Red Hat
Bayesian Classifier
Hadoop
HBase
DeepSeek
Reflection AI
Nvidia Blackwell
vLLM
IBM Spyre
vLLM CPU
IBM Watson on Jeopardy
Neuromorphic Computing
Kubernetes
PyTorch Foundation
MLOps
LLMOps
Semantic Router
BERT
AGNTCY
OPA == Open Policy Agent
CEDAR
WSDL == Web Services Description Language
UDDI
Spark
llama.cpp
Ollama
ARPA-H
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Feb 25, 2026

1h 01m

78

From Blind Spots to Observability: Operationalizing LLM Apps with OpenLit

Summary
In this episode of the AI Engineering Podcast, Aman Agarwal, creator of OpenLit, discusses the operational foundations required to run LLM-powered applications in production. He highlights common early blind spots teams face, including opaque model behavior, runaway token costs, and brittle prompt management, emphasizing that strong observability and cost tracking must be established before an MVP ships. Aman explains how OpenLit leverages OpenTelemetry for vendor-neutral tracing across models, tools, and data stores, and introduces features such as prompt and secret management with versioning, evaluation workflows (including LLM-as-a-judge), and fleet management for OpenTelemetry collectors. The conversation covers experimentation patterns, strategies to avoid vendor lock-in, and how detailed stepwise traces reshape system design and debugging. Aman also shares recent advancements like a Kubernetes operator for zero-code instrumentation, multi-database configurations for environment isolation, and integrations with platforms such as Grafana and Dash0. They conclude by discussing lessons learned from building in the open, prioritizing reliability, developer experience, and data security, and preview future work on context management and closing the loop from experimentation to prompt/dataset improvements.
Announcements
Your host is Tobias Macey and today I'm interviewing Aman Agarwal about the operational investments that are necessary to ensure you get the most out of your AI models
Interview
Introduction
How did you get involved in the area of AI/data management?
Can you start by giving your assessment of the main blind spots that are common in the existing AI application patterns?
As teams adopt agentic architectures, how common is it to fall prey to those same blind spots?
There are numerous tools/services available now focused on various elements of "LLMOps". What are the major components necessary for a minimum viable operational platform for LLMs?
There are several areas of overlap, as well as disjoint features, in the ecosystem of tools (both open source and commercial). How do you advise teams to navigate the selection process? (point solutions vs. integrated tools, and handling frameworks with only partial overlap)
Can you describe what OpenLit is and the story behind it?
How would you characterize the feature set and focus of OpenLit compared to what you view as the "major players"?
Once you have invested in a platform like OpenLit, how does that change the overall development workflow for the lifecycle of AI/agentic applications?
What are the most complex/challenging elements of change management for LLM-powered systems? (e.g. prompt tuning, model changes, data changes, etc.)
How can the information collected in OpenLit be used to develop a self-improvement flywheel for agentic systems?
Can you describe the architecture and implementation of OpenLit?
How have the scope and goals of the project changed since you started working on it?
Given the foundational aspects of the project that you have built, what are some of the adjacent capabilities that OpenLit is situated to expand into?
What are the sharp edges and blind spots that are still challenging even when you have OpenLit or similar integrated?
What are the most interesting, innovative, or unexpected ways that you have seen OpenLit used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on OpenLit?
When is OpenLit the wrong choice?
What do you have planned for the future of OpenLit?
Contact Info
LinkedIn
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data/AI management today?
Links
OpenLit
Fleet Hub
OpenTelemetry
LangFuse
LangSmith
TensorZero
AI Engineering Podcast Episode
Traceloop
Helicone
Clickhouse

Feb 15, 2026

50m

77

Taming Voice Complexity with Dynamic Ensembles at Modulate

Summary
In this episode of the AI Engineering Podcast, Carter Huffman, co-founder and CTO of Modulate, discusses the engineering behind low-latency, high-accuracy Voice AI. He explains why voice is a uniquely challenging modality due to its rich non-textual signals like tone, emotion, and context, and how simple speech-to-text-to-speech pipelines can't capture the necessary nuance. Carter introduces Modulate's Ensemble Listening Model (ELM) architecture, which uses dynamic routing and cost-based optimization to achieve scalability and precision in various audio environments. He covers topics such as reliability under distributed systems constraints, watchdogging with periodic model checks, structured long-horizon memory for conversations, and the trade-offs that make ensemble approaches compelling for repeated tasks at scale. Carter also shares insights on how ELMs generalize beyond voice, draws parallels to database query planners and mixture-of-experts, and discusses strategies for observability and evaluation in complex processing pipelines.
Announcements
Your host is Tobias Macey and today I'm interviewing Carter Huffman about his work building an ensemble approach to low latency voice AI
Interview
Introduction
How did you get involved in machine learning?
Can you describe the "Ensemble Listening" approach and the story behind why Modulate moved away from monolithic architectures?
When designing a real-time voice system, how do you handle the routing logic between specialized models without blowing your latency budget?
What does the "gatekeeper" or routing layer actually look like in code?
You’ve mentioned "evals that don’t lie." How do you build a validation pipeline for noisy, adversarial voice data that catches regressions that a simple word-error-rate (WER) might miss?
In an ensemble of models, a failure in one specialized node might not crash the system, but it can degrade the output quality. How do you monitor for these "silent failures" in real-time without introducing massive overhead?
For many teams, the default is to call an API for a frontier model. At what point in the scaling or latency curve does it become technically (or economically) necessary to swap a general LLM for a suite of specialized, smaller models?
How do you track the real-world costs associated with the technical and human overhead of this more complex system?
What are the most interesting, innovative, or unexpected ways that you have seen orchestrated ensembles used in live conversation environments?
What are the most interesting, unexpected, or challenging lessons that you have learned while managing the lifecycle of multiple specialized models simultaneously?
When is an ensemble approach the wrong choice? (e.g., At what level of complexity or throughput is the overhead of orchestration more trouble than it’s worth?)
What do you have planned for the future of Ensemble Listening Models?
Are we looking at self-optimizing routers, or perhaps moving these ensembles closer to the edge?
Contact Info
LinkedIn
Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Links
Modulate
Nasa Jet Propulsion Laboratory
OpenAI Whisper
Multi-Armed Bandit
Cost-Based Optimizer
GPT 5
LLM Attention
Transformer Architecture
Mixture of Experts
Dilated Convolution
Wavenet

Feb 8, 2026

59m

76

GPU Clouds, Aggregators, and the New Economics of AI Compute

Summary
In this episode I sit down with Hugo Shi, co-founder and CTO of Saturn Cloud, to map the strategic realities of sourcing and operating GPUs across clouds. Hugo breaks down today’s provider landscape (from hyperscalers to full-service GPU clouds, bare metal/concierge providers, and emerging GPU aggregators) and how to choose among them based on security posture, managed services, and cost. We explore practical layers of capability (compute, orchestration with Kubernetes/Slurm, storage, networking, and managed services), the trade-offs of portability on “Kubernetes-native” stacks, and the persistent challenge of data gravity. We also discuss current supply dynamics, the growing availability of on-demand capacity as newer chips roll out, and how AMD’s ecosystem is maturing as real competition to NVIDIA. Hugo shares patterns for separating training and inference across providers, why traditional ML is far from dead, and how usage varies wildly across domains like biotech. We close with predictions on consolidation, full-stack experiences from GPU clouds, financial-style GPU marketplaces, and much-needed advances in reliability for long-running GPU jobs.
Announcements
Your host is Tobias Macey and today I'm interviewing Hugo Shi about the strategic realities of sourcing GPUs in the cloud for your training and inference workloads
Interview
Introduction
How did you get involved in machine learning?
Can you start by giving a summary of your understanding of the current market for "cloud" GPUs?
How would you characterize the customer base for the "neocloud" providers?
How is the access to the GPU compute typically mediated?
The predominant cloud providers (AWS, GCP, Azure) have gained market share by offering numerous differentiated services and ease-of-use features. What are the types of services that you might expect from a GPU provider?
The "cloud-native" ecosystem was developed with the promise of enabling workload portability, but the realities are often more complicated. What are some of the difficulties that teams encounter when trying to adapt their workloads to these different cloud providers?
What are the toolchains/frameworks/architectures that you are seeing as most effective at adapting to these different compute environments?
One of the major themes in the 2010s that worked against multi-cloud strategies was the idea of "data gravity". What are the strategies that teams are using to mitigate that tax on their workloads?
That is a more substantial impact when dealing with training workloads than for inference compute. How are you seeing teams think about the balance of cost savings vs. operational complexity for those different workloads?
What are the most interesting, innovative, or unexpected ways that you have seen teams capitalize on GPU capacity across these new providers?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on enabling teams to execute workloads on these neoclouds?
When is a "neocloud" or "GPU cloud" provider the wrong choice?
What are your predictions for the future evolutions of GPU-as-a-service as hardware availability improves and model architectures become more efficient?
Contact Info
LinkedIn
Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Links
Saturn Cloud
Pandas
NumPy
MatLab
AWS
GCP
Azure
Oracle Cloud
RunPod
FluidStack
SFCompute
KubeFlow
Lightning AI
DStack
Metaflow
Flyte
Arya AI
Dagster
Coreweave
Vultr
Nebius
Vast.ai
Weka
Vast Data
Slurm
CNCF == Cloud-Native Computing Foundation
Kubernetes
Terraform
ECS
Helm Chart
Block Storage
Object Storage
Container Registry
Crusoe
Alluxio
Data Virtualization
GB300
H100
Spot Instance
AWS Trainium
Google TPU (Tensor Processing Unit)
AMD
ROCM
PyTorch
Google Vertex AI
AWS Bedrock
CUDA Python
Mojo
XGBoost
Random Forest
Ludwig - Uber Deep Learning AutoML
Paperspace
Voltage Park
Weights & Biases

Jan 27, 2026

46m

75

The Future of Dev Experience: Spotify’s Playbook for Organization‑Scale AI

Summary
In this episode of the AI Engineering Podcast, Niklas Gustavsson, Chief Architect at Spotify, talks about scaling AI across engineering and product. He explores how Spotify's highly distributed architecture was built to support rapid adoption of coding agents like Copilot, Cursor, and Claude Code, enabled by standardization and Backstage. The conversation covers the tension between bottom-up experimentation and platform standardization, and how Spotify is moving toward monorepos and fleet management. Niklas discusses the emergence of "fleet-wide agents" that can execute complex code changes with robust testing and LLM-as-judge loops to ensure quality. He also touches on the shift in engineering workflows as code generation accelerates, the growing use of agents beyond coding, and the lessons learned in sandboxing, agent skills/rules, and shared evaluation frameworks. Niklas highlights Spotify's decade-long experience with ML product work and shares his vision for deeper end-to-end integration of agentic capabilities across the full product lifecycle and making collaborative "team-level memory" for agents a reality.
Announcements
Your host is Tobias Macey and today I'm interviewing Niklas Gustavsson about how Spotify is scaling AI usage in engineering and product work
Interview
Introduction
How did you get involved in machine learning?
Can you start by giving an overview of your engineering practices independent of AI?
What was your process for introducing AI into the developer experience? (e.g. pioneers doing early work (bottom-up) vs. top-down)
There are countless agentic coding tools on the market now. How do you balance organizational standardization vs. exploration?
Beyond the toolchain, what are your methods for sharing best practices and upskilling engineers on use of agentic toolchains for software/product engineering?
Spotify has been operationalizing ML/AI features since before the introduction of LLMs and transformer models. How has that history helped inform your adoption of generative AI in your overall engineering organization?
As you use these generative and agentic AI utilities in your day-to-day, how have those lessons learned fed back into your AI-powered product features?
What are some of the platform capabilities/developer experience investments that you have made to improve the overall effectiveness of agentic coding in your engineering organization?
What are some examples of guardrails/speedbumps that you have introduced to avoid injecting unreliable or untested work into production?
As the (time/money/cognitive) cost of writing code drops, that increases the burden on reviewing that code. What are some of the ways that you are working to scale that side of the equation?
What are some of the ways that agentic coding/CLI utilities have bled into other areas of engineering/operations/product development beyond just writing code?
What are the most interesting, innovative, or unexpected ways that you have seen your team applying AI/agentic engineering practices?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on operationalizing and scaling agentic engineering patterns in your teams?
When is agentic code generation the wrong choice?
What do you have planned for the future of AI and agentic coding patterns and practices in your organization?
Contact Info
LinkedIn
Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Links
Spotify
Developer Experience
LLM == Large Language Model
Transformers
BackStage
GitHub Copilot
Cursor
Claude Skills
Monorepo
MCP == Model Context Protocol
Claude Code
Product Manager
DORA Metrics
Type Annotations
BigQuery
PRD == Product Requirements Document
AI Evals
LLM-as-a-Judge
Agentic Memory

Jan 20, 2026

56m

74

Generative AI Meets Accessibility: Benchmarks, Breakthroughs, and Blind Spots with Joe Devon

Summary
In this episode Joe Devon, co-founder of Global Accessibility Awareness Day (GAAD), talks about how generative AI can both help and harm digital accessibility, and what it will take to tilt the balance toward inclusion. Joe shares his personal motivation for the work, real-world stakes for disabled users across web, mobile, and developer tooling, and compelling stories that illustrate why accessible design is a human-rights issue as much as a compliance checkbox. He digs into AI’s current and future roles: from improving caption quality and auto-generating audio descriptions to evaluating how well code-gen models produce accessible UI by default. Joe introduces AIMAC (AI Model Accessibility Checker), a new benchmark comparing top models on accessibility-minded code generation, what the results reveal, and how model providers and engineering teams can practically raise the bar with linters, training data, and cultural change. He closes with concrete guidance for leaders, why involving people with disabilities is non-negotiable, and how solving for edge cases makes AI, and products, better for everyone.
Announcements
When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
Your host is Tobias Macey and today I'm interviewing Joe Devon about opportunities for using generative AI to improve the accessibility of digital technologies
Interview
Introduction
How did you get involved in AI?
Can you start by giving an overview of what is included in the term "accessibility"?
What are some of the major contributors to a lack of accessibility in digital experiences today?
Beyond the web, what are some of the other platforms and interfaces that struggle with accessibility?
What role do/can generative AI utilities play in improving the accessibility of applications?
You recently helped create the AI Model Accessibility Checker (AIMAC) to benchmark which coding agents produce the most accessible code. What are the goals of that project and desired outcomes from its introduction?
What were the key findings from AIMAC's initial benchmarking results? Were there any surprises in terms of which models performed better or worse at generating accessible code?
The automation offered by using agentic software development toolchains reduces the manual effort involved in building accessible interfaces. What are the opportunities for using generative AI utilities to act as an assistive mechanism for existing sites/technologies?
Beyond code generation, what other aspects of the AI development lifecycle need accessibility considerations - training data, model outputs, user interfaces for AI tools themselves?
You co-host the Accessibility and Gen AI Podcast. What are some of the common misconceptions you encounter about AI's role in accessibility, either from the AI community or the accessibility community?
There's often tension between moving fast with AI adoption and ensuring inclusive design. How do you advise engineering teams to balance innovation speed with accessibility requirements?
What specific accessibility issues are most amenable to AI solutions today, and which ones still require human judgment and expertise?
As AI models become more capable at generating code and interfaces, what guardrails or validation processes should engineering teams implement to ensure accessibility standards are met?
How do you see the role of accessibility specialists evolving as AI tools become more prevalent in the development workflow? Does AI augment their work or change it fundamentally?
For engineering leaders building platform and data infrastructure, what accessibility considerations should be baked into foundational systems that AI applications will be built upon?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on accessibility awareness?
Contact Info
LinkedIn
Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it!
Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksAIMAC GitHubGlobal Accessibility Awareness Day (GAAD)GAAD FoundationAltaVistaCursorAccessibilityBraille DisplayBen OgilvieState of Mobile App Accessibility ReportVT-100GhosttyWarp TerminalLLM-as-a-JudgeFFMPEGAria TagsAxe-CoreMiniMax M1Codex MiniQwenKimiGoogle LighthouseGitHub CopilotBe-My-EyesBe-My-AIWebAIMXRAccessXR == Extended RealityDeque UniversityFable accessibility feedback organizationThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0 

Jan 5, 2026

56m

73

Beyond the Chatbot: Practical Frameworks for Agentic Capabilities in SaaS

Summary
In this episode product and engineering leader Preeti Shukla explores how and when to add agentic capabilities to SaaS platforms. She digs into the operational realities that AI agents must meet inside multi-tenant software: latency, cost control, data privacy, tenant isolation, RBAC, and auditability. Preeti outlines practical frameworks for selecting models and providers, when to self-host, and how to route capabilities across frontier and cheaper models. She discusses graduated autonomy, starting with internal adoption and low-risk use cases before moving to customer-facing features, and why many successful deployments keep a human-in-the-loop. She also covers evaluation and observability as core engineering disciplines - layered evals, golden datasets, LLM-as-a-judge, path/behavior monitoring, and runtime vs. offline checks - to achieve reliability in nondeterministic systems.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Your host is Tobias Macey and today I'm interviewing Preeti Shukla about the process for identifying whether and how to add agentic capabilities to your SaaS
Interview
- Introduction
- How did you get involved in machine learning?
- Can you start by describing how a SaaS context changes the requirements around the business and technical considerations of an AI agent?
- Software-as-a-service is a very broad category that includes everything from simple website builders to complex data platforms. How does the scale and complexity of the service change the equation for ROI potential of agentic elements?
- How does it change the implementation and validation complexity?
- One of the biggest challenges with introducing generative AI and LLMs in a business use case is the unpredictable cost associated with it. What are some of the strategies that you have found effective in estimating, monitoring, and controlling costs to avoid being upside-down on the ROI equation?
- Another challenge of operationalizing an agentic workload is the risk of confident mistakes. What are the tactics that you recommend for building confidence in agent capabilities while mitigating potential harms?
- A corollary to the unpredictability of agent architectures is that they have a large number of variables. What are the evaluation strategies or toolchains that you find most useful to maintain confidence as the system evolves?
- SaaS platforms benefit from unit economics at scale and often rely on multi-tenant architectures. What are the security controls and identity/attribution mechanisms that are critical for allowing agents to operate across tenant boundaries?
- What are the most interesting, innovative, or unexpected ways that you have seen SaaS products adopt agentic patterns?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on bringing agentic workflows to SaaS products?
- When is an agent the wrong choice?
- What are your predictions for the role of agents in the future of SaaS products?
Contact Info
- LinkedIn
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Links
- SaaS == Software as a Service
- Multi-Tenancy
- Few-shot Learning
- LLM as a Judge
- RAG == Retrieval Augmented Generation
- MCP == Model Context Protocol
- Lovable
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Dec 29, 2025

53m

72

MCP as the API for AI‑Native Systems: Security, Orchestration, and Scale

Summary
In this episode Craig McLuckie, co-creator of Kubernetes and founder/CEO of Stacklok, talks about how to improve security and reliability for AI agents using curated, optimized deployments of the Model Context Protocol (MCP). Craig explains why MCP is emerging as the API layer for AI‑native applications, how to balance short‑term productivity with long‑term platform thinking, and why great tools plus frontier models still drive the best outcomes. He digs into common adoption pitfalls (tool pollution, insecure NPX installs, scattered credentials), the necessity of continuous evals for stochastic systems, and the shift from “what the agent can access” to “what the agent knows.” Craig also shares how ToolHive approaches secure runtimes, a virtual MCP gateway with semantic search, orchestration and transactional semantics, a registry for organizational tooling, and a console for self‑service, along with pragmatic patterns for auth, policy, and observability.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Your host is Tobias Macey and today I'm interviewing Craig McLuckie about improving the security of your AI agents through curated and optimized MCP deployment
Interview
- Introduction
- How did you get involved in machine learning?
- MCP saw huge growth in attention and adoption over the course of this year. What are the stumbling blocks that teams run into when going to production with MCP servers?
- How do improperly managed MCP servers contribute to security problems in an agent-driven software development workflow?
- What are some of the problematic practices or shortcuts that you are seeing teams implement when running MCP services for their developers?
- What are the benefits of a curated and opinionated MCP service as shared infrastructure for an engineering team?
- You are building ToolHive as a system for managing and securing MCP services as a platform component. What are the strategic benefits of starting with that as the foundation for your company?
- There are several services for managing MCP server deployment and access control. What are the unique elements of ToolHive that make it worth adopting?
- For software-focused agentic AI, the approach of Claude Code etc. to be command-line based opens the door for an effectively unbounded set of tools. What are the benefits of MCP over arbitrary CLI execution in that context?
- What are the most interesting, innovative, or unexpected ways that you have seen ToolHive/MCP used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on ToolHive?
- When is ToolHive the wrong choice?
- What do you have planned for the future of ToolHive/Stacklok?
Contact Info
- GitHub
- LinkedIn
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Links
- Stacklok
- MCP == Model Context Protocol
- Kubernetes
- CNCF == Cloud Native Computing Foundation
- SDLC == Software Development Life Cycle
- The Bitter Lesson
- TLA+
- Jepsen Tests
- ToolHive
- API Gateway
- Glean
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Dec 16, 2025

1h 07m

71

Context as Code, DevX as Leverage: Accelerating Software with Multi‑Agent Workflows

Summary
In this episode Max Beauchemin explores how multiplayer, multi‑agent engineering is reshaping individual and team velocity for building data and AI systems. Max shares his journey from Airflow and Superset to going all‑in on AI coding agents, describing a pragmatic “AI‑first reflex” for nearly every task and the emerging role of humans as orchestrators of agents. He digs into shifting bottlenecks (code review, QA, async coordination) and how better DevX/AIX, just‑in‑time context via tools, and structured "context as code" can keep pace with agent‑accelerated execution. He then dives deep into Agor, a new open‑source agent‑orchestration platform: a spatial, multiplayer canvas that manages git worktrees and shared dev environments, enables templated prompts and zone‑based workflows, and exposes an internal MCP so agents can operate the system, and each other. Max discusses session forking, sub‑session trees, scheduling, and safety considerations, and how these capabilities enable parallelization, handoffs across roles, and richer visibility into prompting and cost/usage, pointing to a near future where software engineering centers on orchestrating teams of agents and collaborators. Resources: agor.live (docs, one‑click Codespaces, npm install), Apache Superset, and related MCP/CLI tooling referenced for agent workflows.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Your host is Tobias Macey and today I'm interviewing Maxime Beauchemin about the impact of multi-player multi-agent engineering on individual and team velocity for building better data systems
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of the types of work that you are relying on AI development agents for?
- As you bring agents into the mix for software engineering, what are the bottlenecks that start to show up?
- In my own experience there are a finite number of agents that I can manage in parallel. How does Agor help to increase that limit?
- How does making multi-agent management a multi-player experience change the dynamics of how you apply agentic engineering workflows?
Contact Info
- LinkedIn
Links
- Agor
- Apache Airflow
- Apache Superset
- Preset
- Claude Code
- Codex
- Playwright MCP
- Tmux
- Git Worktrees
- Opencode.ai
- GitHub Codespaces
- Ona
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Nov 24, 2025

59m

70

Inside the Black Box: Neuron-Level Control and Safer LLMs

Summary
In this episode of the AI Engineering Podcast Vinay Kumar, founder and CEO of Arya.ai and head of Lexsi Labs, talks about practical strategies for understanding and steering AI systems. He discusses the differences between interpretability and explainability, and why post-hoc methods can be misleading. Vinay shares his approach to tracing relevance through deep networks and LLMs using DL Backtrace, and how interpretability is evolving from an audit tool into a lever for alignment, enabling targeted pruning, fine-tuning, unlearning, and model compression. The conversation covers setting concrete alignment metrics, the gaps in current enterprise practices for complex models, and tailoring explainability artifacts for different stakeholders. Vinay also previews his team's "AlignTune" effort for neuron-level model editing and discusses emerging trends in AI risk, multi-modal complexity, and automated safety agents. He explores when and why teams should invest in interpretability and alignment, how to operationalize findings without overcomplicating evaluation, and the best practices for private, safer LLM endpoints in enterprises, aiming to make advanced AI not just accurate but also acceptable, auditable, and scalable.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Your host is Tobias Macey and today I'm interviewing Vinay Kumar about strategies and tactics for gaining insights into the decisions of your AI systems
Interview
- Introduction
- How did you get involved in machine learning?
- Can you start by giving a quick overview of what explainability means in the context of ML/AI?
- What are the predominant methods used to gain insight into the internal workings of ML/AI models?
- How does the size and modality of a model influence the technique and evaluation of methods used?
- What are the contexts in which a team would incorporate explainability into their workflow?
- How might explainability be used in a live system to provide guardrails or efficiency/accuracy improvements?
- What are the aspects of model alignment and explainability that are most challenging to implement?
- What are the supporting systems that are necessary to be able to effectively operationalize the collection and analysis of model reliability and alignment?
- "Trust", "Reliability", and "Alignment" are all words that seem obvious until you try to define them concretely. What are the ways that teams work through the creation of metrics and evaluation suites to gauge compliance with those goals?
- What are the most interesting, innovative, or unexpected ways that you have seen explainability methods used in AI systems?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on explainability/reliability at AryaXAI?
- When is evaluation of explainability overkill?
- What do you have planned for the future of AryaXAI and explainable AI?
Contact Info
- LinkedIn
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
- Lexsi Labs
- Arya.ai
- Deep Learning
- AlexNet
- DL Backtrace
- Gradient Boost
- SAE == Sparse AutoEncoder
- Shapley Values
- LRP == Layerwise Relevance Propagation
- IG == Integrated Gradients
- Circuit Discovery
- F1 Score
- LLM As A Judge
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Nov 16, 2025

1h 00m

69

Building the Internet of Agents: Identity, Observability, and Open Protocols

Summary
In this episode Guillaume de Saint Marc, VP of Engineering at Cisco Outshift, talks about the complexities and opportunities of scaling multi‑agent systems. Guillaume explains why specialized agents collaborating as a team inspire trust in enterprise settings, and contrasts rigid, “lift-and-shift” agentic workflows with fully self-forming systems. We explore the emerging Internet of Agents, the need for open, interoperable protocols (A2A for peer collaboration and MCP for tool calling), and new layers in the stack for syntactic and semantic communication. Guillaume details foundational needs around discovery, identity, observability, and fine-grained, task/tool/transaction-based access control (TBAC), along with Cisco’s open-source AGNTCY initiative, directory concepts, and OpenTelemetry extensions for agent traces. He shares concrete wins in IT/NetOps (network config validation, root-cause analysis, and the CAIPE platform engineer agent), showing dramatic productivity gains. We close with human-in-the-loop UX patterns for multi-agent teams and SLIM, a high-performance group communication layer designed for agent collaboration.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Your host is Tobias Macey and today I'm interviewing Guillaume de Saint Marc about the complexities and opportunities of scaling multi-agent systems
Interview
- Introduction
- How did you get involved in machine learning?
- Can you start by giving an overview of what constitutes a "multi-agent" system?
- Many of the multi-agent services that I have read or spoken about are designed and operated by a single department or organization. What are some of the new challenges that arise when allowing agents to communicate and co-ordinate outside of organizational boundaries?
- The web is the most famous example of a successful decentralized system, with HTTP being the most ubiquitous protocol powering it. What does the internet of agents look like?
- What is the role of humans in that equation?
- The web has evolved in a combination of organic and planned growth and is vastly more complex and complicated than when it was first introduced. What are some of the most important lessons that we should carry forward into the connectivity of AI agents?
- Security is a critical aspect of the modern web. What are the controls, assertions, and constraints that we need to implement to enable agents to operate with a degree of trust while also being appropriately constrained?
- The AGNTCY project is a substantial investment in an open architecture for the internet of agents. What does it provide in terms of building blocks for teams and businesses who are investing in agentic services?
- What are the most interesting, innovative, or unexpected ways that you have seen AGNTCY/multi-agent systems used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on multi-agent systems?
- When is a multi-agent system the wrong choice?
- What do you have planned for the future of AGNTCY/multi-agent systems?
Contact Info
- LinkedIn
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
- Outshift by Cisco
- Multi-Agent Systems
- Deep Learning
- Meraki
- Symbolic Reasoning
- Transformer Architecture
- DeepSeek
- LLM Reasoning
- René Descartes
- Kanban
- A2A (Agent-to-Agent) Protocol
- MCP == Model Context Protocol
- AGNTCY
- ICANN == Internet Corporation for Assigned Names and Numbers
- OSI Layers
- OCI == Open Container Initiative
- OASF == Open Agentic Schema Framework
- Oracle AgentSpec
- Splunk
- OpenTelemetry
- CAIPE == Community AI Platform Engineer
- AGNTCY Coffee Shop
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Nov 10, 2025

1h 07m

68

Agents, IDEs, and the Blast Radius: Practical AI for Software Engineers

Summary
In this episode of the AI Engineering Podcast, Will Vincent, Python developer advocate at JetBrains (PyCharm), talks about how AI utilities are revolutionizing software engineering beyond basic code completion. He discusses the shift from "vibe coding" to "vibe engineering," where engineers collaborate with AI agents through clear guidelines, iterative specs, and tight guardrails. Will shares practical techniques for getting real value from these tools, including loading the whole codebase for context, creating agent specifications, constraining blast radius, and favoring step-by-step plans over one-shot generations. The conversation covers code review gaps, deployment context, and why continuity across tools matters, as well as JetBrains' evolving approach to integrated AI, including support for external and local models. Will emphasizes the importance of human oversight, particularly for architectural choices and production changes, and encourages experimentation and playfulness while acknowledging the ethics, security, and reliability tradeoffs that come with modern LLMs.

Announcements
Your host is Tobias Macey and today I'm interviewing Will Vincent about selecting and using AI software engineering utilities and making them work for your team.

Interview
Introduction
How did you get involved in machine learning?
Software engineering is a relatively young discipline, but it does have several decades of history. As someone working for a developer tools company, what is your broad opinion on the impact of AI on software engineering as an occupation?
There are many permutations of AI development tools. What are the broad categories that you see?
What are the major areas of overlap?
What are the styles of coding agents that you are seeing the broadest adoption for?
What are your thoughts on the role of editors/IDEs in an AI-driven development workflow?
Many of the code generation utilities are executed on a developer's computer in a single-player mode. What are some strategies that you have seen or experimented with to extract and share techniques/best practices/prompt templates at the team level?
While there are many AI-powered services that hook into various stages of the software development and delivery lifecycle, what are the areas where you are seeing gaps in the user experience?
What are the most interesting, innovative, or unexpected ways that you have seen AI used in the context of software engineering workflows?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on developer tooling in the age of AI?
When is AI-powered tooling the wrong choice?
What do you have planned for the future of AI in the context of JetBrains?
What are your predictions/hopes for the future of AI for software engineering?

Contact Info
Will Vincent

Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links
JetBrains
Simon Willison
Vibe Engineering Post
GitHub Copilot
AGENTS.md
Copilot AGENTS.md instructions
Kiro IDE
Claude Code
JetBrains QuickEdit
Claude Agent in JetBrains IDEs
Ruff linter
uv package manager
ty type checker
pyrefly
IDE == Integrated Development Environment
Ollama
LM Studio
Google Gemma
DeepSeek
gpt-oss
Ollama Cloud
Gemini Diffusion
Django Annual Survey
Co-Intelligence by Ethan Mollick (affiliate link)

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Nov 2, 2025

59m

67

From MRI to World Models: How AI Is Changing What We See

Summary
In this episode of the AI Engineering Podcast Daniel Sodickson, Chief of Innovation in Radiology at NYU Grossman School of Medicine, talks about harnessing AI systems to truly understand images and revolutionize science and healthcare. Dan shares his journey from linear reconstruction to early deep learning for accelerated MRI, highlighting the importance of domain expertise when adapting models to specialized modalities. He explores "upstream" AI that changes what and how we measure, using physics-guided networks, prior knowledge, and personal baselines to enable faster, cheaper, and more accessible imaging. The conversation covers multimodal world models, cross-disciplinary translation, explainability, and a future where agents flag abnormalities while humans apply judgment, as well as provocative frontiers like "imaging without images," continuous health monitoring, and decoding brain activity. Dan stresses the need to preserve truth, context, and human oversight in AI-driven imaging, and calls for tools that distill core methodologies across disciplines to accelerate understanding and progress.

Announcements
Your host is Tobias Macey and today I'm interviewing Daniel Sodickson about the impact and applications of AI that is capable of image understanding.

Interview
Introduction
How did you get involved in machine learning?
Images and vision are concepts that we understand intuitively, but which have a large potential semantic range. How would you characterize the scope and application of imagery in the context of AI and other autonomous technologies?
Can you give an overview of the current state of image/vision capabilities in AI systems?
A predominant application of machine vision has been for object recognition/tracking. How are advances in AI changing the range of problems that can be solved with computer vision systems?
A substantial amount of work has been done on processing of images such as the digital pictures taken by smartphones. As you move to other types of image data, particularly in non-visible light ranges, what are the areas of similarity and in what ways do we need to develop new processing/analysis techniques?
What are some of the ways that AI systems will change the ways that we conceive of
What are the most interesting, innovative, or unexpected ways that you have seen AI vision used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on imaging technologies and techniques?
When is AI the wrong choice for vision/imaging applications?
What are your predictions for the future of AI image understanding?

Contact Info
LinkedIn

Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links
MRI == Magnetic Resonance Imaging
Linear Algorithm
Non-Linear Algorithm
Compressed Sensing
Dictionary Learning Algorithm
Deep Learning
CT Scan
Cambrian Explosion
LIDAR Point Cloud
Synthetic Aperture Radar
Geoffrey Hinton
Co-Intelligence by Ethan Mollick (affiliate link)
Tomography
X-Ray Crystallography
CERN
CLIP Model
Physics-Guided Neural Network
Functional MRI
A Path Toward Autonomous Machine Intelligence by Yann LeCun

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Oct 27, 2025

48m

66

Specs, Tests, and Self‑Verification: The Playbook for Agentic Engineering Teams

Summary
In this episode Andrew Filev, CEO and founder of Zencoder, takes a deep dive into the system design, workflows, and organizational changes behind building agentic coding systems. He traces the evolution from autocomplete to truly agentic models, discusses why context engineering and verification are the real unlocks for reliability, and outlines a pragmatic path from “vibe coding” to AI‑first engineering. Andrew shares Zencoder’s internal playbook: PRD and tech spec co‑creation with AI, human‑in‑the‑loop gates, test‑driven development, and emerging BDD-style acceptance testing. He explores multi-repo context, cross-service reasoning, and how AI reshapes team communication, ownership, and architecture decisions. He also covers cost strategies, when to choose agents vs. manual edits, and why self‑verification and collaborative agent UX will define the next wave. Andrew offers candid lessons from building Zencoder: why speed of iteration beats optimizing for weak models, how ignoring the emotional impact of vibe coding slowed brand momentum, and where agentic tools fit across greenfield and legacy systems. He closes with predictions for the next year: self‑verification, parallelized agent workflows, background execution in CI, and collaborative spec‑driven development moving code review upstream.

Announcements
Your host is Tobias Macey and today I'm interviewing Andrew Filev about the system design and integration strategies behind building coding agents at Zencoder.

Interview
Introduction
How did you get involved in ML/AI?
There have been several iterations of applications for generative AI models in the context of software engineering. How would you characterize the different approaches or categories?
Over the course of this summer (2025) the term "vibe coding" gained prominence with the idea that the human just needs to be worried about whether the software does what you ask, not how it is written. How does that sentiment compare to your philosophies on the role of agentic AI in the lifecycle of software?
This points at a broader challenge for software engineers in the AI era: how much control can and should we cede to the LLMs, and over what elements of the software process?
This also brings up useful questions around the experience of the engineer collaborating with the agent. What are the different interaction patterns that individuals and teams should be thinking of in their use of AI engineering tools?
Should the agent be proactive or reactive? What are the triggers for an action to be taken, and to what extent?
What differentiates a coding agent from an agentic editor?
The key challenge in any agent system is context engineering. Software is inherently structured and provides strong feedback loops. But it can also be very messy or difficult to encapsulate in a single context window. What are some of the data structures/indexing strategies/retrieval methods that are most useful when providing guidance to an agent?
Software projects are rarely fully self-contained, and often need to cross repository boundaries, as well as manage dependencies. What are some of the more challenging aspects of identifying and accounting for those sometimes implicit relationships?
What are some of the strategies that are most effective for yielding productive results from an agent in terms of prompting and scoping of the problem?
What are some of the heuristics that you use to determine whether and how to employ an agent for a given task vs. doing it manually?
How can the agents assist in the decomposition and planning of complex projects?
What are some of the ways that single-player interaction strategies can be turned into team/multi-player strategies?
What are some of the ways that teams can create and curate productive patterns to accelerate everyone equally?
What are the most interesting, innovative, or unexpected ways that you have seen coding agents used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on coding agents at Zencoder?
When is/are Zencoder/coding agents the wrong choice?
What do you have planned for the future of Zencoder/agentic software engineering?

Contact Info
LinkedIn

Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links
Zencoder
Wrike
DARPA Robotics Challenge
Cognitive Computing
Andrew Ng
Sebastian Thrun
GitHub Copilot
RAG == Retrieval Augmented Generation
Re-ranking
Claude Sonnet 3.5
SWE-Bench
Vibe Coding
AI First Engineering
Waterfall Software Engineering
Agile Software Engineering
PRD == Project Requirements Document
BDD == Behavior-Driven Development
VSCode

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Oct 19, 2025

1h 06m

65

From Probabilistic to Trustworthy: Building Orion, an Agentic Analytics Platform

Summary
In this episode of the AI Engineering Podcast Lucas Thelosen and Drew Gillson talk about Orion, their agentic analytics platform that delivers proactive, push-based insights to business users through asynchronous thinking with rich organizational context. Lucas and Drew share their approach to building trustworthy analysis by grounding in semantic layers, fact tables, and quality-assurance loops, as well as their focus on accuracy through parallel test-time compute and evolving from probabilistic steps to deterministic tools. They discuss the importance of context engineering, multi-agent orchestration, and security boundaries for enterprise deployments, and share lessons learned on consistency, tool design, user change management, and the emerging role of "AI manager" as a career path. The conversation highlights the future of AI knowledge workers collaborating across organizations and tools while simplifying UIs and raising the bar on actionable, trustworthy analytics.

Announcements
Your host is Tobias Macey and today I'm interviewing Lucas Thelosen and Drew Gillson about their experiences building an agentic analytics platform and the challenges of ensuring accuracy to build trust.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what Orion is and the story behind it?
Business analytics is a field that requires a high degree of accuracy and detail because of the potential for substantial impact on the business (positive and negative). These are areas that generative AI has struggled with achieving consistently. What was your process for building confidence in your ability to achieve that threshold before committing to the path you are on now?
There are numerous ways that generative AI can be incorporated into the process of designing, building, and delivering analytical insights. How would you characterize the different strategies that data teams and vendors have approached that problem?
What do you see as the organizational benefits of moving to a push-based model for analytics?
Can you describe the system architecture of Orion?
Agentic design patterns are still in the early days of being developed and proven out. Can you give a breakdown of the approach that you are using?
How do you think about the responsibility boundaries, communication paths, temporal patterns, etc. across the different agents?
Tool use is a key component of agentic architectures. What is your process for identifying, developing, validating, and securing the tools that you provide to your agents?
What are the boundaries and extension points that you see when building agentic systems? What are the opportunities for using e.g. the A2A protocol for managing agentic hand-offs?
What is your process for managing the experimentation loop for changes to your models, data, prompts, etc. as you iterate on your product?
What are some of the ways that you are using the agents that power your system to identify and act on opportunities for self-improvement?
What are the most interesting, innovative, or unexpected ways that you have seen Orion used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Orion?
When is an agentic approach the wrong choice?
What do you have planned for the future of Orion?

Contact Info
Lucas: LinkedIn
Drew: LinkedIn

Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links
Gravity
Orion Data Engineering Podcast Episode
Site Reliability Engineering
Anthropic Claude Sonnet 4.5
A2A (Agent2Agent) Protocol
Simon Willison
AI Lethal Trifecta
Behavioral Science
Grounded Theory
LLM as a Judge
RLHF == Reinforcement Learning from Human Feedback

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Oct 11, 2025

1h 12m

64

Building Production-Ready AI Agents with Pydantic AI

Summary
In this episode of the AI Engineering Podcast Samuel Colvin, creator of Pydantic and founder of Pydantic Inc, talks about Pydantic AI - a type-safe framework for building structured AI agents in Python. Samuel explains why he built Pydantic AI to bring FastAPI-like ergonomics and production-grade engineering to agents, focusing on strong typing, minimal abstractions, reliability, observability, and stability. He explores the evolving agent ecosystem, patterns for single vs. many agents, graphs vs. durable execution, and how Pydantic AI approaches structured I/O, tool calling, and MCP with type safety in mind. Samuel also shares insights on design trade-offs, model-provider churn, schema unification, safe code execution, security gaps, and the importance of open standards and OpenTelemetry for observability.

Announcements
Your host is Tobias Macey and today I'm interviewing Samuel Colvin about the Pydantic AI framework for building structured AI agents.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what Pydantic AI is and the story behind it?
What are the core use cases and capabilities that you are focusing on with Pydantic AI?
The agent SDK landscape has been incredibly crowded and volatile since the introduction of LangChain and LlamaIndex. Can you give your summary of the current state of the ecosystem?
What are the broad categories that you use when evaluating the various frameworks?
Beyond the volatility of the frameworks, there is also a rapid pace of evolution in the different styles/patterns of agents. What are the patterns and integrations that Pydantic AI is best suited for?
Can you describe the overall design/architecture of the Pydantic AI framework?
How have the design and scope evolved since you first started working on it?
For someone who wants to build a sophisticated, production-ready AI agent with Pydantic AI, what is your recommended path from idea to deployment?
What are the elements of the framework that help engineers across those different stages of the lifecycle?
What are some of the key learnings that you gained from all of your efforts on Pydantic that have been most helpful in developing and promoting Pydantic AI?
What are some of the new and exciting failure modes that agentic applications introduce as compared to web/mobile/scientific/etc. applications?
What are the most interesting, innovative, or unexpected ways that you have seen Pydantic AI used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pydantic AI?
When is Pydantic AI the wrong choice?
What do you have planned for the future of Pydantic AI?

Contact Info
GitHub
LinkedIn

Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links
Pydantic
Pydantic AI
Pydantic Inc
Pydantic Logfire
OpenAI Agents
Google ADK
LangChain
LlamaIndex
CrewAI
Durable Execution
Temporal
MCP == Model Context Protocol
Claude Code
Typescript
Gemini Structured Output
OpenAI Structured Output
Dottxt Outlines SDK
smolagents
LiteLLM
OpenRouter
OpenAI Responses API
FastAPI
SQLModel
AI SDK JavaScript
LangGraph
NextJS
Pyodide
AI Elements frontend component library

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Oct 7, 2025

50m

63

From GPUs to Workloads: Flex AI’s Blueprint for Fast, Cost‑Efficient AI

Summary
In this episode of the AI Engineering Podcast Brijesh Tripathi, CEO of Flex AI, talks about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting how access friction and idle infrastructure slow progress. He discusses Flex AI's innovative approach to simplifying heterogeneous compute, standardizing on consistent Kubernetes layers, and abstracting inference across various accelerators, allowing teams to iterate faster without wrestling with drivers, libraries, or cloud-by-cloud differences. Brijesh also shares insights into Flex AI's strategies for lifting utilization, protecting real-time workloads, and spanning the full lifecycle from fine-tuning to autoscaled inference, all while keeping complexity at bay.

Announcements
Your host is Tobias Macey and today I'm interviewing Brijesh Tripathi about FlexAI, a platform offering a service-oriented abstraction for AI workloads.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what FlexAI is and the story behind it?
What are some examples of the ways that infrastructure challenges contribute to friction in developing and operating AI applications?
How do those challenges contribute to issues when scaling new applications/businesses that are founded on AI?
There are numerous managed services and deployable operational elements for operationalizing AI systems. What are some of the main pitfalls that teams need to be aware of when determining how much of that infrastructure to own themselves?
Orchestration is a key element of managing the data and model lifecycles of these applications. How does your approach of "workload as a service" help to mitigate some of the complexities in the overall maintenance of that workload?
Can you describe the design and architecture of the FlexAI platform?
How has the implementation evolved from when you first started working on it?
For someone who is going to build on top of FlexAI, what are the primary interfaces and concepts that they need to be aware of?
Can you describe the workflow of going from problem to deployment for an AI workload using FlexAI?
One of the perennial challenges of making a well-integrated platform is that there are inevitably pre-existing workloads that don't map cleanly onto the assumptions of the vendor. What are the affordances and escape hatches that you have built in to allow partial/incremental adoption of your service?
What are the elements of AI workloads and applications that you are explicitly not trying to solve for?
What are the most interesting, innovative, or unexpected ways that you have seen FlexAI used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on FlexAI?
When is FlexAI the wrong choice?
What do you have planned for the future of FlexAI?

Contact Info
LinkedIn

Parting Question
From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links
Flex AI
Aurora Super Computer
CoreWeave
Kubernetes
CUDA
ROCm
Tensor Processing Unit (TPU)
PyTorch
Triton
Trainium
ASIC == Application Specific Integrated Circuit
SOC == System On a Chip
Lovable
FlexAI Blueprints
Tenstorrent

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Sep 28, 2025

55m

62

Right-Sizing AI: Small Language Models for Real-World Production

SummaryIn this episode of the AI Engineering Podcast Steven Huels,  Vice President of AI Engineering & Product Strategy at Red Hat, talks about the practical applications of small language models (SLMs) for production workloads. He discusses how SLMs offer a pragmatic choice due to their ability to fit on single enterprise GPUs and provide model selection trade-offs. The conversation covers self-hosting vs using API providers, organizational capabilities needed for running production-grade LLMs, and the importance of guardrails and automated evaluation at scale. They also explore the rise of agentic systems and service-oriented approaches powered by smaller models, highlighting advances in customization and deployment strategies. Steven shares real-world examples and looks to the future of agent cataloging, continuous retraining, and resource efficiency in AI engineering.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. 
Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Your host is Tobias Macey and today I'm interviewing Steven Huels about the benefits of small language models for production workloadsInterviewIntroductionHow did you get involved in machine learning?Language models are available in a wide range of sizes, measured both in terms of parameters and disk space. What are your heuristics for deciding what qualifies as a "small" vs. "large" language model?What are the corresponding heuristics for when to use a small vs. large model?The predominant use case for small models is in self-hosted contexts, which requires a certain amount of organizational sophistication. What are some helpful questions to ask yourself when determining whether to implement a model-serving stack vs. relying on hosted options?What are some examples of "small" models that you have seen used effectively?The buzzword right now is "agentic" for AI driven workloads. How do small models fit in the context of agent-based workloads?When and where should you rely on larger models?When speaking of small models, one of the common requirements for making them truly useful is to fine-tune them for your problem domain and organizational data. How has the complexity and difficulty of that operation changed over the past ~2 years?Serving models requires several operational capabilities beyond the raw inference serving. 
What are the other infrastructure and organizational investments that teams should be aware of as they embark on that path?What are the most interesting, innovative, or unexpected ways that you have seen small language models used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on operationalizing inference and model customization?When is a small or self-hosted language model the wrong choice?What are your predictions for the near future of small language model capabilities/availability?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksRedHat AI EngineeringGenerative AIPredictive AIChatGPTQLORAHuggingFacevLLMOpenShift AILlama ModelsDeepSeekGPT-OSSMistralMixture of Experts (MoE)QwenInstructLabSFT == Supervised Fine TuningLORAThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Sep 20, 2025

50m

61

AI Agents and Identity Management

SummaryIn this episode of the AI Engineering Podcast Julianna Lamb, co-founder and CTO of Stytch, talks about the complexities of managing identity and authentication in agentic workflows. She explores the evolving landscape of identity management in the context of machine learning and AI, highlighting the importance of flexible compute environments and seamless data exchange. The conversation covers implications of AI agents on identity management, including granular permissions, OAuth protocol, and adapting systems for agentic interactions. Julianna also discusses rate limiting, persistent identity, and evolving standards for managing identity in AI systems. She emphasizes the need to experiment with AI agents and prepare systems for integration to stay ahead in the rapidly advancing AI landscape.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. 
See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Your host is Tobias Macey and today I'm interviewing Julianna Lamb about the complexities of managing identity and auth in agentic workflowsInterviewIntroductionHow did you get involved in machine learning?The term "identity" is very overloaded. Can you start by giving your definition in the context of technical systems?What are some of the different ways that AI agents intersect with identity?We have decades of experience and effort in building identity infrastructure for the internet, what are the most significant ways in which that is insufficient for agent-based use cases?I have heard anecdotal references to the ways in which AI agents lead to a proliferation of "identities". How would you characterize the magnitude of the difference in scale between human-powered identity, deterministic automation (e.g. bots or bot-nets), and AI agents?The other major element of establishing and verifying "identity" is how that intersects with permissions or authorization. What are the major shortcomings of our existing investment in managing and auditing access and control once you are within a system?How does that get amplified with AI agents?Typically authentication has been done at the perimeter of a system. How does that architecture change when accounting for AI agents?How does that get complicated by where the agent originates? (e.g external agents interacting with a third-party system vs. internal agents operated by the service provider)What are the concrete steps that engineering teams should be taking today to start preparing their systems for agentic use-cases (internal or external)?How do agentic capabilities change the means of protecting against malicious bots? (e.g. 
bot detection, defensive agents, etc.)What are the most interesting, innovative, or unexpected ways that you have seen authn/authz/identity addressed for AI use cases?What are the most interesting, unexpected, or challenging lessons that you have learned while working on identity/auth(n|z) systems?What are your predictions for the future of identity as adoption and sophistication of AI systems progresses?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksStytchAI AgentMachine To Machine AuthenticationAPI AuthenticationMCP == Model Context ProtocolOAuthIdentity ProviderOAuth ScopesOAuth 2.1CaptchaRBAC == Role-Based Access ControlABAC == Attribute-Based Access ControlReBAC == Relationship-Based Access ControlGoogle ZanzibarIdempotenceDynamic Client RegistrationLarge Action ModelsClaude CodeThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Sep 13, 2025

53m

60

Revolutionizing Production Systems: The Resolve AI Approach

SummaryIn this episode of the AI Engineering Podcast, CEO of Resolve AI Spiros Xanthos shares his insights on building agentic capabilities for operational systems. He discusses the limitations of traditional observability tools and the need for AI agents that can reason through complex systems to provide actionable insights and solutions. The conversation highlights the architecture of Resolve AI, which integrates with existing tools to build a comprehensive understanding of production environments, and emphasizes the importance of context and memory in AI systems. Spiros also touches on the evolving role of AI in production systems, the potential for AI to augment human operators, and the need for continuous learning and adaptation to fully leverage these advancements.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Spiros Xanthos about architecting agentic capabilities for operational challenges with managing production systems.InterviewIntroductionHow did you get involved in machine learning?Can you describe what Resolve AI is and the story behind it?We have decades of experience as an industry in managing operational complexity. What are the critical failures in capabilities that you are addressing with the application of AI?Given the existing capabilities of dedicated platforms (e.g. Grafana, PagerDuty, Splunk, etc), what is your reasoning for building a new system vs. a new feature of existing operational product?Over the past couple of years the industry has developed a growing number of agent patterns. What was your approach in evaluating and selecting a particular approach for your product?One of the complications of building any platform that supports operational needs of engineering teams is the complexity of integrating with their technology stack. 
This is doubly true when building an AI system that needs rich context. What are the core primitives that you are relying on to build a robust offering?How are you managing the learning process for your systems to allow for iterative discovery and improvement?What are your strategies for personalizing those discoveries to a given customer and operating environment?One of the interesting challenges in agentic systems is managing the user experience for human-in-the-loop and machine to human handoffs in each direction. How are you thinking about that, especially given the criticality of the systems that you are interacting with?As more of the code that is running in production environments is co-developed with AI, what impact do you anticipate on the overall operational resilience of the systems being monitored?One of the challenges of working with LLMs is the cold start problem where every conversation starts from scratch. How are you approaching the overall problem of context engineering and ensuring that you are consistently providing the necessary information for the model to be effective in its role?What are the most interesting, innovative, or unexpected ways that you have seen Resolve AI used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Resolve AI?When is Resolve AI the wrong choice?What do you have planned for the future of Resolve AI?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. 
The Data Engineering Podcast covers the latest on modern data management. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Links Resolve AI, Splunk, OpenTelemetry, Splunk Observability, Context Engineering, Grafana, Kubernetes, PagerDuty. The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Sep 4, 2025

51m

59

Designing Scalable AI Systems with FastMCP: Challenges and Innovations

SummaryIn this episode of the AI Engineering Podcast Jeremiah Lowin, founder and CEO of Prefect Technologies, talks about the FastMCP framework and the design of MCP servers. Jeremiah explains the evolution of FastMCP, from its initial creation as a simpler alternative to the MCP SDK to its current role in facilitating the deployment of AI tools. The discussion covers the complexities of designing MCP servers, the importance of context engineering, and the potential pitfalls of overwhelming AI agents with too many tools. Jeremiah also highlights the importance of simplicity and incremental adoption in software design, and shares insights into the future of MCP and the broader AI ecosystem. The episode concludes with a look at the challenges of authentication and authorization in AI applications and the exciting potential of MCP as a protocol for the future of AI-driven business logic.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Jeremiah Lowin about the FastMCP framework and how to design and build your own MCP serversInterviewIntroductionHow did you get involved in machine learning?Can you start by describing what MCP is and its purpose in the ecosystem of AI applications?What is FastMCP and what motivated you to create it?Recognizing that MCP is relatively young, how would you characterize the landscape of MCP frameworks?What are some of the stumbling blocks on the path to building a well engineered MCP server?What are the potential ramifications of poorly designed and implemented MCP implementations?In the overall context of an AI-powered/agentic application, what are the tradeoffs of investing in the MCP protocol? (e.g. 
engineering effort, process isolation, tool creation, auth(n|z), etc.)In your experience, what are the architectural patterns that you see of MCP implementation and usage?There are a multitude of MCP servers available for a variety of use cases. What are the key factors that someone should be using to evaluate their viability for a production use case?Can you give an overview of the key characteristics of FastMCP and why someone might select it as their implementation target for a custom MCP server?How have the design, scope, and goals of the project evolved since you first started working on it?For someone who is using FastMCP as the framework for creating their own AI tools, what are some of the design considerations or best practices that they should be aware of?What are some of the ways that someone might consider integrating FastMCP into their existing Python-powered web applications (e.g. FastAPI, Django, Flask, etc.)As you continue to invest your time and energy into FastMCP, what is your overall goal for the project?What are the most interesting, innovative, or unexpected ways that you have seen FastMCP used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on FastMCP?When is FastMCP the wrong choice?What do you have planned for the future of FastMCP?Contact InfoLinkedInGitHubParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksFastMCPFastMCP CloudPrefectModel Context Protocol (MCP)AI ToolsFastAPIPython DecoratorWebsocketsSSE == Server-Sent EventsStreamable HTTPOAuthMCP GatewayMCP SamplingFlaskDjangoASGIMCP ElicitationAuthKitDynamic Client RegistrationsmolagentsLarge Active ModelsA2AThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Aug 26, 2025

1h 13m

58

Proactive Monitoring in Heavy Industry: The Role of AI and Human Curiosity

SummaryIn this episode of the AI Engineering Podcast Dr. Tara Javidi, CTO of KavAI, talks about developing AI systems for proactive monitoring in heavy industry. Dr. Javidi shares her background in mathematics and information theory, influenced by Claude Shannon's work, and discusses her approach to curiosity-driven AI that mimics human curiosity to improve data collection and predictive analytics. She explains how KavAI's platform uses generative AI models to enhance industrial monitoring by addressing informational blind spots and reducing reliance on human oversight. The conversation covers the architecture of KavAI's systems, integrating AI with existing workflows, building trust with operators, and the societal impact of AI in preventing environmental catastrophes, ultimately highlighting the future potential of information-centric AI models.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems.Your host is Tobias Macey and today I'm interviewing Dr. Tara Javidi about building AI systems for proactive monitoring of physical environments for heavy industryInterviewIntroductionHow did you get involved in machine learning?Can you describe what KavAI is and the story behind it?What are some of the current state-of-the-art applications of AI/ML for monitoring and accident prevention in industrial environments?What are the shortcomings of those approaches?What are some examples of the types of harm that you are focused on preventing or mitigating with your platform?On your site it mentions that you have created a foundation model for physical awareness. What are some examples of the types of predictive/generative capabilities that your model provides?A perennial challenge when building any digital model of a physical system is the lack of absolute fidelity. 
What are the key sources of information acquisition that you rely on for your platform?In addition to your foundation model, what are the other systems that you incorporate to perform analysis and catalyze action?Can you describe the overall system architecture of your platform?What are some of the ways that you are able to integrate learnings across industries and environments to improve the overall capacity of your models?What are the most interesting, innovative, or unexpected ways that you have seen KavAI used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on KavAI?When is KavAI/Physical AI the wrong choice?What do you have planned for the future of KavAI?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?LinksKavAIInformation TheoryClaude ShannonThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Aug 23, 2025

40m

57

Navigating the AI Landscape: Challenges and Innovations in Retail

SummaryIn this episode of the AI Engineering Podcast machine learning engineer Shashank Kapadia explores the transformative role of generative AI in retail. Shashank shares his journey from an engineering background to becoming a key player in ML, highlighting the excitement of understanding human behavior at scale through AI. He discusses the challenges and opportunities presented by generative AI in retail, where it complements traditional ML by enhancing explainability and personalization, predicting consumer needs, and driving autonomous shopping agents and emotional commerce. Shashank elaborates on the architectural and operational shifts required to integrate generative AI into existing systems, emphasizing orchestration, safety nets, and continuous learning loops, while also addressing the balance between building and buying AI solutions, considering factors like data privacy and customization.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Shashank Kapadia about applications of generative AI in retailInterviewIntroductionHow did you get involved in machine learning?Can you summarize the main applications of generative AI that you are seeing the most benefit from in retail/ecommerce?What are the major architectural patterns that you are deploying for generative AI workloads?Working at an organization like WalMart, you already had a substantial investment in ML/MLOps. What are the elements of that organizational capability that remain the same, and what are the catalyzed changes as a result of generative models?When working at the scale of Walmart, what are the different types of bottlenecks that you encounter which can be ignored at smaller orders of magnitude?Generative AI introduces new risks around brand reputation, accuracy, trustworthiness, etc. 
What are the architectural components that you find most effective in managing and monitoring the interactions that you provide to your customers? Can you describe the architecture of the technical systems that you have built to enable the organization to take advantage of generative models? What are the human elements that you rely on to ensure the safety of your AI products? What are the most interesting, innovative, or unexpected ways that you have seen generative AI break at scale? What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI? When is generative AI the wrong choice? What are you paying special attention to over the next 6 - 36 months in AI? Contact Info LinkedIn Parting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Links Walmart Labs. The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Aug 7, 2025

52m

56

The Anti-CRM CRM: How Spiro Uses AI to Transform Sales

Summary In this episode of the AI Engineering Podcast Adam Honig, founder of Spiro AI, talks about using AI to automate CRM systems, particularly in the manufacturing sector. Adam shares his journey from running a consulting company focused on Salesforce to founding Spiro, and discusses the challenges of traditional CRM systems where data entry is often neglected. He explains how Spiro addresses this issue by automating data collection from emails, phone calls, and other communications, providing a rich dataset for machine learning models to generate valuable insights. Adam highlights how Spiro's AI-driven CRM system is tailored to the manufacturing industry's unique needs, where sales are relationship-driven rather than funnel-based, and emphasizes the importance of understanding customer interactions and order histories to predict future business opportunities. The conversation also touches on the evolution of AI models, leveraging powerful third-party APIs, managing context windows, and platform dependencies, with Adam sharing insights into Spiro's future plans, including product recommendations and dynamic data modeling approaches. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Your host is Tobias Macey and today I'm interviewing Adam Honig about using AI to automate CRM maintenance. Interview Introduction How did you get involved in machine learning? Can you describe what Spiro is and the story behind it? What are the specific challenges posed by the manufacturing industry with regards to sales and customer interactions? How does the type of manufacturing and target customer influence the level of effort and communication involved in the sales and customer service cycles? Before we discuss the opportunities for automation, can you describe the typical interaction patterns and workflows involved in the care and feeding of CRM systems? Spiro has been around since 2014, long 
pre-dating the current era of generative models. What were your initial targets for improving efficiency and reducing toil for your customers with the aid of AI/ML?How have the generational changes of deep learning and now generative AI changed the ways that you think about what is possible in your product?Generative models reduce the level of effort to get a proof of concept for language-oriented workflows. How are you pairing them with more narrow AI that you have built?Can you describe the overall architecture of your platform and how it has evolved in recent years?While generative models are powerful, they can also become expensive, and the costs are hard to predict. How are you thinking about vendor selection and platform risk in the application of those models?What are the opportunities that you see for the adoption of more autonomous applications of language models in your product? (e.g. agents)What are the confidence building steps that you are focusing on as you investigate those opportunities?What are the most interesting, innovative, or unexpected ways that you have seen Spiro used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI in the CRM space?When is AI the wrong choice for CRM workflows?What do you have planned for the future of Spiro?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksSpiroDeepgramCognee EpisodeAgentic MemoryGraphRAGPodcast EpisodeOpenAI Assistant APIThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Jul 21, 2025

46m

55

Unlocking AI Potential with AMD's ROCm Stack

SummaryIn this episode of the AI Engineering podcast Anush Elangovan, VP of AI software at AMD, discusses the strategic integration of software and hardware at AMD. He emphasizes the open-source nature of their software, fostering innovation and collaboration in the AI ecosystem, and highlights AMD's performance and capability advantages over competitors like NVIDIA. Anush addresses challenges and opportunities in AI development, including quantization, model efficiency, and future deployment across various platforms, while also stressing the importance of open standards and flexible solutions that support efficient CPU-GPU communication and diverse AI workloads.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Anush Elangovan about AMD's work to expand the playing field for AI training and inferenceInterviewIntroductionHow did you get involved in machine learning?Can you describe what your work at AMD is focused on?A lot of the current attention on hardware for AI training and inference is focused on the raw GPU hardware. What is the role of the software stack in enabling and differentiating that underlying compute?CUDA has gained a significant amount of attention and adoption in the numeric computation space (AI, ML, scientific computing, etc.). What are the elements of platform risk associated with relying on CUDA as a developer or organization?The ROCm stack is the key element in AMD's AI and HPC strategy. What are the elements that comprise that ecosystem?What are the incentives for anyone outside of AMD to contribute to the ROCm project?How would you characterize the current competitive landscape for AMD across the AI/ML lifecycle stages? 
(pre-training, post-training, inference, fine-tuning)For teams who are focused on inference compute for model serving, what do they need to know/care about in regards to AMD hardware and the ROCm stack?What are the most interesting, innovative, or unexpected ways that you have seen AMD/ROCm used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on AMD's AI software ecosystem?When is AMD/ROCm the wrong choice?What do you have planned for the future of ROCm?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksImageNetAMDROCmCUDAHuggingFaceLlama 3Llama 4QwenDeepSeek R1MI300XNokia SymbianUALink StandardQuantizationHIPIFYROCm TritonAMD Strix HaloAMD EpycLiquid NetworksMAMBA ArchitectureTransformer ArchitectureNPU == Neural Processing Unitllama.cppOllamaPerplexity ScoreNUMA == Non-Uniform Memory AccessvLLMSGLangThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Jun 23, 2025

42m

54

Applying AI To The Construction Industry At Buildots

SummaryIn this episode of the AI Engineering Podcast Ori Silberberg, VP of Engineering at Buildots, talks about transforming the construction industry with AI. Ori shares how Buildots uses computer vision and AI to optimize construction projects by providing real-time feedback, reducing delays, and improving efficiency. Learn about the complexities of digitizing the construction industry, the technical architecture of Buildots, and how its AI-driven solutions create a digital twin of construction sites. Ori emphasizes the importance of explainability and actionable insights in AI decision-making, highlighting the potential of generative AI to further enhance the construction process from planning to execution.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Ori Silberberg about applications of AI for optimizing building constructionInterviewIntroductionHow did you get involved in machine learning?Can you describe what Buildots is and the story behind it?What types of construction projects are you focused on? (e.g. residential, commercial, industrial, etc.)What are the main types of inefficiencies that typically occur on those types of job sites?What are the manual and technical processes that the industry has typically relied on to address those sources of waste and delay?In many ways the construction industry is as old as civilization.
What are the main ways that the information age has transformed construction?What are the elements of the construction industry that make it resistant to digital transformation?Can you describe how you are applying AI to this complex and messy problem?What are the types of data that you are able to collect?How are you automating that data collection so that construction crews don't have to add extra work or distractions to their day?For construction crews that are using Buildots, can you talk through how it integrates into the overall process from site planning to project completion?Can you describe the technical architecture of the Buildots platform?Given the safety critical nature of construction, how does that influence the way that you think about the types of AI models that you use and where to apply them?What are the most interesting, innovative, or unexpected ways that you have seen Buildots used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Buildots?What do you have planned for the future of AI usage at Buildots?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksBuildotsCAD == Computer Aided DesignComputer VisionLIDARGC == General ContractorKubernetesThe intro and outro music is from Hitman's Lovesong feat. 
Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Jun 14, 2025

49m

53

The Future of AI Systems: Open Models and Infrastructure Challenges

SummaryIn this episode of the AI Engineering Podcast Jamie de Guerre, founding SVP of product at Together.ai, explores the role of open models in the AI economy. As a veteran of the AI industry, including his time leading product marketing for AI and machine learning at Apple, Jamie shares insights on the challenges and opportunities of operating open models at speed and scale. He delves into the importance of open source in AI, the evolution of the open model ecosystem, and how Together.ai's AI acceleration cloud is contributing to this movement with a focus on performance and efficiency.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Jamie de Guerre about the role of open models in the AI economy and how to operate them at speed and at scaleInterviewIntroductionHow did you get involved in machine learning?Can you describe what Together AI is and the story behind it?What are the key goals of the company?The initial rounds of open models were largely driven by massive tech companies. How would you characterize the current state of the ecosystem that is driving the creation and evolution of open models?There was also a lot of argument about what "open source" and "open" means in the context of ML/AI models, and the different variations of licenses being attached to them (e.g. the Meta license for Llama models).
What is the current state of the language used and understanding of the restrictions/freedoms afforded?What are the phases of organizational/technical evolution from initial use of open models through fine-tuning, to custom model development?Can you outline the technical challenges companies face when trying to train or run inference on large open models themselves?What factors should a company consider when deciding whether to fine-tune an existing open model versus attempting to train a specialized one from scratch?While Transformers dominate the LLM landscape, there's ongoing research into alternative architectures. Are you seeing significant interest or adoption of non-Transformer architectures for specific use cases? When might those other architectures be a better choice?While open models offer tremendous advantages like transparency, control, and cost-effectiveness, are there scenarios where relying solely on them might be disadvantageous?When might proprietary models or a hybrid approach still be the better choice for a specific problem?Building and scaling AI infrastructure is notoriously complex. What are the most significant technical or strategic challenges you've encountered at Together AI while enabling scalable access to open models for your users?What are the most interesting, innovative, or unexpected ways that you have seen open models/the TogetherAI platform used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on powering AI model training and inference?Where do you see the open model space heading in the next 1-2 years? Any specific trends or breakthroughs you anticipate?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. 
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksTogether AIFine TuningPost-TrainingSalesforce ResearchMistralAgentforceLlama ModelsRLHF == Reinforcement Learning from Human FeedbackRLVR == Reinforcement Learning from Verifiable RewardsTest Time ComputeHuggingFaceRAG == Retrieval Augmented GenerationPodcast EpisodeGoogle GemmaLlama 4 MaverickPrompt EngineeringvLLMSGLangHazy Research labState Space ModelsHyena ModelMamba ArchitectureDiffusion Model ArchitectureStable DiffusionBlack Forest Labs Flux ModelNvidia BlackwellPyTorchRustDeepseek R1GGUFPika Text To VideoThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Jun 1, 2025

51m

52

The Rise of Agentic AI: Transforming Business Operations

SummaryIn this episode of the AI Engineering Podcast, host Tobias Macey sits down with Ben Wilde, Head of Innovation at Georgian, to explore the transformative impact of agentic AI on business operations and the SaaS industry. From his early days working with vintage AI systems to his current focus on product strategy and innovation in AI, Ben shares his expertise on what he calls the "continuum" of agentic AI - from simple function calls to complex autonomous systems. Join them as they discuss the challenges and opportunities of integrating agentic AI into business systems, including organizational alignment, technical competence, and the need for standardization. They also dive into emerging protocols and the evolving landscape of AI-driven products and services, including usage-based pricing models and advancements in AI infrastructure and reliability.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Ben Wilde about the impact of agentic AI on business operations and SaaS as we know itInterviewIntroductionHow did you get involved in machine learning?Can you start by sharing your definition of what constitutes "agentic AI"?There have been several generations of automation for business and product use cases. In your estimation, what are the substantive differences between agentic AI and e.g. RPA (Robotic Process Automation)?How do the inherent risks and operational overhead impact the calculus of whether and where to apply agentic capabilities?For teams that are aiming for agentic capabilities, what are the stepping stones along that path?Beyond the technical capacity, there are numerous elements of organizational alignment that are required to make full use of the capabilities of agentic processes. 
What are some of the strategic investments that are necessary to get the whole business pointed in the same direction for adopting and benefitting from AI agents?The most recent splash in the space of agentic AI is the introduction of the Model Context Protocol, and various responses to it. What do you see as the near and medium term impact of this effort on the ecosystem of AI agents and their architecture?Software products have gone through several major evolutions since the days of CD-ROMs in the 90s. The current era has largely been oriented around the model of subscription-based software delivered via browser or mobile-based UIs over the internet. How does the pending age of AI agents upend that model?What are the most interesting, innovative, or unexpected ways that you have seen agentic AI used for business and product capabilities?What are the most interesting, unexpected, or challenging lessons that you have learned while working with businesses adopting agentic AI capabilities?When is agentic AI the wrong choice?What are the ongoing developments in agentic capabilities that you are monitoring?Contact InfoEmailLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksGeorgianAgentic Platforms And ApplicationsDifferential PrivacyAgentic AILanguage ModelReasoning ModelRobotic Process AutomationOFACOpenAI Deep ResearchModel Context ProtocolGeorgian AI Adoption SurveyGoogle Agent to Agent ProtocolGraphQLTPU == Tensor Processing UnitChris LattnerCUDANeuroSymbolic AIPrologThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

May 21, 2025

1h 01m

51

Protecting AI Systems: Understanding Vulnerabilities and Attack Surfaces

SummaryIn this episode of the AI Engineering Podcast Kasimir Schulz, Director of Security Research at HiddenLayer, talks about the complexities and security challenges in AI and machine learning models. Kasimir explains the concept of shadow genes and shadow logic, which involve identifying common subgraphs within neural networks to understand model ancestry and potential vulnerabilities, and emphasizes the importance of understanding the attack surface in AI integrations, scanning models for security threats, and evolving awareness in AI security practices to mitigate risks in deploying AI systems.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Kasimir Schulz about the relationships between the various models on the market and how that information helps with selecting and protecting models for your applicationsInterviewIntroductionHow did you get involved in machine learning?Can you start by outlining the current state of the threat landscape for ML and AI systems?What are the main areas of overlap in risk profiles between prediction/classification and generative models? (primarily from an attack surface/methodology perspective)What are the significant points of divergence?What are some of the categories of potential damages that can be created through the deployment of compromised models?How does the landscape of foundation models introduce new challenges around supply chain security for organizations building with AI?You recently published your findings on the potential to inject subgraphs into model architectures that are invisible during normal operation of the model. Along with that you wrote about the subgraphs that are shared between different classes of models. 
What are the key learnings that you would like to highlight from that research?What action items can organizations and engineering teams take in light of that information?Platforms like HuggingFace offer numerous variations of popular models with variations around quantization, various levels of finetuning, model distillation, etc. That is obviously a benefit to knowledge sharing and ease of access, but how does that exacerbate the potential threat in the face of backdoored models?Beyond explicit backdoors in model architectures, there are numerous attack vectors to generative models in the form of prompt injection, "jailbreaking" of system prompts, etc. How does the knowledge of model ancestry help with identifying and mitigating risks from that class of threat?A common response to that threat is the introduction of model guardrails with pre- and post-filtering of prompts and responses. How can that approach help to address the potential threat of backdoored models as well?For a malicious actor that develops one of these attacks, what is the vector for introducing the compromised model into an organization?Once that model is in use, what are the possible means by which the malicious actor can detect its presence for purposes of exploitation?What are the most interesting, innovative, or unexpected ways that you have seen the information about model ancestry used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on ShadowLogic/ShadowGenes?What are some of the other means by which the operation of ML and AI systems introduce attack vectors to organizations running them?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. 
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksHiddenLayerZero-Day VulnerabilityMCP Blog PostPython Pickle Object SerializationSafeTensorsDeepseekHuggingface TransformersKROP == Knowledge Return Oriented PromptingXKCD "Little Bobby Tables"OWASP Top 10 For LLMsCVE AI Systems Working GroupRefusal Vector AblationFoundation ModelShadowLogicShadowGenesBytecodeResNet == Residual Neural NetworkYOLO == You Only Look OnceNetronBERTRoBERTaShodanCTF == Capture The FlagTitan Bedrock Image GeneratorThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

May 3, 2025

51m

50

Understanding The Operational And Organizational Challenges Of Agentic AI

SummaryIn this episode of the AI Engineering podcast Julian LaNeve, CTO of Astronomer, talks about transitioning from simple LLM applications to more complex agentic AI systems. Julian shares insights into the challenges and considerations of this evolution, emphasizing the importance of starting with simpler applications to build operational knowledge and intuition. He discusses the parallels between microservices and agentic AI, highlighting the need for careful orchestration and observability to manage complexity and ensure reliability, and explores the technical requirements for deploying AI systems, including data infrastructure, orchestration tools like Apache Airflow, and understanding the probabilistic nature of AI models.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsSeamless data integration into AI applications often falls short, leading many to adopt RAG methods, which come with high costs, complexity, and limited scalability. Cognee offers a better solution with its open-source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cognee enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data in LLM apps without unnecessary overhead. Visit aiengineeringpodcast.com/cognee to learn more and elevate your AI apps and agents.Your host is Tobias Macey and today I'm interviewing Julian LaNeve about how to avoid putting the cart before the horse with AI applications. When do you move from "simple" LLM apps to agentic AI and what's the path to get there?InterviewIntroductionHow did you get involved in machine learning?How do you technically distinguish "agentic AI" (e.g., involving planning, tool use, memory) from "simpler LLM workflows" (e.g., stateless transformations, RAG)? 
What are the key differences in operational complexity and potential failure modes?What specific technical challenges (e.g., state management, observability, non-determinism, prompt fragility, cost explosion) are often underestimated when teams jump directly into building stateful, autonomous agents?What are the pre-requisites from a data and infrastructure perspective before going to production with agentic applications?How does that differ from the chat-based systems that companies might be experimenting with?Technically, where do you most often see ambitious agent projects break down during development or early deployment?Beyond generic data quality, what specific data engineering practices become critical when building reliable LLM applications? (e.g., Designing data pipelines for efficient RAG chunking/embedding, versioning prompts alongside data, caching strategies for LLM calls, managing vector database ETL).From an implementation complexity standpoint, what characterizes tasks well-suited for initial LLM workflow adoption versus those genuinely requiring agentic capabilities?Can you share examples (anonymized if necessary) highlighting how organizations successfully engineered these simpler LLM workflows? What specific technical designs, tooling choices, or MLOps practices were key to their reliability and scalability?What are some hard-won technical or operational lessons from deploying and scaling LLM workflows in production environments? 
Any surprising performance bottlenecks, cost issues, or monitoring challenges engineers should anticipate?What technical maturity signals (e.g., robust CI/CD for ML, established monitoring/alerting for pipelines, automated evaluation frameworks, cost tracking mechanisms) suggest an engineering team might be ready to tackle the challenges of building and operating agentic systems?How does the technical stack and engineering process need to evolve when moving from orchestrated LLM workflows towards more complex agents involving memory, planning, and dynamic tool use? What new components and failure modes must be engineered for?How do you foresee orchestration platforms evolving to better serve the needs of AI engineers building LLM apps? What are the most interesting, innovative, or unexpected ways that you have seen organizations build toward advanced AI use cases?What are the most interesting, unexpected, or challenging lessons that you have learned while working on supporting AI services?When is AI the wrong choice?What is the single most critical piece of engineering advice you would give to fellow AI engineers who are tasked with integrating LLMs into production systems right now?Contact InfoLinkedInGitHubParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?LinksAstronomerAirflowAnthropicBuilding Effective Agents post from AnthropicAirflow 3.0MicroservicesPydantic AILangchainLlamaIndexLLM As A JudgeSWE (SoftWare Engineer) BenchCursorWindsurfOpenTelemetryDAG == Directed Acyclic GraphHalting ProblemAI Long Term MemoryThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Apr 21, 2025

1h 12m

49

The Power of Community in AI Development with Oumi

SummaryIn this episode of the AI Engineering Podcast Emmanouil (Manos) Koukoumidis, CEO of Oumi, talks about his vision for an open platform for building, evaluating, and deploying AI foundation models. Manos shares his journey from working on natural language AI services at Google Cloud to founding Oumi with a mission to advance open-source AI, emphasizing the importance of community collaboration and accessibility. He discusses the need for open-source models that are not constrained by proprietary APIs, highlights the role of Oumi in facilitating open collaboration, and touches on the complexities of model development, open data, and community-driven advancements in AI. He also explains how Oumi can be used throughout the entire lifecycle of AI model development, post-training, and deployment.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Manos Koukoumidis about Oumi, an all-in-one production-ready open platform to build, evaluate, and deploy AI modelsInterviewIntroductionHow did you get involved in machine learning?Can you describe what Oumi is and the story behind it?There are numerous projects, both full suites and point solutions, focused on every aspect of "AI" development. What is the unique value that Oumi provides in this ecosystem?You have stated the desire for Oumi to become the Linux of AI development. That is an ambitious goal and one that Linux itself didn't start with. What do you see as the biggest challenges that need addressing to reach a critical mass of adoption?In the vein of "open source" AI, the most notable project that I'm aware of that fits the proper definition is the OLMO models from AI2. What lessons have you learned from their efforts that influence the ways that you think about your work on Oumi?On the community building front, HuggingFace has been the main player.
What do you see as the benefits and shortcomings of that platform in the context of your vision for open and collaborative AI?Can you describe the overall design and architecture of Oumi?How did you approach the selection process for the different components that you are building on top of?What are the extension points that you have incorporated to allow for customization/evolution?Some of the biggest barriers to entry for building foundation models are the cost and availability of hardware used for training, and the ability to collect and curate the data needed. How does Oumi help with addressing those challenges?For someone who wants to build or contribute to an open source model, what does that process look like?How do you envision the community building/collaboration process?Your overall goal is to build a foundation for the growth and well-being of truly open AI. How are you thinking about the sustainability of the project and the funding needed to grow and support the community?What are the most interesting, innovative, or unexpected ways that you have seen Oumi used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Oumi?When is Oumi the wrong choice?What do you have planned for the future of Oumi?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksOumiCloud PaLMGoogle GeminiDeepMindLSTM == Long Short-Term MemoryTransformersChatGPTPartial Differential EquationOLMOOSI AI definitionMLFlowMetaflowSkyPilotLlamaRAGPodcast EpisodeSynthetic DataPodcast EpisodeLLM As JudgeSGLangvLLMFunction Calling LeaderboardDeepseekThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Mar 16, 2025

56m

48

Arch Gateway: Add AI To Your Apps Without Custom Development

SummaryIn this episode of the AI Engineering Podcast Adil Hafeez talks about the Arch project, a gateway designed to simplify the integration of AI agents into business systems. He discusses how the gateway uses Rust and Envoy to provide a unified interface for handling prompts and integrating large language models (LLMs), allowing developers to focus on core business logic rather than AI complexities. The conversation also touches on the target audience, challenges, and future directions for the project, including plans to develop a leading planning LLM and enhance agent interoperability.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Adil Hafeez about the Arch project, a gateway for your AI agentsInterviewIntroductionHow did you get involved in machine learning?Can you describe what Arch is and the story behind it?How do you think about the target audience for Arch and the types of problems/projects that they are responsible for?The general category of LLM gateways is largely oriented toward abstracting the specific model provider being called. What are the areas of overlap and differentiation in Arch?Many of the features in Arch are also available in AI frameworks (e.g. LangChain, LlamaIndex, etc.), such as request routing, guardrails, and tool calling.
How do you think about the architectural tradeoffs of having that functionality in a gateway service?What is the workflow for someone building an application with Arch?Can you describe the architecture and components of the Arch gateway?With the pace of change in the AI/LLM ecosystem, how have you designed the Arch project to allow for rapid evolution and extensibility?What are the most interesting, innovative, or unexpected ways that you have seen Arch used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Arch?When is Arch the wrong choice?What do you have planned for the future of Arch?Contact InfoLinkedInGitHubParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksArch GatewayGradient BoostingEnvoyLLM GatewayHuggingfaceKatanemo ModelsQwen2.5Rust ClippyThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Feb 26, 2025

31m

47

The Role Of Synthetic Data In Building Better AI Applications

Summary: In this episode of the AI Engineering Podcast, Ali Golshan, co-founder and CEO of Gretel.ai, talks about the transformative role of synthetic data in AI systems. Ali explains how synthetic data can be purpose-built for AI use cases, emphasizing privacy, quality, and structural stability. He highlights the shift from traditional methods to using language models, which offer enhanced capabilities in understanding data's deep structure and generating high-quality datasets. The conversation explores the challenges and techniques of integrating synthetic data into AI systems, particularly in production environments, and concludes with insights into the future of synthetic data, including its application in various industries, the importance of privacy regulations, and the ongoing evolution of AI systems.
Announcements: Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Seamless data integration into AI applications often falls short, leading many to adopt RAG methods, which come with high costs, complexity, and limited scalability. Cognee offers a better solution with its open-source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cognee enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data in LLM apps without unnecessary overhead. Visit aiengineeringpodcast.com/cognee to learn more and elevate your AI apps and agents. Your host is Tobias Macey and today I'm interviewing Ali Golshan about the role of synthetic data in building, scaling, and improving AI systems.
Interview: Introduction. How did you get involved in machine learning? Can you start by summarizing what you mean by synthetic data in the context of this conversation? How have the capabilities around the generation and integration of synthetic data changed across the pre- and post-LLM timelines? What are the motivating factors that would lead a team or organization to invest in synthetic data generation capacity? What are the main methods used for generation of synthetic data sets? How does that differ across open-source and commercial offerings? From a surface level it seems like synthetic data generation is a straightforward exercise that can be owned by an engineering team. What are the main "gotchas" that crop up as you move along the adoption curve? What are the scaling characteristics of synthetic data generation as you go from prototype to production scale? Domains/data types that are inappropriate for synthetic use cases (e.g. scientific or educational content). Managing appropriate distribution of values in the generation process. Beyond just producing large volumes of semi-random data (structured or otherwise), what are the other processes involved in the workflow of synthetic data and its integration into the different systems that consume it? What are the most interesting, innovative, or unexpected ways that you have seen synthetic data generation used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on synthetic data generation? When is synthetic data the wrong choice? What do you have planned for the future of synthetic data capabilities at Gretel?
Contact Info: LinkedIn
Parting Question: From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements: Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links: Gretel, Hadoop, LSTM == Long Short-Term Memory, GAN == Generative Adversarial Network, Textbooks are all you need MSFT paper, Illumina
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Feb 16, 2025

54m

46

Optimize Your AI Applications Automatically With The TensorZero LLM Gateway

Summary: In this episode of the AI Engineering Podcast, Viraj Mehta, CTO and co-founder of TensorZero, talks about the use of LLM gateways for managing interactions between client-side applications and various AI models. He highlights the benefits of using such a gateway, including standardized communication, credential management, and potential features like request-response caching and audit logging. The conversation also explores TensorZero's architecture and functionality in optimizing AI applications by managing structured data inputs and outputs, as well as the challenges and opportunities in automating prompt generation and maintaining interaction history for optimization purposes.
Announcements: Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Seamless data integration into AI applications often falls short, leading many to adopt RAG methods, which come with high costs, complexity, and limited scalability. Cognee offers a better solution with its open-source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cognee enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data in LLM apps without unnecessary overhead. Visit aiengineeringpodcast.com/cognee to learn more and elevate your AI apps and agents. Your host is Tobias Macey and today I'm interviewing Viraj Mehta about the purpose of an LLM gateway and his work on TensorZero.
Interview: Introduction. How did you get involved in machine learning? What is an LLM gateway? What purpose does it serve in an AI application architecture? What are some of the different features and capabilities that an LLM gateway might be expected to provide? Can you describe what TensorZero is and the story behind it? What are the core problems that you are trying to address with Tensor0 and for whom? One of the core features that you are offering is management of interaction history. How does this compare to the "memory" functionality offered by e.g. LangChain, Cognee, Mem0, etc.? How does the presence of TensorZero in an application architecture change the ways that an AI engineer might approach the logic and control flows in a chat-based or agent-oriented project? Can you describe the workflow of building with Tensor0 and some specific examples of how it feeds back into the performance/behavior of an LLM? What are some of the ways in which the addition of Tensor0 or another LLM gateway might have a negative effect on the design or operation of an AI application? What are the most interesting, innovative, or unexpected ways that you have seen TensorZero used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on TensorZero? When is TensorZero the wrong choice? What do you have planned for the future of TensorZero?
Contact Info: LinkedIn
Parting Question: From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements: Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links: TensorZero, LLM Gateway, LiteLLM, OpenAI, Google Vertex, Anthropic, Reinforcement Learning, Tokamak Reactor, Viraj RLHF Paper, Contextual Dueling Bandits, Direct Preference Optimization, Partially Observable Markov Decision Process, DSPy, PyTorch, Cognee, Mem0, LangGraph, Douglas Hofstadter, OpenAI Gym, OpenAI o1, OpenAI o3, Chain Of Thought
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Jan 22, 2025

1h 03m

45

Harnessing The Engine Of AI

Summary: In this episode of the AI Engineering Podcast, Ron Green, co-founder and CTO of KungFu AI, talks about the evolving landscape of AI systems and the challenges of harnessing generative AI engines. Ron shares his insights on the limitations of large language models (LLMs) as standalone solutions and emphasizes the need for human oversight, multi-agent systems, and robust data management to support AI initiatives. He discusses the potential of domain-specific AI solutions, RAG approaches, and mixture of experts to enhance AI capabilities while addressing risks. The conversation also explores the evolving AI ecosystem, including tooling and frameworks, strategic planning, and the importance of interpretability and control in AI systems. Ron expresses optimism about the future of AI, predicting significant advancements in the next 20 years and the integration of AI capabilities into everyday software applications.
Announcements: Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Seamless data integration into AI applications often falls short, leading many to adopt RAG methods, which come with high costs, complexity, and limited scalability. Cognee offers a better solution with its open-source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cognee enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data in LLM apps without unnecessary overhead. Visit aiengineeringpodcast.com/cognee to learn more and elevate your AI apps and agents. Your host is Tobias Macey and today I'm interviewing Ron Green about the wheels that we need for harnessing the power of the generative AI engine.
Interview: Introduction. How did you get involved in machine learning? Can you describe what you see as the main shortcomings of LLMs as a stand-alone solution (to anything)? The most established vehicle for harnessing LLM capabilities is the RAG pattern. What are the main limitations of that as a "product" solution? The idea of multi-agent or mixture-of-experts systems is a more sophisticated approach that is gaining some attention. What do you see as the pro/con conversation around that pattern? Beyond the system patterns that are being developed there is also a rapidly shifting ecosystem of frameworks, tools, and point solutions that plug in to various points of the AI lifecycle. How does that volatility hinder the adoption of generative AI in different contexts? In addition to the tooling, the models themselves are rapidly changing. How much does that influence the ways that organizations are thinking about whether and when to test the waters of AI? Continuing on the metaphor of LLMs and engines and the need for vehicles, where are we on the timeline in relation to the Model T Ford? What are the vehicle categories that we still need to design and develop? (e.g. sedans, mini-vans, freight trucks, etc.) The current transformer architecture is starting to reach scaling limits that lead to diminishing returns. Given your perspective as an industry veteran, what are your thoughts on the future trajectory of AI model architectures? What is the ongoing role of regression style ML in the landscape of generative AI? What are the most interesting, innovative, or unexpected ways that you have seen LLMs used to power a "vehicle"? What are the most interesting, unexpected, or challenging lessons that you have learned while working in this phase of AI? When is generative AI/LLMs the wrong choice?
Contact Info: LinkedIn
Parting Question: From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements: Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links: Kungfu.ai, Llama open generative AI models, ChatGPT, Copilot, Cursor, RAG == Retrieval Augmented Generation (Podcast Episode), Mixture of Experts, Deep Learning, Random Forest, Supervised Learning, Active Learning, Yann LeCun, RLHF == Reinforcement Learning from Human Feedback, Model T Ford, Mamba selective state space, Liquid Network, Chain of thought, OpenAI o1, Marvin Minsky, Von Neumann Architecture, Attention Is All You Need, Multilayer Perceptron, Dot Product, Diffusion Model, Gaussian Noise, AlphaFold 3, Anthropic, Sparse Autoencoder
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Dec 16, 2024

55m

44

The Complex World of Generative AI Governance

Summary: In this episode of the AI Engineering Podcast, Jim Olsen, CTO of ModelOp, talks about the governance of generative AI models and applications. Jim shares his extensive experience in software engineering and machine learning, highlighting the importance of governance in high-risk applications like healthcare. He explains that governance is more about the use cases of AI models rather than the models themselves, emphasizing the need for proper inventory and monitoring to ensure compliance and mitigate risks. The conversation covers challenges organizations face in implementing AI governance policies, the importance of technical controls for data governance, and the need for ongoing monitoring and baselines to detect issues like PII disclosure and model drift. Jim also discusses the balance between innovation and regulation, particularly with evolving regulations like those in the EU, and provides valuable perspectives on the current state of AI governance and the need for robust model lifecycle management.
Announcements: Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Your host is Tobias Macey and today I'm interviewing Jim Olsen about governance of your generative AI models and applications.
Interview: Introduction. How did you get involved in machine learning? Can you describe what governance means in the context of generative AI models? (e.g. governing the models, their applications, their outputs, etc.) Governance is typically a hybrid endeavor of technical and organizational policy creation and enforcement. From the organizational perspective, what are some of the difficulties that teams are facing in understanding what those policies need to encompass? How much familiarity with the capabilities and limitations of the models is necessary to engage productively with policy debates? The regulatory landscape around AI is still very nascent. Can you give an overview of the current state of legal burden related to AI? What are some of the regulations that you consider necessary but as-of-yet absent? Data governance as a practice typically relates to controls over who can access what information and how it can be used. The controls for those policies are generally available in the data warehouse, business intelligence, etc. What are the different dimensions of technical controls that are needed in the application of generative AI systems? How much of the controls that are present for governance of analytical systems are applicable to the generative AI arena? What are the elements of risk that change when considering internal vs. consumer facing applications of generative AI? How do the modalities of the AI models impact the types of risk that are involved? (e.g. language vs. vision vs. audio) What are some of the technical aspects of the AI tools ecosystem that are in greatest need of investment to ease the burden of risk and validation of model use? What are the most interesting, innovative, or unexpected ways that you have seen AI governance implemented? What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI governance? What are the technical, social, and organizational trends of AI risk and governance that you are monitoring?
Contact Info: LinkedIn
Parting Question: From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements: Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links: ModelOp, Foundation Models, GDPR, EU AI Regulation, Llama 2, AWS Bedrock, Shadow IT, RAG == Retrieval Augmented Generation (Podcast Episode), Nvidia NEMO, LangChain, Shapley Values, Gibberish Detection
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Dec 1, 2024

54m

43

Building Semantic Memory for AI With Cognee

Summary: In this episode of the AI Engineering Podcast, Vasilije Markovic talks about enhancing Large Language Models (LLMs) with memory to improve their accuracy. He discusses the concept of memory in LLMs, which involves managing context windows to enhance reasoning without the high costs of traditional training methods. He explains the challenges of forgetting in LLMs due to context window limitations and introduces the idea of hierarchical memory, where immediate retrieval and long-term information storage are balanced to improve application performance. Vasilije also shares his work on Cognee, a tool he's developing to manage semantic memory in AI systems, and discusses its potential applications beyond its core use case. He emphasizes the importance of combining cognitive science principles with data engineering to push the boundaries of AI capabilities and shares his vision for the future of AI systems, highlighting the role of personalization and the ongoing development of Cognee to support evolving AI architectures.
Announcements: Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Your host is Tobias Macey and today I'm interviewing Vasilije Markovic about adding memory to LLMs to improve their accuracy.
Interview: Introduction. How did you get involved in machine learning? Can you describe what "memory" is in the context of LLM systems? What are the symptoms of "forgetting" that manifest when interacting with LLMs? How do these issues manifest between single-turn vs. multi-turn interactions? How does the lack of hierarchical and evolving memory limit the capabilities of LLM systems? What are the technical/architectural requirements to add memory to an LLM system/application? How does Cognee help to address the shortcomings of current LLM/RAG architectures? Can you describe how Cognee is implemented? Recognizing that it has only existed for a short time, how have the design and scope of Cognee evolved since you first started working on it? What are the data structures that are most useful for managing the memory structures? For someone who wants to incorporate Cognee into their LLM architecture, what is involved in integrating it into their applications? How does it change the way that you think about the overall requirements for an LLM application? For systems that interact with multiple LLMs, how does Cognee manage context across those systems? (e.g. different agents for different use cases) There are other systems that are being built to manage user personalization in LLM applications, how do the goals of Cognee relate to those use cases? (e.g. Mem0 - https://github.com/mem0ai/mem0) What are the unknowns that you are still navigating with Cognee? What are the most interesting, innovative, or unexpected ways that you have seen Cognee used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Cognee? When is Cognee the wrong choice? What do you have planned for the future of Cognee?
Contact Info: LinkedIn
Parting Question: From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements: Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links: Cognee, Montenegro, Catastrophic Forgetting, Multi-Turn Interaction, RAG == Retrieval Augmented Generation (Podcast Episode), GraphRAG (Podcast Episode), Long-term memory, Short-term memory, Langchain, LlamaIndex, Haystack, dlt (Data Engineering Podcast Episode), Pinecone (Podcast Episode), Agentic RAG, Airflow, DAG == Directed Acyclic Graph, FalkorDB, Neo4J, Pydantic, AWS ECS, AWS SNS, AWS SQS, AWS Lambda, LLM As Judge, Mem0, QDrant, LanceDB, DuckDB
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Nov 25, 2024

55m

42

The Impact of Generative AI on Software Development

Summary: In this episode of the AI Engineering Podcast, Tanner Burson, VP of Engineering at Prismatic, talks about the evolving impact of generative AI on software developers. Tanner shares his insights from engineering leadership and data engineering initiatives, discussing how AI is blurring the lines of developer roles and the strategic value of AI in software development. He explores the current landscape of AI tools, such as GitHub's Copilot, and their influence on productivity and workflow, while also touching on the challenges and opportunities presented by AI in code generation, review, and tooling. Tanner emphasizes the need for human oversight to maintain code quality and security, and offers his thoughts on the future of AI in development, the importance of balancing innovation with practicality, and the evolving role of engineers in an AI-driven landscape.
Announcements: Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Your host is Tobias Macey and today I'm interviewing Tanner Burson about the impact of generative AI on software developers.
Interview: Introduction. How did you get involved in machine learning? Can you describe what types of roles and work you consider encompassed by the term "developers" for the purpose of this conversation? How does your work at Prismatic give you visibility and insight into the effects of AI on developers and their work? There have been many competing narratives about AI and how much of the software development process it is capable of encompassing. What is your top-level view on what the long-term impact on the job prospects of software developers will be as a result of generative AI? There are many obvious examples of utilities powered by generative AI that are focused on software development. What do you see as the categories or specific tools that are most impactful to the development cycle? In what ways do you find familiarity with/understanding of LLM internals useful when applying them to development processes? As an engineering leader, how are you evaluating and guiding your team on the use of AI-powered tools? What are some of the risks that you are guarding against as a result of AI in the development process? What are the most interesting, innovative, or unexpected ways that you have seen AI used in the development process? What are the most interesting, unexpected, or challenging lessons that you have learned while using AI for software development? When is AI the wrong choice for a developer? What are your projections for the near to medium term impact on the developer experience as a result of generative AI?
Contact Info: LinkedIn
Parting Question: From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements: Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links: Prismatic, Google AI Development announcement, Tabnine (Podcast Episode), GitHub Copilot, Plandex, OpenAI API, Amazon Q, Ollama, Huggingface Transformers, Anthropic, Langchain, Llamaindex, Haystack, Llama 3.2, Qwen2.5-Coder
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Nov 22, 2024

52m

41

ML Infrastructure Without The Ops: Simplifying The ML Developer Experience With Runhouse

Summary: Machine learning workflows have long been complex and difficult to operationalize. They are often characterized by a period of research, resulting in an artifact that gets passed to another engineer or team to prepare for running in production. The MLOps category of tools has tried to build a new set of utilities to reduce that friction, but has instead introduced a new barrier at the team and organizational level. Donny Greenberg took the lessons that he learned on the PyTorch team at Meta and created Runhouse. In this episode he explains how, by reducing the number of opinions in the framework, he has also reduced the complexity of moving from development to production for ML systems.
Announcements: Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Your host is Tobias Macey and today I'm interviewing Donny Greenberg about Runhouse and the current state of ML infrastructure.
Interview: Introduction. How did you get involved in machine learning? What are the core elements of infrastructure for ML and AI? How has that changed over the past ~5 years? For the past few years the MLOps and data engineering stacks were built and managed separately. How does the current generation of tools and product requirements influence the present and future approach to those domains? There are numerous projects that aim to bridge the complexity gap in running Python and ML code from your laptop up to distributed compute on clouds (e.g. Ray, Metaflow, Dask, Modin, etc.). How do you view the decision process for teams trying to understand which tool(s) to use for managing their ML/AI developer experience? Can you describe what Runhouse is and the story behind it? What are the core problems that you are working to solve? What are the main personas that you are focusing on? (e.g. data scientists, DevOps, data engineers, etc.) How does Runhouse factor into collaboration across skill sets and teams? Can you describe how Runhouse is implemented? How has the focus on developer experience informed the way that you think about the features and interfaces that you include in Runhouse? How do you think about the role of Runhouse in the integration with the AI/ML and data ecosystem? What does the workflow look like for someone building with Runhouse? What is involved in managing the coordination of compute and data locality to reduce networking costs and latencies? What are the most interesting, innovative, or unexpected ways that you have seen Runhouse used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Runhouse? When is Runhouse the wrong choice? What do you have planned for the future of Runhouse? What is your vision for the future of infrastructure and developer experience in ML/AI?
Contact Info: LinkedIn
Parting Question: From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements: Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links: Runhouse, GitHub, PyTorch (Podcast.__init__ Episode), Kubernetes, Bin Packing, Linear Regression, Gradient Boosted Decision Tree, Deep Learning, Transformer Architecture, Slurm, Sagemaker, Vertex AI, Metaflow (Podcast.__init__ Episode), MLFlow, Dask (Data Engineering Podcast Episode), Ray (Podcast.__init__ Episode), Spark, Databricks, Snowflake, ArgoCD, PyTorch Distributed, Horovod, Llama.cpp, Prefect (Data Engineering Podcast Episode), Airflow, OOM == Out of Memory, Weights and Biases, KNative, BERT language model
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Nov 11, 2024

1h 16m

40

Building AI Systems on Postgres: An Inside Look at pgai Vectorizer

SummaryWith the growth of vector data as a core element of any AI application comes the need to keep those vectors up to date. When you go beyond prototypes and into production you will need a way to continue experimenting with new embedding models, chunking strategies, etc. You will also need a way to keep the embeddings up to date as your data changes. The team at Timescale created the pgai Vectorizer toolchain to let you manage that work in your Postgres database. In this episode Avthar Sewrathan explains how it works and how you can start using it today.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Avthar Sewrathan about the pgai extension for Postgres and how to run your AI workflows in your databaseInterviewIntroductionHow did you get involved in machine learning?Can you describe what pgai Vectorizer is and the story behind it?What are the benefits of using the database engine to execute AI workflows?What types of operations does pgai Vectorizer enable?What are some common generative AI patterns that can't be done with pgai?AI applications require a large and complex set of dependencies. 
How does that work with pgai Vectorizer and the Python runtime in Postgres?What are some of the other challenges or system pressures that are introduced by running these AI workflows in the database context?Can you describe how the pgai extension is implemented?With the rapid pace of change in the AI ecosystem, how has that informed the set of features that make sense in pgai Vectorizer and won't require rebuilding in 6 months?Can you describe the workflow of using pgai Vectorizer to build and maintain a set of embeddings in their database?How can pgai Vectorizer help with the situation of migrating to a new embedding model and having to reindex all of the content?How do you think about the developer experience for people who are working with pgai Vectorizer, as compared to using e.g. LangChain, LlamaIndex, etc.?What are the most interesting, innovative, or unexpected ways that you have seen pgai Vectorizer used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on pgai Vectorizer?When is pgai Vectorizer the wrong choice?What do you have planned for the future of pgai Vectorizer?Contact InfoLinkedInWebsiteParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksTimescalepgaiTransformer architecture for deep learningNeural NetworkspgvectorpgvectorscaleModalRAG == Retrieval Augmented GenerationSemantic SearchOllamaGraphRAGagensgraphLangChainLlamaIndexHaystackIVFFlatHNSWDiskANNRepl.it AgentBM25TSVectorParadeDBThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Nov 11, 2024

53m

39

Running Generative AI Models In Production

SummaryIn this episode Philip Kiely from Baseten talks about the intricacies of running open models in production. Philip shares his journey into AI and ML engineering, highlighting the importance of understanding product-level requirements and selecting the right model for deployment. The conversation covers the operational aspects of deploying AI models, including model evaluation, compound AI, and model serving frameworks such as TensorFlow Serving and AWS SageMaker. Philip also discusses the challenges of model quantization, rapid model evolution, and monitoring and observability in AI systems, offering valuable insights into the future trends in AI, including local inference and the competition between open source and proprietary models.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Philip Kiely about running open models in productionInterviewIntroductionHow did you get involved in machine learning?Can you start by giving an overview of the major decisions to be made when planning the deployment of a generative AI model?How does the model selected in the beginning of the process influence the downstream choices?In terms of application architecture, the major patterns that I've seen are RAG, fine-tuning, multi-agent, or large model. What are the most common methods that you see? (and any that I failed to mention)How has the rapid succession of model generations impacted the ways that teams think about their overall application? (capabilities, features, architecture, etc.)In terms of model serving, I know that Baseten created Truss. What are some of the other notable options that teams are building with?What is the role of the serving framework in the context of the application?There are also a large number of inference engines that have been released. 
What are the major players in that arena?What are the features and capabilities that they are each basing their competitive advantage on?For someone who is new to AI Engineering, what are some heuristics that you would recommend when choosing an inference engine?Once a model (or set of models) is in production and serving traffic it's necessary to have visibility into how it is performing. What are the key metrics that are necessary to monitor for generative AI systems?In the event that one (or more) metrics are trending negatively, what are the levers that teams can pull to improve them?When running models constructed with e.g. linear regression or deep learning there was a common issue with "concept drift". How does that manifest in the context of large language models, particularly when coupled with performance optimization?What are the most interesting, innovative, or unexpected ways that you have seen teams manage the serving of open gen AI models?What are the most interesting, unexpected, or challenging lessons that you have learned while working with generative AI model serving?When is Baseten the wrong choice?What are the future trends and technology investments that you are focused on in the space of AI model serving?Contact InfoLinkedInTwitterParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksBasetenPodcast EpisodeCopyleftLlama ModelsNomicOlmoAllen Institute for AIPlayground 2The Peace Dividend Of The SaaS WarsVercelNetlifyRAG == Retrieval Augmented GenerationPodcast EpisodeCompound AILangchainOutlines Structured output for AI systemsTrussChainsLlamaindexRayMLFlowCog (Replicate) containers for MLBentoMLDjangoWSGIuWSGIGunicornZapiervLLMTensorRT-LLMTensorRTQuantizationLoRA Low Rank Adaptation of Large Language ModelsPruningDistillationGrafanaSpeculative DecodingGroqRunpodLambda LabsThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Oct 28, 2024

57m

38

Enhancing AI Retrieval with Knowledge Graphs: A Deep Dive into GraphRAG

SummaryIn this episode of the AI Engineering podcast, Philip Rathle, CTO of Neo4J, talks about the intersection of knowledge graphs and AI retrieval systems, specifically Retrieval Augmented Generation (RAG). He delves into GraphRAG, a novel approach that combines knowledge graphs with vector-based similarity search to enhance generative AI models. Philip explains how GraphRAG works by integrating a graph database for structured data storage, providing more accurate and explainable AI responses, and addressing limitations of traditional retrieval systems. The conversation covers technical aspects such as data modeling, entity extraction, and ontology use cases, as well as the infrastructure and workflow required to support GraphRAG, setting the stage for innovative applications across various industries.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Philip Rathle about the application of knowledge graphs in AI retrieval systemsInterviewIntroductionHow did you get involved in machine learning?Can you describe what GraphRAG is?What are the capabilities that graph structures offer beyond vector/similarity-based retrieval methods of prompting?What are some examples of the ways that semantic limitations of nearest-neighbor vector retrieval fail to provide relevant results?What are the technical requirements to implement graph-augmented retrieval?What are the concrete ways in which the embedding and retrieval steps of a typical RAG pipeline need to be modified to account for the addition of the graph?Many tutorials for building vector-based knowledge repositories skip over considerations around data modeling. For building a graph-based knowledge repository there obviously needs to be a bit more work put in. 
What are the key design choices that need to be made for implementing the graph for an AI application?How does the selection of the ontology/taxonomy impact the performance and capabilities of the resulting application?Building a fully functional knowledge graph can be a significant undertaking on its own. How can LLMs and AI models help with the construction and maintenance of that knowledge repository?What are some of the validation methods that should be brought to bear to ensure that the resulting graph properly represents the knowledge domain that you are trying to model?Vector embedding and retrieval are a core building block for a majority of AI application frameworks. How much support do you see for GraphRAG in the ecosystem?For the case where someone is using a framework that does not explicitly implement GraphRAG techniques, what are some of the implementation strategies that you have seen be most effective for adding that functionality?What are some of the ways that the combination of vector search and knowledge graphs are useful independent of their combination with language models?What are the most interesting, innovative, or unexpected ways that you have seen GraphRAG used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on GraphRAG applications?When is GraphRAG the wrong choice?What are the opportunities for improvement in the design and implementation of graph-based retrieval systems?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. 
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksNeo4JGraphRAG ManifestoRAG == Retrieval Augmented GenerationPodcast EpisodeVLDB == Very Large DataBasesKnowledge GraphNearest Neighbor SearchPageRankThings Not Strings Google Knowledge Graph PaperpgvectorPineconeData Engineering Podcast EpisodeTables To LabelsNLP == Natural Language ProcessingOntologyLangChainLlamaIndexRLHF == Reinforcement Learning from Human FeedbackSenzingNeoConverseCypher query languageGQL query standardAWS BedrockVertex AISequoia Training Data - Klarna episodeOuroborosThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Sep 10, 2024

59m

37

Harnessing Generative AI for Effective Digital Advertising Campaigns

SummaryIn this episode of the AI Engineering podcast Praveen Gujar, Director of Product at LinkedIn, talks about the applications of generative AI in digital advertising. He highlights the key areas of digital advertising, including audience targeting, content creation, and ROI measurement, and delves into how generative AI is revolutionizing these aspects. Praveen shares successful case studies of generative AI in digital advertising, including campaigns by Heinz, the Barbie movie, and Maggi, and discusses the potential pitfalls and risks associated with AI-powered tools. He concludes with insights into the future of generative AI in digital advertising, highlighting the importance of cultural transformation and the synergy between human creativity and AI.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Praveen Gujar about the applications of generative AI in digital advertisingInterviewIntroductionHow did you get involved in machine learning?Can you start by defining "digital advertising" for the scope of this conversation?What are the key elements/characteristics/goals of digital advertising?In the world before generative AI, what did a typical end-to-end advertising campaign workflow look like?What are the stages of that workflow where generative AI is proving to be most useful?How do the current limitations of generative AI (e.g. 
hallucinations, non-determinism) impact the ways in which they can be used?What are the technological and organizational systems that need to be implemented to effectively apply generative AI in public-facing applications that are so closely tied to brand/company image?What are the elements of user education/expectation setting that are necessary when working with marketing/advertising personnel to help avoid damage to the brands?What are some examples of applications for generative AI in digital advertising that have gone well?Any that have gone wrong?What are the most interesting, innovative, or unexpected ways that you have seen generative AI used in digital advertising?What are the most interesting, unexpected, or challenging lessons that you have learned while working on digital advertising applications of generative AI?When is generative AI the wrong choice?What are your future predictions for the use of generative AI in digital advertising?Contact InfoWebsiteLinkedInParting QuestionFrom your perspective, what is the biggest barrier to adoption of machine learning today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksGenerative AILLM == Large Language ModelDall-ERLHF == Reinforcement Learning from Human FeedbackThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Sep 2, 2024

41m

36

Building Scalable ML Systems on Kubernetes

SummaryIn this episode of the AI Engineering podcast, host Tobias Macey interviews Tammer Saleh, founder of SuperOrbital, about the potentials and pitfalls of using Kubernetes for machine learning workloads. The conversation delves into the specific needs of machine learning workflows, such as model tracking, versioning, and the use of Jupyter Notebooks, and how Kubernetes can support these tasks. Tammer emphasizes the importance of a unified API for different teams and the flexibility Kubernetes provides in handling various workloads. Finally, Tammer offers advice for teams considering Kubernetes for their machine learning workloads and discusses the future of Kubernetes in the ML ecosystem, including areas for improvement and innovation.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Tammer Saleh about the potentials and pitfalls of using Kubernetes for your ML workloads.InterviewIntroductionHow did you get involved in Kubernetes?For someone who is unfamiliar with Kubernetes, how would you summarize it?For the context of this conversation, can you describe the different phases of ML that we're talking about?Kubernetes was originally designed to handle scaling and distribution of stateless processes. ML is an inherently stateful problem domain. 
What challenges does that add for K8s environments?What are the elements of an ML workflow that lend themselves well to a Kubernetes environment?How much Kubernetes knowledge does an ML/data engineer need to know to get their work done?What are the sharp edges of Kubernetes in the context of ML projects?What are the most interesting, unexpected, or challenging lessons that you have learned while working with Kubernetes?When is Kubernetes the wrong choice for ML?What are the aspects of Kubernetes (core or the ecosystem) that you are keeping an eye on which will help improve its utility for ML workloads?Contact InfoEmailLinkedInParting QuestionFrom your perspective, what is the biggest gap in the tooling or technology for ML workloads today?LinksSuperOrbitalCloudFoundryHeroku12 Factor ModelKubernetesDocker ComposeCore K8s ClassJupyter NotebookCrossplaneOchre JellyCNCF (Cloud Native Computing Foundation) LandscapeStateful SetRAG == Retrieval Augmented GenerationPodcast EpisodeKubeflowFlyteData Engineering Podcast EpisodePachydermData Engineering Podcast EpisodeCoreWeaveKubectl ("koob-cuddle")HelmCRD == Custom Resource DefinitionHorovodPodcast.__init__ EpisodeTemporalSlurmRayDaskInfinibandThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Aug 15, 2024

50m

35

Expert Insights On Retrieval Augmented Generation And How To Build It

SummaryIn this episode we're joined by Matt Zeiler, founder and CEO of Clarifai, as he dives into the technical aspects of retrieval augmented generation (RAG). From his journey into AI at the University of Toronto to founding one of the first deep learning AI companies, Matt shares his insights on the evolution of neural networks and generative models over the last 15 years. He explains how RAG addresses issues with large language models, including data staleness and hallucinations, by providing dynamic access to information through vector databases and embedding models. Throughout the conversation, Matt and host Tobias Macey discuss everything from architectural requirements to operational considerations, as well as the practical applications of RAG in industries like intelligence, healthcare, and finance. Tune in for a comprehensive look at RAG and its future trends in AI.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Matt Zeiler, Founder & CEO of Clarifai, about the technical aspects of RAG, including the architectural requirements, edge cases, and evolutionary characteristicsInterviewIntroductionHow did you get involved in the area of data management?Can you describe what RAG (Retrieval Augmented Generation) is?What are the contexts in which you would want to use RAG?What are the alternatives to RAG?What are the architectural/technical components that are required for production grade RAG?Getting a quick proof-of-concept working for RAG is fairly straightforward. What are the failure modes/edge cases that start to surface as you scale the usage and complexity?The first step of building the corpus for RAG is to generate the embeddings. Can you talk through the planning and design process? (e.g. 
model selection for embeddings, storage capacity/latency, etc.)How does the modality of the input/output affect this and downstream decisions? (e.g. text vs. image vs. audio, etc.)What are the features of a vector store that are most critical for RAG?The set of available generative models is expanding and changing at breakneck speed. What are the foundational aspects that you look for in selecting which model(s) to use for the output?Vector databases have been gaining ground for search functionality, even without generative AI. What are some of the other ways that elements of RAG can be re-purposed?What are the most interesting, innovative, or unexpected ways that you have seen RAG used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on RAG?When is RAG the wrong choice?What are the main trends that you are following for RAG and its component elements going forward?Contact InfoWebsiteLinkedInParting QuestionFrom your perspective, what is the biggest barrier to adoption of machine learning today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! 
Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksClarifaiGeoff HintonYann LecunNeural NetworksDeep LearningRetrieval Augmented GenerationContext WindowVector DatabasePrompt EngineeringMistralLlama 3Embedding QuantizationActive LearningGoogle GeminiAI Model AttentionRecurrent NetworkConvolutional NetworkReranking ModelStop WordsMassive Text Embedding Benchmark (MTEB)Retool State of AI ReportpgvectorMilvusQdrantPineconeOpenLLM LeaderboardSemantic SearchHashicorpThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Jul 28, 2024

1h 03m

34

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

SummaryArtificial intelligence has dominated the headlines for several months due to the successes of large language models. This has prompted numerous debates about the possibility of, and timeline for, artificial general intelligence (AGI). Peter Voss has dedicated decades of his life to the pursuit of truly intelligent software through the approach of cognitive AI. In this episode he explains his approach to building AI in a more human-like fashion and the emphasis on learning rather than statistical prediction.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Peter Voss about what is involved in making your AI applications more "human"InterviewIntroductionHow did you get involved in machine learning?Can you start by unpacking the idea of "human-like" AI?How does that contrast with the conception of "AGI"?The applications and limitations of GPT/LLM models have been dominating the popular conversation around AI. How do you see that impacting the overall ecosystem of ML/AI applications and investment?The fundamental/foundational challenge of every AI use case is sourcing appropriate data. What are the strategies that you have found useful to acquire, evaluate, and prepare data at an appropriate scale to build high quality models? What are the opportunities and limitations of causal modeling techniques for generalized AI models?As AI systems gain more sophistication there is a challenge with establishing and maintaining trust. 
What are the risks involved in deploying more human-level AI systems and monitoring their reliability?What are the practical/architectural methods necessary to build more cognitive AI systems?How would you characterize the ecosystem of tools/frameworks available for creating, evolving, and maintaining these applications?What are the most interesting, innovative, or unexpected ways that you have seen cognitive AI applied?What are the most interesting, unexpected, or challenging lessons that you have learned while working on designing/developing cognitive AI systems?When is cognitive AI the wrong choice?What do you have planned for the future of cognitive AI applications at Aigo?Contact InfoLinkedInWebsiteParting QuestionFrom your perspective, what is the biggest barrier to adoption of machine learning today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksAigo.aiArtificial General IntelligenceCognitive AIKnowledge GraphCausal ModelingBayesian StatisticsThinking Fast & Slow by Daniel Kahneman (affiliate link)Agent-Based ModelingReinforcement LearningDARPA 3 Waves of AI presentationWhy Don't We Have AGI Yet? whitepaperConcepts Is All You Need WhitepaperHelen KellerStephen HawkingThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Jul 28, 2024

52m

33

Build Your Second Brain One Piece At A Time

SummaryGenerative AI promises to accelerate the productivity of human collaborators. Currently the primary way of working with these tools is through a conversational prompt, which is often cumbersome and unwieldy. In order to simplify the integration of AI capabilities into developer workflows Tsavo Knott helped create Pieces, a powerful collection of tools that complements the tools that developers already use. In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Tsavo Knott about Pieces, a personal AI toolkit to improve the efficiency of developersInterviewIntroductionHow did you get involved in machine learning?Can you describe what Pieces is and the story behind it?The past few months have seen an endless series of personalized AI tools launched. What are the features and focus of Pieces that might encourage someone to use it over the alternatives?model selectionsarchitecture of Pieces applicationlocal vs. hybrid vs. online modelsmodel update/delivery processdata preparation/serving for models in context of Pieces appapplication of AI to developer workflowstypes of workflows that people are building with piecesWhat are the most interesting, innovative, or unexpected ways that you have seen Pieces used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pieces?When is Pieces the wrong choice?What do you have planned for the future of Pieces?Contact InfoLinkedInParting QuestionFrom your perspective, what is the biggest barrier to adoption of machine learning today?Closing AnnouncementsThank you for listening! 
Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksPiecesNPU == Neural Processing UnitTensor ChipLoRA == Low Rank AdaptationGenerative Adversarial NetworksMistralEmacsVimNeoVimDartFlutterTypescriptLuaRetrieval Augmented GenerationONNXLSTM == Long Short-Term MemoryLLama 2GitHub CopilotTabninePodcast EpisodeThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Jul 28, 2024

48m

32

Strategies For Building A Product Using LLMs At DataChat

SummaryLarge Language Models (LLMs) have rapidly captured the attention of the world with their impressive capabilities. Unfortunately, they are often unpredictable and unreliable. This makes building a product based on their capabilities a unique challenge. Jignesh Patel is building DataChat to bring the capabilities of LLMs to organizational analytics, allowing anyone to have conversations with their business data. In this episode he shares the methods that he is using to build a product on top of this constantly shifting set of technologies.AnnouncementsHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.Your host is Tobias Macey and today I'm interviewing Jignesh Patel about working with LLMs; understanding how they work and how to build your ownInterviewIntroductionHow did you get involved in machine learning?Can you start by sharing some of the ways that you are working with LLMs currently?What are the business challenges involved in building a product on top of an LLM model that you don't own or control? In the current age of business, your data is often your strategic advantage. How do you avoid losing control of, or leaking that data while interfacing with a hosted LLM API?What are the technical difficulties related to using an LLM as a core element of a product when they are largely a black box? What are some strategies for gaining visibility into the inner workings or decision making rules for these models?What are the factors, whether technical or organizational, that might motivate you to build your own LLM for a business or product? 
Can you unpack what it means to "build your own" when it comes to an LLM?In your work at DataChat, how has the progression of sophistication in LLM technology impacted your own product strategy?What are the most interesting, innovative, or unexpected ways that you have seen LLMs/DataChat used?What are the most interesting, unexpected, or challenging lessons that you have learned while working with LLMs?When is an LLM the wrong choice?What do you have planned for the future of DataChat?Contact InfoWebsiteLinkedInParting QuestionFrom your perspective, what is the biggest barrier to adoption of machine learning today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksDataChatCMU == Carnegie Mellon UniversitySVM == Support Vector MachineGenerative AIGenomicsProteomicsParquetOpenAI CodexLLamaMistralGoogle VertexLangchainRetrieval Augmented GenerationPrompt EngineeringEnsemble LearningXGBoostCatboostLinear RegressionCOGS == Cost Of Goods SoldBruce Schneier - AI And TrustThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Mar 3, 2024

48m

31

Improve The Success Rate Of Your Machine Learning Projects With bizML

Summary
Machine learning is a powerful set of technologies, holding the potential to dramatically transform businesses across industries. Unfortunately, ML projects often fail to achieve their intended goals. This failure is due to a lack of collaboration and investment across technological and organizational boundaries. To help improve the success rate of machine learning projects, Eric Siegel developed the six-step bizML framework, outlining the process to ensure that everyone understands the whole process of ML deployment. In this episode he shares the principles and promise of that framework and his motivation for encapsulating it in his book "The AI Playbook".

Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Your host is Tobias Macey and today I'm interviewing Eric Siegel about how the bizML approach can help improve the success rate of your ML projects.

Interview
Introduction
How did you get involved in machine learning?
Can you describe what bizML is and the story behind it?
  What are the key aspects of this approach that are different from the "industry standard" lifecycle of an ML project?
What are the elements of your personal experience as an ML consultant that helped you develop the tenets of bizML?
Who are the personas that need to be involved in an ML project to increase the likelihood of success?
  Who do you find to be best suited to "own" or "lead" the process?
What are the organizational patterns that might hinder the work of delivering on the goals of an ML initiative?
What are some of the misconceptions about the work involved in/capabilities of an ML model that you commonly encounter?
What is your main goal in writing your book "The AI Playbook"?
What are the most interesting, innovative, or unexpected ways that you have seen the bizML process in action?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on ML projects and developing the bizML framework?
When is bizML the wrong choice?
What are the future developments in organizational and technical approaches to ML that will improve the success rate of AI projects?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
The AI Playbook: Mastering the Rare Art of Machine Learning Deployment by Eric Siegel
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel
Columbia University
Machine Learning Week Conference
Generative AI World
Machine Learning Leadership and Practice Course
Rexer Analytics
KDnuggets
CRISP-DM
Random Forest
Gradient Descent

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Feb 18, 2024

50m

30

Using Generative AI To Accelerate Feature Engineering At FeatureByte

Summary
One of the most time-consuming aspects of building a machine learning model is feature engineering. Generative AI offers the possibility of accelerating the discovery and creation of feature pipelines. In this episode Colin Priest explains how FeatureByte is applying generative AI models to the challenge of building and maintaining machine learning pipelines.

Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Your host is Tobias Macey and today I'm interviewing Colin Priest about applying generative AI to the task of building and deploying AI pipelines.

Interview
Introduction
How did you get involved in machine learning?
Can you start by giving the 30,000-foot view of the steps involved in an AI pipeline?
  Understand the problem
  Feature ideation
  Feature engineering
  Experiment
  Optimize
  Productionize
What are the stages of that process that are prone to repetition?
  What are the ways that teams typically try to automate those steps?
What are the features of generative AI models that can be brought to bear on the design stage of an AI pipeline?
  What are the validation/verification processes that engineers need to apply to the generated suggestions?
  What are the opportunities/limitations for unit/integration style tests?
What are the elements of developer experience that need to be addressed to make the gen AI capabilities an enhancement instead of a distraction?
  What are the interfaces through which the AI functionality can/should be exposed?
What are the aspects of pipeline and model deployment that can benefit from generative AI functionality?
  What are the potential risk factors that need to be considered when evaluating the application of this functionality?
What are the most interesting, innovative, or unexpected ways that you have seen generative AI used in the development and maintenance of AI pipelines?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on the application of generative AI to the ML workflow?
When is generative AI the wrong choice?
What do you have planned for the future of FeatureByte's AI copilot capabilities?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
FeatureByte
Generative AI
The Art of War
OCR == Optical Character Recognition
Genetic Algorithm
Semantic Layer
Prompt Engineering

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Support The Machine Learning Podcast

Feb 11, 2024

44m
