PODCAST · technology

YAAP (Yet Another AI Podcast)

by AI21

YAAP brings you practical conversations with the people actually building generative AI solutions. No hype, no sales pitches, just honest discussions about challenges, solutions, and lessons learned. Listen to developers and engineers share what works, what doesn't, and what they wish they'd known sooner. Simple, useful insights for anyone working with AI — hosted by AI21's Yuval Belfer.


22

Chunking Isn’t Dead. One Size Doesn’t Fit All

Chunking is still one of the least-discussed but most decisive parts of RAG. In this episode, we break down why no single chunk size works for all questions, how different queries benefit from different window sizes, and why fixed-window indexing quietly limits retrieval performance. We walk through a multi-window chunking approach, show how rank fusion ties it together, and explain why better agents can’t fix retrieval when the data is indexed the wrong way.

Mar 18, 2026

10m
21

Stop Shipping Agents With Chat UIs

Chat was a great prototype. It’s a terrible product. In this episode, Yuval sits down with CopilotKit co-founder Atai to unpack why most agentic apps stall at “chat + vibes”, and why the real bottleneck in production AI isn’t models or reasoning. It’s UI. They break down what actually changed in the last year, why agents fundamentally break the request-response paradigm, and how a new generation of protocols is emerging to connect agents to real users. The conversation covers AG-UI, the Model Context Protocol (and MCP Apps), and Agent-to-Agent (A2A) protocols, as well as the messy (but inevitable) transition from text-only chat to component-rich, voice-enabled, agentic applications. If you’ve built an agent that works but users still bounce, this episode explains why, and what the new “glue layer” of AI UIs is starting to look like.

Feb 24, 2026

35m
20

MCP Was Built for Tools, Not for Agents That Write

MCP standardized tool calling for agents, but it breaks down once agents start mutating state. In this episode, Yuval sits down with Eran Gat from AI21 to dig into what happens when writing agents run in parallel, why shared environments fall apart, and why workspace isolation is the missing execution layer. Using real coding workloads and benchmarks, we walk through the architectural trade-offs behind making concurrent agents actually work.

Feb 9, 2026

22m
19

Why AI Leaderboards Miss the Point

Leaderboards reward “best average score.” Real users reward “answer fast, don’t hallucinate, don’t bankrupt me.” In this special deep dive episode, AI21’s CTO Barak Lenz walks through four gaps between what models can do and what real AI systems deliver: validation, contextualization (pick the right approach per input), latency (parallelize and stop early), and decomposition (making those choices continuously inside long workflows). Less “best model.” More “best execution.”

Jan 15, 2026

56m
18

The Agent Swarm Fallacy

Running multiple agents can improve quality. Doing it right is the hard part. This time we look at the Agent Swarm Fallacy: the idea that throwing more agents at a problem automatically makes systems better. Yuval sits down with Or Dagan, AI21 CPO, to explore why this breaks in practice, what happens when agents act instead of just think, and how test-time compute, structured execution, and smart decision points offer a solution.

Jan 13, 2026

30m
17

This Deep Research Agent Ignored the Benchmark and Still Won

Tavily built a Deep Research Agent with production in mind: something they could actually scale. So they did the unsexy work. They went through millions of agent logs, found where tokens were being wasted, and optimized each section of the system. The result surprised them: they cut token consumption by more than half (!), then tested quality and discovered they topped the DeepResearch Bench without even trying. In this YAAP episode, Yuval sits down with Dean from Tavily to break down how they built it, what they did differently from the usual top approaches, and which design choices made better results possible with far fewer tokens. What you’ll learn: how to reduce token burn without tanking quality; why reading millions of logs beats chasing the flashiest tech; and the design choices that pushed quality up while token use dropped sharply.

Jan 1, 2026

29m
16

Don’t Learn Distributed Systems. Just import ray

You wanted to build an agent. You ended up debugging GPUs, scaling workers, and chasing OOMs. In this episode of YAAP, Yuval sits down with Linda from Anyscale to unpack why Ray exists and how it helps AI teams scale without turning every developer into a distributed systems expert. We trace Ray’s roots in reinforcement learning research, then zoom out to how it’s used today across the AI pipeline: data processing, training, inference, and agents. Along the way, we cover why libraries like vLLM build on Ray, when Ray vs. SaaS makes sense, and why unstructured and multimodal data push traditional big-data tools to their limits.

Dec 29, 2025

30m
15

GenAI Meets Wall Street: Why Every Bank Thinks It’s a Snowflake

Banks love GenAI. They just don’t trust it. Yet. In this episode of YAAP, Yuval talks with Renee Lau from AWS, a financial services industry specialist who works hands-on with banks, insurers, and hedge funds as they try to move generative AI from pilots into production. Renee shares what she sees across the market, what actually works, and where teams get stuck. They explore the two sides of GenAI adoption in finance: cost-cutting back-office automation and revenue-driven use cases like hedge fund research. Along the way, they dig into compliance, pricing, human-in-the-loop workflows, and the crawl-walk-run path to deployment. You will also hear why every bank believes it is a special snowflake, why that instinct is understandable, and how builders can still create solutions that scale across financial services.

Dec 23, 2025

35m
14

Everyone’s got the same model. Now what?

Everyone’s building on the same foundation models. So how do you stand out? For Imagen AI, the answer isn’t bigger models, it’s smarter loops. CEO Yotam Gil joins Yuval to unpack how personalization, workflow integration, and continuous feedback turned Imagen’s photo-editing engine into a true moat. But that’s only half the story. The other half is speed: how a two-person Commando Squad at Imagen uses “vibe-coding in production” to prototype new ideas in one or two sprints, test them in the wild, and kill what doesn’t stick — without hurting the core product. It’s a conversation about differentiation when models are commodities, and about building a culture that moves as fast as the tech it’s built on.

Dec 15, 2025

31m
13

The House That Builds Builders – The Origin Story of AGI House

Three years ago, it was just a house full of friends geeking out about AI. Today, it’s where researchers, founders, and engineers collide — and where hackathon demos turn into real startups. In this episode, Yuval sits down with Henry Yin, Co-founder & CTO of AGI House, to unpack how a pandemic project became the Bay Area’s builder epicenter. From fine-tuning meetups to venture funding, they trace the journey of turning one house into the heart of a movement.

Nov 11, 2025

11m
12

Scraping Without Getting Sued (Or Falling Asleep)

Everyone (and we do mean EVERYONE) needs data, and the web is the largest database humanity has ever built. But tapping into it at scale requires more than technical skills. If your product touches web data, scraping isn't just a backend task; it can carry legal risk and real consequences. In this episode, Yuval sits down with Rony Shalit, Chief Compliance and Ethics Officer at Bright Data, to talk about what can go wrong when you treat data collection as “just an implementation detail”. From the lawsuits with Meta and X to wild edge cases and vendor breakdowns, they dive into what it takes to collect data responsibly and stay out of trouble.

Oct 28, 2025

48m
11

The Judge Model Diaries: Judging the Judges

Your LLM gave a great answer. But who decides what “great” means? In this episode, Yuval talks with Noam Gat about judge language models — reward models, critic models, and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story. Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part.

Aug 26, 2025

30m
10

RLVR Lets Models Fail Their Way to the Top

Think you know fine-tuning? If your answer is RLHF, you don’t. In this episode, Itay, who leads the Alignment group at AI21, gives a no-fluff crash course on RLVR (Reinforcement Learning with Verifiable Rewards), the method powering today’s smartest coding and reasoning models. He explains why RLVR beats RLHF at its own game, how “hard to solve, easy to verify” tasks unlock exploration without chaos, and the emergent behaviors you only get when models are allowed to screw up. If you want to actually understand RLVR (and use it), start here. Key topics: how RLVR outsmarts RLHF in real-world training; the “verified rewards” trick that kills reward hacking; emergent skills you don’t get with hand-holding (self-verification, backtracking, multi-path reasoning); why coding models took a giant leap forward; and practical steps to train (and actually benefit from) RLVR models.

Aug 12, 2025

49m
9

RAG Is Not Solved – Your Evaluation Just Sucks

Your RAG pipeline is passing benchmarks but failing reality. In this episode, Yuval sits down with Niv from AI21 to expose why most RAG evaluation is fundamentally flawed. From overhyped retrieval scores to chunking strategies that collapse under real-world complexity, they break down why your system isn’t as good as you think, and how structured RAG solves problems that traditional pipelines simply can't. Bonus: what do Seinfeld trivia, World Cup stats, and your enterprise SharePoint have in common? (Hint: your RAG pipeline chokes on all of them.) Key topics: why most RAG benchmarks reward the wrong thing (and hide real failures); the chunking trap, or how bad segmentation sabotages good retrieval; when LLMs ace the answer but your pipeline still fails; structured RAG, a pipeline that handles RAG over aggregative data (such as financial reports); and evaluation tips, tricks, and traps for AI builders.

Jul 29, 2025

43m
8

The Call Is Coming From Inside the Agent (And It Has Your Credentials)

You’ve shipped your first agent. It works. It’s useful. It might also be a security liability you don’t even know about. In this episode, Yuval talks to Zenity CTO Michael Bargury about how easy it is to hijack popular agent systems like Copilot and Cursor, what “zero-click” attacks look like in the agent era, and how to monitor, constrain, and secure your AI agents in production. From sneaky prompt injections to memory-based persistence and infected multi-agent workflows, this is the “oh no” moment every builder needs. Key topics: why “ignore previous instructions” still works better than it should; how one agent goes rogue… and infects the others; real-world attacks (social media triggers, CRM leaks, and logic bombs); observability 101 for AI (logs, reasoning traces, and root-cause sanity); and the new rule: build like it will go rogue, because one day it will.

Jul 15, 2025

49m
7

Building Enterprise RAG: Lessons from 2+ Years of Production Deployments

Building production AI systems is hard, especially when you're pioneering entirely new categories. In this episode, Yuval speaks with Guy Becker, Group Product Manager at AI21, to trace the evolution from task-specific models to agent planning and orchestration systems. Guy shares hard-won lessons from building some of the first RAG-as-a-service offerings when there were literally zero handbooks to follow. Key topics: task-specific models vs. general LLMs (why focused, smaller models with pre- and post-processing beat general-purpose LLMs for business use cases); building RAG before it was cool (creating one of the first RAG-as-a-service platforms in early 2023 without any established patterns); the one-size-fits-all problem (why chunking strategies, embedding models, and retrieval parameters need customization per use case); from SaaS to on-prem (scaling deployment models for enterprise customers with sensitive data); when RAG breaks down (multi-hop queries, metadata filtering, and why semantic search isn't always enough); multi-agent orchestration (how AI21 Maestro uses automated planning to break complex queries into parallelizable subtasks); and production lessons (evaluation strategies, quality guarantees, and building explainable AI systems for enterprise).

Jul 1, 2025

37m
6

Trailer

Jun 19, 2025

0m
5

You Can’t Have an Agent Without a Plan: What 90% of ‘Agents’ Are Missing

Everyone's talking about AI agents, but most of what we call "agents" are just workflows in disguise. Real autonomous agents require planning, and that changes everything. In this episode, Yuval speaks with AI21's Algo Tech Lead, Nitzan Cohen, about why the popular ReAct framework isn't enough and how planning architecture unlocks true agent capabilities. Key topics: 1. the difference between workflows/chains and real autonomous agents; 2. why ReAct agents fail at complex tasks, parallel execution, and user transparency; 3. free-text vs. code-based planning approaches and their trade-offs; 4. how planning enables multi-agent systems and model delegation; 5. training planners with reinforcement learning, and replanning mechanisms; 6. evaluation challenges: the GAIA benchmark, AgentBench, and building custom datasets; 7. practical advice: when to upgrade from ReAct and which frameworks to use. From competitive analysis that runs in parallel to breaking down complex coding tasks, discover how planning turns AI agents from simple tool-calling loops into sophisticated problem-solving systems.

Jun 17, 2025

33m
4

The Hard Truths About AI Agents: Why Benchmarks Lie and Frameworks Fail

Building AI agents that actually work is harder than the hype suggests, and most people are doing it wrong. In this special "YAAP: Unplugged" episode (a live panel from an AI Tinkerers meetup at the Hugging Face offices in Paris), Yuval sits down with Aymeric Roucher (Project Lead for Agents at Hugging Face) and Niv Granot (Algorithms Group Lead at AI21 Labs) for an unfiltered discussion about the uncomfortable realities of agent development. Key topics: why current benchmarks are broken, from MMLU's limitations to RAG leaderboards that don't reflect real-world performance; the tool-use illusion, or why 95% accuracy on tool-calling benchmarks doesn't mean your agent can actually plan; LLM-as-a-judge problems, and how evaluation bottlenecks are capping progress compared to verifiable domains like coding; frameworks, friend or foe (when to ditch LangChain and LlamaIndex, and why minimal implementations often work better); the real agent stack (MCP, sandbox environments, and the four essential components you actually need); and beyond the hype cycle, from embeddings that can't distinguish positive from negative numbers to what comes after agents. From FIFA World Cup benchmarks that expose retrieval failures to the circular dependency problem with LLM judges, this conversation cuts through the marketing noise to reveal what it really takes to build agents that solve real problems, not just impressive demos. Warning: contains unpopular opinions about popular frameworks and uncomfortable truths about the current state of AI agent development.

Jun 10, 2025

39m
3

Tool Calling 2.0: How MCP Is Standardizing AI Connections

MCP (Model Context Protocol) is changing how developers connect AI applications to external tools – but what exactly is it, and why should you care? In this episode, Yuval speaks with Etan Grundstein, Technical Product Manager (and formerly Director of Engineering) at AI21, to break down the protocol that’s standardizing AI integrations, moving beyond basic weather APIs and calculators to real-world productivity workflows. Key topics: 1) what MCP actually is and how it differs from traditional tool calling; 2) real-world examples: connecting AI to Jira, Notion, Git, and even Blender; 3) the evolution from local MCP servers to cloud integrations; 4) authentication challenges and how they’re being addressed; 5) why developers are building MCP servers to build other MCP servers; 6) looking ahead: Agent-to-Agent protocols and what comes next.

May 29, 2025

29m



ABOUT THIS SHOW

HOSTED BY

AI21

Frequently Asked Questions

How many episodes does YAAP (Yet Another AI Podcast) have?

YAAP (Yet Another AI Podcast) currently has 20 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

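What is YAAP (Yet Another AI Podcast) about?

YAAP features practical conversations with the people building generative AI solutions: developers and engineers discussing challenges, solutions, and lessons learned, without hype or sales pitches. It is hosted by AI21's Yuval Belfer.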

How often does YAAP (Yet Another AI Podcast) release new episodes?

Based on the episodes listed above, YAAP has released new episodes roughly every two weeks since May 2025, though the schedule is irregular. Check the episode list for recent publication dates.

Where can I listen to YAAP (Yet Another AI Podcast)?

You can listen to YAAP (Yet Another AI Podcast) on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts YAAP (Yet Another AI Podcast)?

YAAP (Yet Another AI Podcast) is created and hosted by AI21.
