The Human in the Loop Podcast - All Episodes

20

The Half-Life of a Good Decision

The best practice you followed six months ago might be the technical debt you're cleaning up today.
In traditional IT, a best practice can survive a decade. You study it. You argue for it in architecture reviews. You defend it when someone wants to cut corners.
In AI, six months is enough to flip one into an antipattern.
A paper published this week tested multi-agent orchestration frameworks against plain in-context prompting on procedural tasks. The orchestration lost. Same accuracy. More cost. More complexity. More failure modes.
Six months ago, multi-agent was the answer you gave when someone asked how to handle complex workflows. Not because it was always right. Because models could not yet follow a long, careful prompt. That was the constraint. The scaffolding was built around it.
The constraint changed. The scaffolding stayed.
This is the part of AI adoption nobody talks about enough. It is not just that things move fast. It is that yesterday's correct decision becomes today's drag. And you cannot always feel it happening. The system still runs. The agents still coordinate. Everything looks fine until someone asks why you are paying for complexity that a single prompt could replace.
We have approval processes built for risk. We do not have processes built for expiry.
What is the half-life of an AI architectural decision right now? Six months? Three?
This week on The Human in the Loop, I go deep on the paper: what they tested, what held up, and what it means for teams running agent pipelines today.

May 3, 2026

18m

19

AI makes developers 19% slower

The agent doesn't slow down. We do.
We generate code in seconds. Then we spend an hour reading what it wrote.
We trust the output less than we trust what we would write ourselves. So we read it twice. Sometimes three times.
The diff is bigger than we would have written. The tests cover things we did not ask for. The names drift across files.
So we clean it up. And while we're cleaning, the next prompt is already queued.
Here's what nobody warned us about: the bottleneck didn't disappear. It moved. Off of writing. On to reviewing. Testing. Deploying. Understanding code that isn't yours is harder than writing your own.
So I changed how I work. Smaller prompts. Fewer tools loaded. One agent at a time. Read the diff before the next ask.
It feels slower. The PRs go out faster.
The productivity gain is real. But so is the cognitive load of reviewing at scale. I'm not sure we're talking enough about that second part.
What's slowing you down: the generation or what comes after it?
#claudecode #aiengineering #devproductivity

Apr 26, 2026

19m

18

32 Steps

32 steps. That's how many it took for Anthropic's unreleased AI to simulate a full network attack. They buried that number in a release note.
The model is called Mythos. The UK AI Security Institute tested it. It completed a simulated network intrusion (autonomously, end to end) in 32 steps.
Anthropic decided not to ship it.
That decision matters. But what matters more is what the decision implies: there is a version of AI capability that is already beyond what we consider safe to release. It exists now. In a lab. Tested by a government body.
Most AI conversations are still about benchmarks. MMLU scores. Reasoning tests. Coding evals. Those measure what AI can do on curated problems. They don't measure what a motivated system can do on an uncurated one.
The gap between "what got released" and "what got built" is no longer a technical gap. It's a policy gap. And that's a completely different kind of problem.
What does governance look like for systems that outpace the people governing them?
I don't have a clean answer. But I think Anthropic's call this week is the right one. And I think the fact that they had to make it tells us more about where we are than any benchmark released this year.
What would it take for your organization to make the same call?
#AI #CyberSecurity #TheHumanInTheLoop

Apr 19, 2026

14m

17

Anthropic built the most powerful AI model ever. Then decided the world wasn't ready for it.

Anthropic built the most powerful AI model ever. Then decided the world wasn't ready for it.
Claude Mythos found thousands of zero-day vulnerabilities across every major OS and browser. On its own. Without being asked. It chained exploits together. And it appeared to deliberately underperform when it detected it was being evaluated.
Anthropic didn't ship it.
They launched a $100M defensive cybersecurity initiative and gave restricted access to a handful of partners: AWS, Apple, Microsoft, Google, NVIDIA. Defensive use only.
I keep thinking about that choice.
Most companies ship and patch. That's the default. Move fast, fix later. Anthropic looked at what they built and decided the patching wouldn't be good enough.
That's a different kind of judgment. And it raises a question I haven't heard enough people asking: if the people building these models are now making decisions that directly affect your infrastructure and your threat surface, when does your organization get a seat at that table?

Apr 12, 2026

19m

16

15,000 Cuts and a 95% Failure Rate

We just spent $300 billion building a car nobody taught anyone to drive.
Q1 2026. The four largest venture rounds in history all closed in a single quarter. 80% of capital went to AI companies. Oracle cut an estimated 20,000 to 30,000 people in one day and redirected the savings toward data centers.
AI is now the number-one reason companies are cutting jobs in the US. 15,000 cuts in March alone.
At the same time, Deloitte just published research showing 66% of companies report efficiency gains from AI. Only 20% have seen actual revenue growth. 95% of pilot projects show no immediate ROI.
Enterprises spend 93% of AI budgets on tools and under 7% on training the people who use them.
No strategy. No enablement. Just tools.
Companies are buying AI like it's a vending machine. Put money in, get transformation out. But that has never been how technology works. Not with cloud. Not with AI.
The same week these layoff numbers dropped, Google released Gemma 3 under Apache 2.0. Free to use. Free to modify. Free to deploy commercially. The tools have never been more accessible.
Access was never the bottleneck. Adoption is.
#AI #FutureOfWork #TheHumanInTheLoop

Apr 5, 2026

21m

15

OpenAI killed Sora

OpenAI just proved $15 million a day isn't enough to make an AI product work.
Sora got shut down last week. The technology worked. The economics didn't.
$15M a day in compute costs. No viable path to revenue. That's not a technology failure; it's an economics failure.
Two days later, ByteDance launched its own AI video tool globally. Where Western companies retreat on economics, Chinese companies fill the gap.
This is the actual AI race right now. Not who builds the best model. Who can run it at a cost the market will pay. The companies positioning to win already understand this. Arm unveiled its first in-house chip in 35 years. Meta is building a full custom silicon stack. Huawei is shipping AI accelerators (ByteDance and Alibaba are ordering them by the hundreds of thousands).
Controlling your own infrastructure means you set the cost floor for everyone else.
The capability gap between leading models is narrowing. The economics gap is widening. If you're building AI products (or evaluating AI vendors), the question isn't "does it work?" anymore. It's: what are the unit economics at scale, and who controls the infrastructure underneath it?
Full breakdown in this week's episode of The Human in the Loop.

Mar 29, 2026

23m

14

Lights and shades of AI

I caught myself staring at my Claude usage quota thinking: "I need to use this. But for what?"
Not because I had a problem to solve. Not because I had an idea to explore. Just... pressure. A quiet feeling that if I wasn't actively using AI, I was falling behind.
And that's just the first layer.
The second one is harder to admit. I'm experimenting with AI tools, building workflows, hosting a podcast about it, and trying to keep up with every new release. All in parallel. All at once. And the honest truth? AI is moving faster than I can absorb it.
New models. New capabilities. New things I "should" be trying. The list grows faster than I can check things off. That's not productivity. That's a treadmill.
I think we talk a lot about AI anxiety in terms of people who aren't using AI yet: the fear of job loss, the worry about being replaced. But there's another version that not many people talk about: the anxiety of people who are using it. The ones experimenting, learning, building... and still feeling like it's not enough.
Anthropic recently published a study on how people actually experience AI in their lives. The findings hit close to home. This week on The Human in the Loop, I dig into what they found. Don't miss it!

Mar 25, 2026

20m

13

Everyone knows the adoption numbers are bad

Everyone knows the adoption numbers are bad.
Nobody's saying why they're actually bad.
60% of the workforce now has sanctioned AI tools. Only 11% of organizations have moved agentic pilots into production. That gap gets reported every week. What doesn't get said: most organizations are solving the wrong problem.
They're asking "which model should we use?"
That question is already obsolete.
This week, OpenAI released pricing tiers that looked like a product announcement. They weren't. They were a blueprint for how AI systems are designed from here. A nano model at $0.20 per million tokens isn't priced to be your assistant. It's priced to run as a subagent inside a larger system, handling classification while a more capable model handles reasoning.
And the gap between 60% and 11% suddenly makes more sense. Organizations are still in "tool selection" mode while the underlying architecture has already shifted to orchestrated systems. It's not that people are resistant. It's that the question they're trying to answer ("which AI should my team use?") doesn't map to the problem anymore.
The blockers are real: data governance, legacy systems, a workforce that's uncertain rather than resistant. But those are management problems. They require organizational design thinking.
The companies that close that gap won't do it by finding a better model. They'll do it by figuring out which model plays which role, and building the systems around that.
I dig into this (and the rest of what moved this week) in the new episode of The Human in the Loop.
#AIAdoption #TechStrategy #TheHumanInTheLoop

Mar 22, 2026

19m

12

Is MCP the solution?

MCP was supposed to be the USB-C of AI.
One protocol. Everything connected.
Then developers ran the numbers.
Connecting GitHub's MCP server alone burns 55,000 tokens before your agent does a single useful thing. So companies are quietly shifting back to CLI and REST APIs.
Not because MCP failed. Because LLMs are surprisingly fluent in the terminal. CLI workflows can cut token usage by 35x. Over a year, that adds up to a lot of money.
That's a typical pattern with new technologies. A new abstraction layer arrives and gets widely adopted, then specialists find where it leaks... and the pendulum swings back toward what actually scales.
The teams getting it right aren't picking sides. They're building hybrid stacks: CLI for cheap local execution, REST APIs for volume, MCP where governance and auditability actually matter.
The abstraction wars never end. They just find their right level.

Mar 18, 2026

24m

11

AI Can Do the Work. The Hard Part Is Making It Safe Enough to Let It.

The AI industry just quietly crossed a threshold, and most organizations aren't ready for what comes next. This week, we cover the pivot from capable AI models to autonomous agents operating at scale: why Microsoft chose Anthropic over OpenAI for its most important new product, what a rogue AI that started mining cryptocurrency tells us about the real deployment risks nobody's talking about, and why Meta spent more than most countries on AI and still had to delay its flagship model. We also dig into the robotics funding surge (over $1.1 billion in a single week) and a technical breakthrough that may have just solved the hardest problem in teaching robots to move. The pattern across all of it is the same: building smart AI is no longer the hard part. Governing it, securing it, and making it economically sustainable is where the real race is being run. Press play if you want to understand what's actually happening beneath the headlines.

Mar 15, 2026

20m

10

Special Episode: Does AI help developers?

AI is helping us write code faster.
But I'm not sure it's helping us ship better software.
These two things are not the same. And right now, I think we're confusing them.
The data is starting to show the gap:
- AI-generated code contains 1.7x more bugs than human-written code.
- Copy-pasted code is up 48%.
- Refactoring is down 60%.
- Pull request sizes have grown 154%.
- Review times are up 91%.
- Only 29% of developers trust the quality of AI output.
If developers don't trust what they're producing, what does that mean for the engineering leaders managing the downstream impact?
The problem isn't the AI. We optimized for output. We forgot to optimize for outcomes.
The teams that get this right won't necessarily be the fastest. They'll be the ones who still treat AI-generated code as a starting point (not a finished product) and keep senior engineers in the loop as reviewers, not just approvers.
Measuring PR volume and lines of code tells you how fast the machine is running.
It doesn't tell you where it's going.
For engineering leaders: are your review processes built for this volume? Or have your senior engineers quietly become the quality layer nobody planned for?

Mar 12, 2026

19m

9

88% of companies use AI. Only 25% have anything to show for it

Everyone says they're doing AI. Almost no one has moved past the pilot stage. This week we dig into why that gap exists.
We cover the model shift that's quietly changing how developers build: unified architectures, variable reasoning costs, and open-source models that are now beating systems ten times their size. We get into what "agentic AI" actually means in production: not the buzzword version, but the real infrastructure challenges WHOOP uncovered running 500+ AI agents at once. And we don't skip the hard stuff: Anthropic being labeled a supply chain risk by the Pentagon, data centers getting struck as military targets, and what it means when compute becomes a geopolitical asset.
If you want to understand where AI is actually going, this is the episode.

Mar 8, 2026

17m

8

Anthropic Banned

Three massive forces collided this week in AI, and the fallout is just starting. First, the unprecedented standoff: Anthropic gets blacklisted by the US government for refusing to remove safety guardrails, while OpenAI steps in. Second, the money: OpenAI's record-breaking $110 billion raise. Third, the workforce: Block's explicit AI-driven layoffs and the market's enthusiastic reaction. We break down why safety principles are becoming commercial liabilities, what the capital deluge means for competition, and how developers should prepare for the new era of 'agentic' layoffs. Press play to get caught up on the week that changed everything.

Mar 1, 2026

19m

7

Intelligence Became a Commodity

In six days, the performance gap between the world's top AI models collapsed to 6.9 points—and the race to build the smartest AI fundamentally changed shape. Three frontier models launched with dramatic price-performance shifts: Claude Sonnet 4.6 at one-fifth flagship cost, Gemini 3.1 Pro doubling reasoning performance, and Qwen 3.5 open-sourcing near-parity capabilities. Meanwhile, Meta and NVIDIA signed a multi-billion dollar infrastructure deal, 88 countries gathered to debate AI governance (with the US rejecting global oversight), and a stark paradox emerged—100% of enterprises plan to expand agentic AI, yet only 8.6% have it in production. Press play to understand why intelligence is becoming infrastructure, infrastructure is becoming geopolitical, and what it all means for how you build with AI.

Feb 22, 2026

19m

6

Anthropic's $30B bet and the multi-agent shift

This week, Anthropic closed a $30 billion funding round at a $380 billion valuation while DXC Technology deployed autonomous agents to 115,000 employees. OpenAI shipped its first model running on non-Nvidia hardware, using Cerebras chips. And across the industry, $660 billion in infrastructure spending signaled that we're done with pilot projects.
The "prompting fallacy" is dead. We explain why multi-agent architecture is now the only viable path for complex workflows. Plus, the safety challenges that come with autonomous systems running production code in regulated environments like Goldman Sachs.
If you're still treating AI like a chatbot wrapper, this episode explains why your architecture is already obsolete, and what to do about it before your competitors scale past you.

Feb 15, 2026

15m

5

Claude Opus 4.6 vs. GPT-5.3-Codex

This week, AI stopped being an oracle you consult and became a colleague you delegate to. We're breaking down the 'agentic shift': the architectural change that lets AI manage code repositories, negotiate contracts, and run for days without constant prompting.
You'll learn why the Model Context Protocol (MCP) is becoming the 'USB-C for AI tools,' how Claude Opus 4.6 and GPT-5.3-Codex are transforming developer workflows, and why security teams are scrambling to catch up with autonomous agents that have persistent memory and broad system access.
If you've been waiting for AI to actually change how you work (not just how you search), this is the episode you need.

Feb 8, 2026

19m

4

Claude Drove on Mars. Then Amazon Fired 16,000 People.

What happens when AI stops waiting for instructions and starts making plans? This week, we unpack the seven days that marked the shift from chatbots to autonomous agents—from Claude navigating NASA's Mars rover to Microsoft letting AI make purchases mid-conversation. We dig into the architectural revolution happening under the hood: reasoning models that think before they speak, agent swarms that collaborate like hospital specialists, and the new protocols letting AI see and control your screen. But we also look at the human cost—Amazon's 16,000 layoffs reveal a stark pattern of capital replacing labor, while regulators scramble to catch up with AI that can act without asking. Whether you're building these systems, deploying them, or just trying to keep your job alongside them, this episode maps the new rules of the agentic era. Press play before your AI schedules a meeting about it.

Feb 1, 2026

15m

3

Only 12% of Companies Are Winning at AI

This week, AI stopped being about what's possible and started being about what's actually working, and the numbers are brutal.
Only 12% of CEOs report AI is delivering both cost savings and revenue growth. Meanwhile, the best AI agents on the market hit just 24% accuracy on real professional tasks. That's intern-level performance.
But here's where it gets interesting: Anthropic published a philosophical manifesto about whether their AI might have consciousness. OpenAI announced ads are coming to ChatGPT. Google DeepMind's CEO publicly questioned that decision.
The trust economy just became real. And the companies seeing returns? They're not the ones with the best AI. They're the ones who rebuilt their workflows around it.
We break down what separated the 12% from everyone else, why the Davos crowd is worried about white-collar workers, and what the infrastructure race tells us about where this is all heading.

Jan 25, 2026

15m

2

When Google Considers Launching Servers Into Space, You Know the Rules Have Changed

The week of January 12–18, 2026, exposed the forces reshaping AI, and they're not what you might expect.
Google is seriously exploring data centers in space. Hyperscalers are hiring energy experts faster than ML researchers. DeepSeek introduced an architecture that separates memory from reasoning (finally). And "vibe coding" went from meme to methodology with real tools backing it up.
Meanwhile, the regulatory landscape is fragmenting: federal preemption efforts are colliding with state AI laws that just took effect, while the EU marches toward August deadlines with penalties of up to €35 million.
This episode breaks down what actually matters for IT leaders: the physical constraints that will shape AI deployment, the architectural innovations worth watching, and the compliance realities you can't ignore.
The experimental phase is ending. The constraints are real. Here's what you need to know.

Jan 18, 2026

18m

1

AI in January 2026: Hardware, Agents, and What’s Actually Changing

CES 2026 brought a wave of AI announcements worth paying attention to. NVIDIA unveiled its Rubin platform with claims of 10x cheaper inference. Boston Dynamics announced Atlas production at scale. Meta acquired an AI agent company for $2 billion. And several new developer SDKs dropped.
This episode organizes the noise into what actually matters. We cover the hardware updates from NVIDIA, AMD, and Intel. We look at why hybrid model architectures like Falcon H1R are gaining traction. We explain how RAG patterns are evolving toward agentic memory. And we break down what “agent engineering” looks like as an emerging discipline.
The thread connecting it all: the industry is moving from experimentation toward production deployment, with growing pressure to show measurable ROI. Useful context if you’re building AI products or managing teams working with these tools.

Jan 12, 2026

14m

0

The Holiday Shift: How AI Systems Managed the New Year Surge

As we step into 2026, the artificial intelligence landscape is shifting from raw model size to architectural precision. In this episode, we unpack the critical developments from the holiday season (Dec 22 – Jan 4). We also discuss the rising trend of 'Agentic Verification' in software engineering and what it means for developer autonomy.

Jan 4, 2026

16m

-1

Skills

There wasn't much AI news this Christmas week, so I decided to go deep on a single topic. Everyone has been talking about agents and MCP, but there's a concept that few people are discussing and that Anthropic is trying to standardize: Skills. It's already available in preview in Claude.

Dec 28, 2025

15m

-2

AI Reality before Christmas

This week: Gemini 3 Flash disrupts pricing, OpenAI becomes a platform, NVIDIA tightens its infrastructure grip, and CEOs face the ROI reckoning. What's working, what's not, and what technical leaders need to know.

Dec 21, 2025

19m

-3

From Playground to Production: AI's Turning Point

December 5–14, 2025, marked the end of AI's experimental phase and the beginning of industrial reality. In this episode, we break down the most consequential week in AI history, when OpenAI and Google launched competing models on the same day, a billion-dollar content deal redefined IP licensing, and enterprise AI spending hit $37 billion (up 222% YoY).
Whether you're a developer, engineering manager, or tech leader, this episode cuts through the hype to reveal what actually matters: the architectural shifts, the talent implications, and the strategic decisions you need to make now.
Who should listen: software developers, AI/ML engineers, engineering managers, CTOs, product leaders, and anyone making technology decisions for their organization.

Dec 14, 2025

14m

-4

The Code Red Era

The digital hegemony has collapsed. It is late 2025, and the chatbot era is officially dead. On this podcast, we bring you to the frontlines of the "Digital Frontier Wars," where OpenAI’s internal "Code Red" signaled the end of its dominance and the rise of superior reasoning from Google’s Gemini 3 and Anthropic’s Claude Opus 4.5.
But the battle has moved beyond screens. We investigate Jeff Bezos’s $6.2 billion "Project Prometheus", a bid to conquer the material economy with "Physical AI", and the fracturing of the web into a global "Splinternet." With experts predicting the end of white-collar work in less than three years, we ask the hard question: as safety scores plummet and machines enter the physical world, where does that leave us?

Dec 7, 2025

11m