PODCAST

Debug Log

Software engineering war stories, architecture decisions, and lessons learned.


17

Debug Log: The Million-Goroutine Memory Leak and the Case for "Boring" Auth

This episode explores the failure of a critical Kubernetes authentication gateway, caused by the accumulation of a million dormant goroutines. It details how client-side context cancellations were not propagated to the goroutines proxying requests upstream, so these lightweight concurrency units held onto resources indefinitely. Listeners will learn why careful context propagation matters in Go's concurrency model, especially in I/O-bound networked services, to prevent similar resource leaks and system instability.

May 8, 2026

11m

16

Chasing the Cart: Why Pinterest Ripped Out Its Sequential Ad Architecture

This episode explores the challenges of traditional multi-stage ad serving architectures, where optimizing for intermediate metrics like clicks can inadvertently sabotage ultimate conversion goals by prematurely filtering out valuable ads. Listeners will learn how integrating sophisticated conversion prediction intelligence much earlier in the pipeline, through a dedicated "Conversion Candidate Generation" component, can overcome these limitations and lead to more effective ad delivery.

May 8, 2026

10m

15

The Blast Radius of Agentic AI: Why "Five Nines" is a Relic

This episode explores why the traditional "five nines" reliability metric is fundamentally unsuitable for agentic AI systems. It explains that unlike traditional systems, agentic AI can be "up" but still cause catastrophic failures through incorrect autonomous actions, leading to a significantly wider "blast radius" of damage. Listeners will learn about the unique failure modes of these self-directed systems and the critical need to shift focus from mere availability to ensuring correctness and integrity.

May 1, 2026

11m

14

Phantom in the Page Cache: Unpacking the 10-Line "Copy Fail" Exploit

This episode discusses a 9-year-old, 10-line "Copy Fail" exploit found in the Linux kernel's page cache, highlighting the paradox of such a critical yet subtle vulnerability evading detection for so long. It explores the nature of this "phantom" bug, explaining how its "surgical precision" and exploitation of concurrency in the page cache make it incredibly difficult to detect, even in highly scrutinized software. Listeners will learn about the profound implications of small flaws in critical system components and the challenges of securing complex, concurrent operating systems.

May 1, 2026

12m

13

Automating the Autopsy: The Promise and Peril of AI-Generated Postmortems

This episode explores the intriguing concept of using AI to write incident postmortems, highlighting its potential for speed, consistency, and automating data synthesis from vast sources. However, it also delves into the significant perils, such as the impact of poor data quality, the risk of AI hallucinations, and AI's inability to grasp the nuanced human "why" behind incidents. Listeners will learn about the dichotomy between AI's data processing power and the essential human element in understanding complex system failures.

May 1, 2026

13m

12

The Harness and the Lobotomy: Unpacking Anthropic’s 47-Day Degradation

This episode explores a 47-day incident where Anthropic's Claude Code appeared to degrade, revealing that the core AI model was intact but its 'harness'—the surrounding infrastructure and system prompts—failed. Listeners will learn how critical this 'harness' is for an AI product's effective performance, and how seemingly minor changes, like lowering default reasoning effort, can lead to significant user frustration and a breakdown of trust between a company and its users.

Apr 25, 2026

17m

11

Scaling for Ghosts: 7 Microservices, 47 Users, and the Trap of Resume-Driven Development

This episode explores "Resume-Driven Development" through the case of an engineer at a pre-seed startup who built an enterprise-grade distributed system designed for 100,000 users when the product had only 47. It highlights how engineers might prioritize resume-boosting complex infrastructure over a startup's actual needs, leading to significant financial and human capital costs. Listeners will learn about the dangers of over-engineering and the critical misalignment of incentives in early-stage tech development.

Apr 25, 2026

14m

10

The 3,000 Incident Postmortem: Why Caches Are Actually the Enemy

This episode explores Marc Brooker's controversial claim that caching, often a default scaling solution, is a major cause of catastrophic "metastable" system failures. It delves into the importance of deep postmortem analysis, moving beyond superficial root causes to question observability, testing, and fundamental architectural assumptions. Listeners will learn how unquestioning reliance on caching can create systems prone to persistent, unrecoverable breakdowns.

Apr 20, 2026

17m

9

The Interface Tax: Is Clean Architecture a Scam?

This episode critically explores how dogmatic adherence to "Clean Architecture" principles, such as excessive layering and abstraction, can inadvertently hinder development velocity. It introduces concepts like the "Interface Tax" and "Lasagna Code," illustrating how over-engineering for unlikely future changes creates unnecessary complexity and friction for developers. Listeners will gain a critical perspective on common architectural practices and learn to identify when they might be detrimental to project progress.

Apr 10, 2026

14m

8

From Vibe-Coded to Enterprise: Handing the Pager to Claude

This episode explores Incident.io's new remote Model Context Protocol (MCP) server, which enables AI assistants like Claude to directly access and interact with live production incident data. Listeners will learn how this "USB-C for AI" standard aims to reduce "dashboard fatigue" and streamline incident response by providing consolidated information, while also considering the potential trade-offs regarding deep system understanding and the "vibe-coded" origin of the technology.

Apr 3, 2026

18m

7

The Microservice Hangover: Investigating an 83% Cost Cut by Returning to a "Majestic Monolith"

This episode discusses a team's successful transition from microservices back to a monolithic architecture, resulting in an 83% reduction in infrastructure costs and a 61% reduction in codebase size. It critically examines the common trend of smaller engineering teams adopting microservices due to "cargo culting" and highlights how this can lead to engineers spending excessive time on infrastructure rather than product features. Listeners will learn about the potential pitfalls of prematurely adopting complex distributed systems and the surprising benefits a well-managed monolith can offer for productivity and cost efficiency.

Mar 31, 2026

17m

6

The Trojan Horse in the AI Stack: How One Tiny Library Exposed the Keys to the Kingdom

This episode explores a critical supply chain attack where malicious code was embedded in legitimate updates of the popular LiteLLM library on PyPI, causing system meltdowns and stealing sensitive credentials like SSH keys and cloud configurations. Listeners will learn how such attacks exploit trusted open-source dependencies to compromise critical infrastructure and why libraries that handle numerous API keys for services like Large Language Models are particularly attractive targets for attackers.

Mar 27, 2026

13m

5

The Slow-Motion Failure: Deconstructing the March 2026 Claude Outages

This episode discusses a March 2026 outage of the Claude AI platform, revealing that the failure wasn't in the AI models themselves but in the "control plane" — critical non-AI components like authentication services. Listeners will learn how an unanticipated surge in new user sign-ups overwhelmed these "boring" but essential systems, highlighting the often-overlooked challenges of scaling stateful infrastructure compared to the AI's "inference plane."

Mar 20, 2026

13m

4

The Shadow Workforce: Rise of the In-House AI Coder

This episode explores the rapid adoption of AI in software development, revealing how companies like Ramp and StrongDM are using AI to author significant code, with some even eliminating human review. It delves into why elite organizations build custom AI agents for deep integration into their proprietary systems, contrasting this with a "radical" approach that prioritizes behavioral validation over human oversight. Listeners will gain insight into the philosophical debates surrounding AI-generated code and the emerging architectural patterns for these autonomous systems.

Mar 19, 2026

16m

3

The Rich Get Richer: Is AI Making Your Senior Engineers 10x and Your Juniors Obsolete?

This episode challenges the common belief that AI will level the playing field for developers, presenting data that shows it disproportionately benefits senior engineers. Listeners will learn that experienced developers use AI as a force multiplier, leveraging their deep architectural context to direct and curate AI-generated code, thus widening the productivity gap with junior developers. This has significant implications for how engineering teams are trained, mentored, and staffed.

Mar 13, 2026

18m

2

Atlassian's AI Sacrifice: Firing Engineers to Hire "AI Talent"

This episode explores Atlassian's recent layoff of 1,600 employees, including over 900 in R&D, as a strategic pivot to "self-fund further investment in AI." Listeners will learn about the significant financial implications of this move, the controversial method of employee notification, and how the company is sacrificing institutional knowledge and restructuring leadership in a calculated bet on future AI capabilities.

Mar 12, 2026

15m

1

Matt Pocock: 9 Ways AI Coding Rewired My Brain

This episode explores how one developer's 100% AI-contributed software development process has fundamentally reshaped his approach, particularly by increasing his focus on robust integration testing. Listeners will learn that immediate, comprehensive feedback loops—including "desirable friction" like strong type checking and rapid local testing environments—are crucial for effectively guiding AI agents. The discussion also highlights AI's current limitations, such as its lack of "taste" for UI design.

Mar 12, 2026

22m



