All Episodes
Reliability Enablers — 70 episodes
You (and AI) can't automate reliability away
#67 Why the SRE Book Fails Most Orgs — Lessons from a Google Veteran
#66 - Unpacking 2025 SRE Report’s Damning Findings
#65 - In Critical Systems, 99.9% Isn’t Reliable — It’s a Liability
#64 - Using AI to Reduce Observability Costs
#63 - Does "Big Observability" Neglect Mobile?
#62 - Early YouTube SRE shares Modern Reliability Strategy
#61 Scott Moore on SRE, Performance Engineering, and More
#60 How to NOT fail in Platform Engineering
#59 Who handles monitoring in your team and how?
#58 Fixing Monitoring's Bad Signal-to-Noise Ratio
#57 How Technical Leads Support Software Reliability
#56 Resolving DORA Metrics Mistakes
#55 3 Uses for Monitoring Data Other Than Alerts and Dashboards
#54 Becoming a Valuable Engineer Without Sacrificing Your Sanity
#53 What's Missing in Incident Response Processes?
Can ITIL Benefit from Site Reliability Engineering?
#52 Navigating Complexity within Incidents
#51 Whitebox vs Blackbox Monitoring
#50 Making Better Sense of Observability Data
#49 Alert Fatigue is Still an Issue - Here's How We Fix it
#48 Cutting Down "Toil" aka Manual Work in Software
#47 How to Grow Team Impact Through Learning Culture
#46 Platform Team Design According to Team Topologies
#45 How Team Topologies Can Guide Enabling Teams
#44 - Making SLOs Matter to Stakeholders
#43 - SLOs: a Deeper Dive into its Mechanics
#42 - Hitting Software SLA Targets through SLOs and SLIs
#41 Curbing High Observability Costs
#40 How to Enable Observability for Success
#39 How Chaos Engineering Helps Reduce Incident Risk
#38 The Real Cost of Software Reliability & Downtime
#37 An SRE Approach to Managing Technology Risk
#36 Avoiding Critical Platform Engineering Mistakes
#35 Boosting Your Observability Data's Usability
#34 From Cloud to Concrete: Should You Return to On-Prem?
#33 Inside Google's Data Center Design
#32 Clarifying Platform Engineering's Role (with Ajay Chankramath) BONUS EP
#31 Introduction to FinOps (with Ajay Chankramath)
#30 Clearing Delusions in Observability (with David Caudill)
#29 - Reacting to Google's SRE book 2016 (Chapter 1 Part 2)
#28 - Reacting to Google's SRE Book 2016 (Chapter 1 Part 1)
#27 - Growing as a Site Reliability Engineer (Part 3)
#26 - Growing as a Site Reliability Engineer (Part 2)
#25 - DORA and the Pursuit of Engineering Excellence (with Tim Wheeler)
#24 - Growing as a Site Reliability Engineer (Part 1)
#23 - The Danger of Unreliable Platforms (with Jade Rubick)
#22 - How Google does SRE Consulting (with Yury Niño Roa)
#21 - Better SRE in 2024 is all we can hope for
#20 Holiday Special with Stephen Townshend
#19 How to Develop Early Career Engineers (with John Hyland)
#18 Winning at SRE in Banking and Telecom (with Troy Koss)
#17 Lessons from SRE's Wild West Days (with Rick Boone)
#16 Acing Cloud Infra in Digital Media Giant (with Sreejith Chelanchery)
#15 Growing Reliability Engineering Across 5+ Companies (with Nash Seshan)
#14 Faster Incident Resolution through Data-Driven Notebooks (with Ivan Merrill)
#13 Making Sense of OpenTelemetry and Observability (with Adriana Villela)
#12 From Incident Firefighting to Reliability First (with Robert Ross)
#11 Rising to Staff Engineer in DevOps and SRE (with Rajesh Reddy N)
#10 Using AI for Kubernetes troubleshooting self-service (with Kyle Forster)
#9 Inside Booking.com's Site Reliability Engineering practice (with Samuele Tonon and Yoann Fouquet)
#8 Software Reliability Ninja Who is NOT an SRE (with Pablo Bouzada)
What happened to the podcast?
#7 Bringing HR onboard with SRE hiring and onboarding
#6 Building a successful SRE practice through capabilities
#5 Where does SRE fit into your organization's structure?
#4 Should organizations care about SRE?
#3 SRE vs DevOps vs Platform Engineering
#2 What is Site Reliability Engineering (SRE) and what is not SRE?
#1 Introducing the SREpath podcast