EPISODE · Jun 9, 2026 · 10 MIN

How SRE Teams Use Chaos Engineering to Test Resilience

from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo

In episode 40 of The Site Reliability Podcast, Lucas and Luna dive into chaos engineering: the practice of intentionally breaking systems to find weaknesses before real incidents strike. They explore how Netflix pioneered the approach with Chaos Monkey, the lessons SRE teams can learn from controlled failure experiments, and how to start small with simple game days that simulate a database partition or a DNS failure. Lucas breaks down the difference between load testing and chaos testing, and why the goal isn't to break everything but to build confidence in your system's ability to recover. They also discuss common pitfalls like running experiments during peak traffic or without proper observability in place. Whether you're a seasoned SRE or just starting to think about resilience, this episode gives you a concrete framework for making your systems more robust, one controlled explosion at a time. Plus, Lucas and Luna explain why keeping this podcast ad-free matters and how listener support makes it possible.

#ChaosEngineering #SRE #SiteReliabilityEngineering #Netflix #ChaosMonkey #ResilienceTesting #FailureInjection #ProductionTesting #Uptime #IncidentResponse #Observability #GameDays #FaultTolerance #CloudEngineering #DevOps #Technology #FexingoBusiness #BusinessPodcast

Keep every episode free: buymeacoffee.com/fexingo

Frequently Asked Questions

How long is this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering?

This episode is 10 minutes long.

When was this The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering episode published?

This episode was published on June 9, 2026.

Can I download this The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.