EPISODE · Jun 2, 2026 · 8 MIN

How SRE Teams Use Game Days to Build Muscle Memory for Incidents

from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo

In Episode 26 of The Site Reliability Podcast, Lucas and Luna explore how SRE teams run 'game days' (simulated incident exercises) to build muscle memory and reduce panic during real outages. They break down how Etsy, a pioneer in game days, structures its exercises using realistic scenarios, mini-game design, and blameless post-mortem debriefs. The hosts discuss the difference between chaos engineering and game days, how to keep exercises from feeling like busywork, and why even small teams can run a low-stakes simulation with nothing more than a staging environment and a script.

Lucas shares concrete steps: start with a single failure mode, assign roles, time-box the response, and debrief with the question 'What did we learn?' rather than 'Who messed up?' The episode also touches on the metrics that matter, mean time to acknowledge and mean time to resolve, and how game days improve both without requiring expensive tools. The hosts close by mentioning the listener-supported model that keeps the show ad-free. No fluff, just practical SRE discipline.

#SiteReliabilityEngineering #GameDays #IncidentResponse #ChaosEngineering #Etsy #SRE #ToilReduction #MuscleMemory #OnCall #Postmortem #ReliabilityEngineering #DevOps #IncidentManagement #FaultInjection #SyntheticMonitoring #Technology #FexingoBusiness #BusinessPodcast

Keep every episode free: buymeacoffee.com/fexingo
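For readers who want to try the steps from the description, here is a minimal Python sketch of how a team might record game-day drills and compute mean time to acknowledge and mean time to resolve from them. It is not taken from the episode. The `Drill` class, its field names, and the sample timings are all hypothetical, and measuring both metrics from the moment the failure is injected is an assumption (some teams measure from the first alert instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Drill:
    """One game-day exercise. All field names here are hypothetical."""
    failure_mode: str          # the single failure mode being practiced
    injected_at: datetime      # when the simulated failure started
    acknowledged_at: datetime  # when a responder acknowledged it
    resolved_at: datetime      # when the team declared it resolved
    time_box: timedelta        # agreed upper limit for the response

    def time_to_acknowledge(self) -> timedelta:
        # Assumption: the clock starts at injection, not at first alert.
        return self.acknowledged_at - self.injected_at

    def time_to_resolve(self) -> timedelta:
        return self.resolved_at - self.injected_at

    def within_time_box(self) -> bool:
        return self.time_to_resolve() <= self.time_box


def mean_minutes(durations) -> float:
    """Average a collection of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


# Made-up sample data, for illustration only.
start = datetime(2026, 1, 15, 10, 0)
drills = [
    Drill("database failover", start,
          start + timedelta(minutes=4), start + timedelta(minutes=30),
          time_box=timedelta(minutes=45)),
    Drill("expired TLS certificate", start,
          start + timedelta(minutes=2), start + timedelta(minutes=20),
          time_box=timedelta(minutes=15)),
]

mtta = mean_minutes(d.time_to_acknowledge() for d in drills)
mttr = mean_minutes(d.time_to_resolve() for d in drills)

print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")
for d in drills:
    status = "within" if d.within_time_box() else "over"
    print(f"{d.failure_mode}: {status} time box")
```

Tracking the same two numbers across successive drills gives a team a cheap way to see whether practice is paying off, which fits the episode's point that no expensive tooling is required.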

Frequently Asked Questions

How long is this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering?

This episode is 8 minutes long.

When was this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering published?

This episode was published on June 2, 2026.

Can I download this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.