EPISODE · Jun 9, 2026 · 8 MIN

How SRE Teams Use Canary Deployments to Reduce Release Risk

from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo

Lucas and Luna dive into canary deployments: the practice of routing a small percentage of production traffic to a new version before rolling it out broadly. Lucas explains why Netflix's 'canary clusters' and Etsy's 'feature flipping' approaches revolutionized how SRE teams think about release risk, and contrasts them with the old all-at-once deploys that caused major incidents. They discuss specific strategies: comparing metrics between canary and baseline, automatic rollback triggers, and the trade-off between speed and safety. Luna brings up the 2023 incident where a mismatched canary size led to a slow-burn outage, and they explore how teams decide on canary percentage and duration. A concrete episode for any engineer or manager responsible for production releases.

#SiteReliabilityEngineering #CanaryDeployments #ReleaseManagement #ProductionEngineering #IncidentPrevention #Netflix #Etsy #ContinuousDelivery #SRE #Uptime #ReliabilityEngineering #DeploymentStrategies #Technology #FexingoBusiness #BusinessPodcast #SoftwareEngineering #DevOps #RiskMitigation

Keep every episode free: buymeacoffee.com/fexingo
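The metrics comparison and automatic rollback trigger mentioned in the description can be illustrated with a minimal sketch. It is not taken from the episode: the `WindowStats` type, the `canary_decision` function, and the `max_ratio` and `min_requests` thresholds are hypothetical names and values chosen for illustration, and a real setup would likely compare more than one metric.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts observed for one deployment group in a time window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: WindowStats, canary: WindowStats,
                    max_ratio: float = 1.5, min_requests: int = 500) -> str:
    """Return 'wait', 'rollback', or 'promote' for one evaluation window.

    max_ratio and min_requests are illustrative assumptions, not
    recommended values.
    """
    # Too little canary traffic makes the comparison noisy, so keep waiting.
    if canary.requests < min_requests:
        return "wait"
    # Absolute floor so a near-zero baseline does not make one error fatal.
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    if canary.error_rate > allowed:
        return "rollback"
    return "promote"
```

The `min_requests` guard reflects the sizing concern raised in the description: a canary that receives too little traffic cannot support a meaningful comparison, so the sketch waits rather than promoting on thin evidence.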
