EPISODE · May 26, 2026 · 10 MIN
How Microsoft SREs Automate Capacity Planning at Cloud Scale
from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo
Episode 13 of The Site Reliability Podcast explores how Microsoft's SRE teams automate capacity planning to keep Azure running smoothly despite unpredictable demand. Lucas and Luna break down the three-layer approach (demand forecasting, headroom management, and autoscaling) and walk through a real case in which a retail giant's Black Friday traffic spike was absorbed without a single incident. They discuss the tension between efficiency and resilience, how SREs use historical traffic patterns and machine learning to predict compute needs, and why over-provisioning isn't always the answer. Listeners will learn how capacity planning has evolved from a manual quarterly spreadsheet exercise into a continuous, automated feedback loop, and why that shift is critical for any organization running infrastructure at scale.