
EPISODE · May 29, 2026 · 8 MIN

How SRE Teams Use Capacity Planning to Prevent Black Friday Outages

from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo

In this episode, Lucas and Luna explore how site reliability engineering teams use capacity planning to avoid catastrophic outages during peak traffic events like Black Friday and Cyber Monday. They break down the methodology used by major e-commerce platforms, including 'headroom targets' and 'traffic shaping', techniques that go beyond simple auto-scaling. Lucas explains how teams model demand using historical data and synthetic load testing, and why many companies are still caught off guard by the 'thundering herd' problem. The conversation also covers real-world examples from retailers who learned hard lessons after underestimating mobile traffic surges. Luna challenges the common assumption that adding more servers is always the answer, and the two discuss the trade-offs between cost optimization and reliability. A must-listen for engineers, SREs, and anyone responsible for keeping services running under extreme load.

#CapacityPlanning #BlackFriday #SiteReliabilityEngineering #SRE #TrafficSurge #AutoScaling #HeadroomTargets #ThunderingHerd #LoadTesting #Ecommerce #Uptime #IncidentPrevention #CloudInfrastructure #ProductionEngineering #FexingoBusiness #Technology #BusinessPodcast #TheSiteReliabilityPodcast

Keep every episode free: buymeacoffee.com/fexingo

Frequently Asked Questions

How long is this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering?

This episode is 8 minutes 45 seconds long.

When was this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering published?

This episode was published on May 29, 2026.

What is this episode about?

In this episode, Lucas and Luna explore how site reliability engineering teams use capacity planning to avoid catastrophic outages during peak traffic events like Black Friday and Cyber Monday. They break down the specific methodology used by major...

Can I download this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.