EPISODE · Jun 17, 2026 · 9 MIN
How SRE Teams Use Cost of Delay to Prioritize Reliability Work
from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo
Lucas and Luna explore how SRE teams at companies like Spotify and Etsy use cost of delay, a concept borrowed from product management, to quantify the business impact of reliability work. Lucas walks through the math of deferring a reliability project, using the example of a payment-processing team deciding whether to fix a latency issue or build a new feature. Luna pushes back on how hard delay costs are to estimate, and they discuss a practical framework, weighted shortest job first (WSJF), that lets teams rank reliability initiatives alongside feature work. In the episode's worked example, deferring an SRE project by one quarter costs $200,000 in incident-related losses; from that figure the team can derive a cost of delay per week and compare it to the effort required. Listeners learn how to present reliability investments in the language executives understand: dollars and time. The conversation closes with a reflection on how cost of delay shifts the question from 'how reliable should we be?' to 'what happens if we defer this work?'
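For readers who want to try the arithmetic, here is a minimal Python sketch of the calculation described above. Only the $200,000 quarterly loss comes from the episode. The 13-week quarter, the effort estimates, the feature's cost of delay, and the function names are illustrative assumptions, and the score uses the common simplified form of WSJF (cost of delay divided by job duration), which may differ from the exact variant discussed on the show.

```python
WEEKS_PER_QUARTER = 13  # assumption: a quarter is treated as 13 weeks


def cost_of_delay_per_week(quarterly_loss: float) -> float:
    """Spread a quarterly loss evenly across the weeks of the quarter."""
    return quarterly_loss / WEEKS_PER_QUARTER


def wsjf(cod_per_week: float, effort_weeks: float) -> float:
    """Simplified WSJF score: weekly cost of delay divided by job duration."""
    if effort_weeks <= 0:
        raise ValueError("effort_weeks must be positive")
    return cod_per_week / effort_weeks


def rank(backlog):
    """Sort (name, cod_per_week, effort_weeks) tuples, highest WSJF first."""
    return sorted(backlog, key=lambda item: wsjf(item[1], item[2]), reverse=True)


# $200,000 per quarter is the episode's figure; everything else is hypothetical.
reliability_cod = cost_of_delay_per_week(200_000)

backlog = [
    ("latency fix", reliability_cod, 4),  # hypothetical 4-week effort
    ("new feature", 10_000, 6),           # hypothetical weekly cost and effort
]

for name, cod, effort in rank(backlog):
    print(f"{name}: cost of delay ${cod:,.0f}/week, WSJF {wsjf(cod, effort):,.0f}")
```

Under these assumptions the latency fix works out to roughly $15,400 per week of delay and ranks ahead of the feature. The ranking is only as good as the estimates behind it, which is the concern Luna raises in the episode.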