EPISODE · Jun 12, 2026 · 7 MIN
How SRE Teams Use SLIs to Define Reliability
from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo
In this episode of The Site Reliability Podcast, Lucas and Luna examine the often-overlooked first step of SRE practice: defining Service Level Indicators (SLIs). They explain how a bare uptime percentage fails to capture user experience, and walk through an example from a major streaming platform that shifted from a 'five nines' target to a more granular SLI based on video start latency. The hosts discuss common pitfalls such as measuring everything instead of measuring what matters, the tension between signal and noise, and how a well-defined SLI moves the conversation from 'is the site up?' to 'are users happy?' Listeners will learn why three specific SLIs (latency, error rate, and throughput) cover most production services, and how to avoid the trap of vanity metrics. The episode closes with a forward-looking note on how LLM-based systems challenge traditional SLI thinking.
#SLI #ServiceLevelIndicators #SRE #SiteReliabilityEngineering #Uptime #Latency #ErrorRate #Throughput #Observability #UserExperience #ProductionEngineering #ReliabilityMetrics #Fexingo #FexingoBusiness #BusinessPodcast #Technology #DevOps #IncidentResponse
Keep every episode free: buymeacoffee.com/fexingo
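As a rough illustration of the three SLIs named in the description, here is a minimal Python sketch that computes them from a batch of request records. It is not taken from the episode: the `Request` type, the `compute_slis` function, and the 2000 ms latency threshold are all hypothetical, chosen only to show how a "good events over total events" ratio differs from a simple up/down check.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    latency_ms: float
    ok: bool  # True if the request succeeded from the user's point of view


def compute_slis(requests, window_seconds, latency_threshold_ms=2000.0):
    """Return latency, error-rate, and throughput SLIs for one time window.

    latency_threshold_ms is an assumed target, not a value from the episode.
    """
    total = len(requests)
    if total == 0 or window_seconds <= 0:
        raise ValueError("need at least one request and a positive window")

    # Latency SLI: share of requests that succeeded AND were fast enough.
    fast = sum(1 for r in requests if r.ok and r.latency_ms <= latency_threshold_ms)
    # Error-rate SLI: share of requests that failed.
    errors = sum(1 for r in requests if not r.ok)

    return {
        "latency_sli": fast / total,
        "error_rate": errors / total,
        "throughput_rps": total / window_seconds,
    }


if __name__ == "__main__":
    sample = [
        Request(500, True),
        Request(2500, True),   # succeeded, but slower than the threshold
        Request(100, False),   # fast, but failed
        Request(1500, True),
    ]
    print(compute_slis(sample, window_seconds=2))
```

In this sketch a request counts toward the latency SLI only if it both succeeded and met the threshold, so a service that is "up" but slow still scores poorly, which is the gap between uptime and user experience that the description points at.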