EPISODE · Jun 12, 2026 · 7 MIN

How SRE Teams Use SLIs to Define Reliability

from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo

In this episode of The Site Reliability Podcast, Lucas and Luna dive into the often-overlooked first step of SRE practice: defining Service Level Indicators (SLIs). They explore why vague uptime percentages fail to capture user experience and walk through a concrete example from a major streaming platform that moved from a 'five nines' availability target to a more granular SLI based on video start latency. The hosts discuss common pitfalls such as measuring everything instead of measuring what matters, the tension between signal and noise, and how a well-defined SLI shifts the conversation from 'is the site up?' to 'are users happy?' Listeners will learn why three SLIs (latency, error rate, and throughput) cover most production services, and how to avoid the trap of vanity metrics. The episode closes with a forward-looking note on how LLM-based systems challenge traditional SLI thinking.

#SLI #ServiceLevelIndicators #SRE #SiteReliabilityEngineering #Uptime #Latency #ErrorRate #Throughput #Observability #UserExperience #ProductionEngineering #ReliabilityMetrics #Fexingo #FexingoBusiness #BusinessPodcast #Technology #DevOps #IncidentResponse

Keep every episode free: buymeacoffee.com/fexingo
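The episode names latency, error rate, and throughput as the SLIs that cover most production services. The snippet below is a minimal sketch, not taken from the episode, of how each could be computed over a window of user-facing events such as video start attempts. The `Request` record, its fields, and the 2-second latency threshold are hypothetical assumptions chosen for illustration. The latency SLI follows the common "good events / valid events" convention. The sample window shows the point made in the episode: no request fails, so an error-based view says the service is "up", while the latency SLI shows that 40% of users waited too long.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # One user-facing event, e.g. a video start attempt (hypothetical schema).
    latency_ms: float
    status: int  # HTTP-style status code


def latency_sli(requests, threshold_ms=2000.0):
    """Fraction of requests served faster than threshold_ms (good / valid)."""
    if not requests:
        return 1.0  # no valid events in the window, so nothing violated the SLI
    good = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return good / len(requests)


def error_rate(requests):
    """Fraction of requests that failed with a server-side (5xx) error."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if 500 <= r.status < 600)
    return errors / len(requests)


def throughput(requests, window_seconds):
    """Requests handled per second over the measurement window."""
    return len(requests) / window_seconds


# Hypothetical one-minute window: every request succeeds, but two are slow.
window = [
    Request(300, 200),
    Request(4500, 200),
    Request(900, 200),
    Request(5200, 200),
    Request(1200, 200),
]
print(latency_sli(window))     # 0.6
print(error_rate(window))      # 0.0
print(throughput(window, 60))  # 5 requests over 60 seconds
```

In practice these ratios would come from a metrics or tracing backend rather than an in-memory list; the list just keeps the arithmetic visible.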

Frequently Asked Questions

How long is this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering?

This episode is about 7 minutes long (7:12).

When was this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering published?

This episode was published on June 12, 2026.
