EPISODE · Jun 11, 2026 · 10 MIN

How SRE Teams Use Observability to Find Unknown Unknowns

from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo

Episode 45 of The Site Reliability Podcast digs into observability: how modern SRE teams go beyond monitoring to discover the 'unknown unknowns' that cause the worst outages. Lucas and Luna break down the difference between watching known metrics (CPU, memory) and exploring unknown failure modes with structured events and high-cardinality data. They walk through a real example: a major e-commerce platform lost $340,000 in seven minutes during a 2023 flash sale because its monitoring didn't catch a latency spike in a new authentication microservice. They explain how distributed tracing and log-based metrics surfaced the root cause after the fact, and how the team now uses observability-driven dashboards to spot anomalies before they become incidents. The episode also covers practical steps (start with one service, instrument with OpenTelemetry, and build a culture of exploration) so listeners can apply observability in their own SRE practice. No ads, just actionable engineering insights.

#Observability #SRE #SiteReliabilityEngineering #Monitoring #DistributedTracing #OpenTelemetry #IncidentResponse #UnknownUnknowns #HighCardinality #Microservices #Latency #LogBasedMetrics #ServiceLevelObjectives #ChaosEngineering #ProductionEngineering #Technology #FexingoBusiness #BusinessPodcast

Keep every episode free: buymeacoffee.com/fexingo
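The contrast the description draws between watching a fixed metric and exploring structured events can be sketched in a few lines of Python. This is an illustration only, not the method discussed in the episode: the event fields (`service`, `region`, `customer_id`, `duration_ms`), the service names, and the latency values are all hypothetical, and the sketch uses only the standard library rather than OpenTelemetry.

```python
import math
from collections import defaultdict

# Hypothetical structured events: one record per request, with arbitrary
# fields attached. All names and numbers here are invented for illustration.
EVENTS = [
    {
        "service": "auth-v2" if i % 10 == 0 else "checkout",
        "region": "us" if i % 2 == 0 else "eu",
        "customer_id": f"c{i}",  # high-cardinality field: one value per request
        "duration_ms": 2400 if i % 10 == 0 else 120,
    }
    for i in range(100)
]


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def worst_value(events, field, metric="duration_ms"):
    """Group events by `field` and return (value, p95) for the slowest group."""
    groups = defaultdict(list)
    for event in events:
        groups[event[field]].append(event[metric])
    return max(((value, p95(vals)) for value, vals in groups.items()),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # A single global average blurs the slow requests into the fast ones.
    mean = sum(e["duration_ms"] for e in EVENTS) / len(EVENTS)
    print("global mean ms:", mean)
    # Slicing by a field chosen after the fact points at the slow group.
    print("slowest service:", worst_value(EVENTS, "service"))
```

Because every event keeps its full context, the question ("which service is slow?") does not have to be decided before the data is collected; the same events could be sliced by `region` or any other field with no new instrumentation.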

Frequently Asked Questions

How long is this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering?

This episode is 10 minutes long.

When was this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering published?

This episode was published on June 11, 2026.

What is this episode about?

Episode 45 of The Site Reliability Podcast digs into observability—how modern SRE teams go beyond monitoring to discover the 'unknown unknowns' that cause the worst outages. Lucas and Luna break down the difference between watching known metrics...

Can I download this episode of The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.