Reliability Rebels

PODCAST · technology

The Reliability Rebels Podcast explores making software and systems more reliable by challenging the status quo. We sometimes have to challenge past decisions, existing technology, and even company culture when improving how we run production. This podcast will explore real-life examples from our guests and reveal insights and techniques applicable to your career and team. Intended audience: humans in the tech industry, especially software engineers and their leaders, product managers, and DevOps/Site Reliability Engineering practitioners.

  1. Episode 12: Amin Astaneh

    In this special episode, the tables are turned — host Amin Astaneh becomes the guest. Stephen Townshend, host of Slight Reliability, interviews Amin about rebuilding his life as a nomad: traveling North America, living out of a pickup truck, and running his business from the road. A personal conversation about freedom, simplicity, and the human side of a life in tech. Show Notes Available at https://podcast.certomodo.io/amin-astaneh.html.

  2. Episode 11: Sylvain Kalache

    AI agents are triaging incidents and writing runbooks, but are LLMs actually the right tool for operational work? Sylvain Kalache, Head of AI Labs at Rootly, shares research on where AI SRE tools add real value, where they fall apart, and what it means for operational maturity when humans only see the hardest problems. Guest: Sylvain Kalache, Head of AI Labs at Rootly (https://rootly.com). Show Notes Available at https://podcast.certomodo.io/sylvain-kalache.html.

  3. Episode 10: Kyle Forster

    Explores the 'AI code tsunami' and how massive, AI-generated code changes are forcing engineering teams to rethink traditional code reviews, observability, and the future of SRE roles. The conversation highlights a shift toward treating test environments like production and using narrowly scoped AI agents to manage system reliability, guided by simplified, binary SLIs and SLOs. Guest: Kyle Forster, founder and CEO of RunWhen (https://runwhen.com). Show Notes Available at https://podcast.certomodo.io/kyle-forster.html.

  4. Episode 9: Jon Reeve

    Discusses the 'complexity cult' of the current observability industry, how the open-source TUI tool Gonzo can reveal infrastructure insights through a novel use of LLMs for sentiment analysis, and the vision of more accessible observability experiences for software engineers. Guest: Jon Reeve, founder and CPO of ControlTheory (controltheory.com). Show Notes Available at https://podcast.certomodo.io/jon-reeve.html.

  5. Episode 8: Aaron 'Checo' Pacheco

    Explores the evolution of monitoring and observability, examining how observability costs now consume 15-25% of infrastructure budgets, with Aaron Pacheco from Ottermon.ai. Show Notes Available at https://podcast.certomodo.io/aaron-pacheco.html.

  6. Episode 7: Sebastian Vietz

    Discusses how naming conventions shape industry perceptions, with a focus on AI SRE terminology, with Sebastian Vietz from Compass Digital. Show Notes Available at https://podcast.certomodo.io/sebastian-vietz.html.

  7. Episode 6: Chris Evans

    Explores whether automation through AI actually reduces toil or just shifts it elsewhere with Chris Evans from Incident.io. Show Notes Available at https://podcast.certomodo.io/chris-evans.html.

  8. Episode 5: Derek Brown

    Compares infrastructure management at large tech companies versus smaller organizations with Derek Brown from Plaid. Show Notes Available at https://podcast.certomodo.io/derek-brown.html.

  9. Episode 4: Kat Gaines

    Examines incident management beyond technical fixes, emphasizing communication and customer experience with Kat Gaines from PagerDuty. Show Notes Available at https://podcast.certomodo.io/kat-gaines.html.

  10. Episode 3: Michael Abed

    Chronicles the resolution of a complex production incident at Meta that lasted over three days, with Michael Abed from Datadog. Show Notes Available at https://podcast.certomodo.io/michael-abed.html.

  11. Episode 2: Ricardo Amaro

    Reflects on early DevOps initiatives at Acquia with Ricardo Amaro, who authored a chapter on ML-driven capacity planning in Seeking SRE. Show Notes Available at https://podcast.certomodo.io/ricardo-amaro.html.

  12. Episode 1: Rick Gorman

    Inaugural episode exploring technical debt, testing approaches, and blameless culture with software engineer Rick Gorman. Show Notes Available at https://podcast.certomodo.io/rick-gorman.html.

HOSTED BY

Amin Astaneh
