EPISODE · Jun 10, 2026 · 9 MIN
How SRE Teams Use Toil Budgets to Prioritize Automation
from The Site Reliability Podcast with Fexingo: SRE, Uptime, and Production Engineering · host Fexingo
Episode 43 of The Site Reliability Podcast. Lucas and Luna explore how SRE teams are adopting "toil budgets", a concept inspired by error budgets, to cap the amount of manual, repetitive work engineers do each sprint. They break down Google's internal definition of toil (hands-on work with no enduring value), how a toil budget works alongside an error budget, and a concrete case from a mid-sized SaaS company that cut toil from 40% to 15% of engineering time over six months using a simple spreadsheet-based tracking system. Lucas shares the specific criteria for classifying toil, the formula for setting the budget as a percentage of total effort, and the governance process, a weekly toil review board, that prevented scope creep. Luna pushes back on whether toil budgets just push work onto other teams, and Lucas explains the "clean up after yourself" rule that prevents that. The episode closes with a practical tip: run a three-week time diary before imposing any budget.
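The episode's own formula and spreadsheet layout are not reproduced on this page, so the following is only a minimal sketch of the general idea: measure toil as a share of total logged hours from a time diary, then compare it against a budget cap. The names `DiaryEntry`, `toil_fraction`, and `over_budget`, and the sample entries, are hypothetical; only the 40% and 15% figures come from the description above.

```python
from dataclasses import dataclass


@dataclass
class DiaryEntry:
    """One row of a time diary (hypothetical layout, not the episode's spreadsheet)."""
    task: str
    hours: float
    is_toil: bool  # classified by whatever criteria the team agrees on


def toil_fraction(entries):
    """Return toil hours divided by total hours, or 0.0 if nothing was logged."""
    total = sum(e.hours for e in entries)
    if total == 0:
        return 0.0
    toil = sum(e.hours for e in entries if e.is_toil)
    return toil / total


def over_budget(entries, budget_fraction):
    """True when the measured toil share exceeds the agreed cap."""
    return toil_fraction(entries) > budget_fraction


# Illustrative data only: 16 toil hours out of 40 logged hours.
entries = [
    DiaryEntry("manual certificate rotation", 6.0, True),
    DiaryEntry("restarting stuck workers", 10.0, True),
    DiaryEntry("building a deploy pipeline", 24.0, False),
]

print(f"toil share: {toil_fraction(entries):.0%}")
print("over a 15% budget:", over_budget(entries, 0.15))
```

With these sample entries the toil share matches the 40% starting point mentioned in the description, so a 15% cap would flag the sprint as over budget.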