EPISODE · Jun 12, 2026 · 23 MIN

What the Agentic AI is happening to SRE?

from Reliability Enablers · host Ash Patel

What if agentic AI makes SRE more important, not less? Bennett Gould explains why autonomous AI systems may create more demand for reliability thinking, not less.

Everyone seems to think AI is coming for SRE in a hard way.

You might have heard the same story:

“AI will write the code.”
“Agents will handle incidents.”
“Copilots will generate the runbooks.”
“Automation will reduce operational load.”

Yes, the job question is real. If AI can write code, summarize incidents, query observability tools, generate runbooks, and operate across systems, then engineers are right to ask what happens to the work.

But here’s the part that gets missed: AI does not just automate reliability work. It creates more objects and surface areas that need to be made reliable.

Agentic AI is moving from demos into real workflows. These systems are no longer just answering questions. They are querying tools, pulling context, generating changes, and in some cases taking action around production environments.

That makes this a Monday morning problem.

Teams are already using LLMs for incidents, documentation, observability, infrastructure, and operational decision-making. Somewhere, a team is one demo away from giving an agent access to tools originally designed for humans.

That is exactly why I wanted to have this conversation.

Bennett Gould is currently a solution engineer at Neubird.ai. His career in SRE and SRE-adjacent work spans large enterprises, cloud, industrial technology, and startups, including AWS, IBM, Siemens, and a YC startup.

I wanted to ask him a simple question: What in the agentic AI is happening to SRE?

Here are 3 highlights from our talk:

1. Agentic AI increases the reliability surface area

The obvious fear is that AI reduces the need for reliability engineers. Bennett’s view was more nuanced. He was clear that engineers still need to adapt. If people do not reskill, stay current, and learn how these systems are forming, there may absolutely be pressure in the job market.
But he also argued that AI could create more demand for reliability skills because production complexity is increasing.

More code is going into production.
More AI-generated code is going into production.
More systems that people do not fully understand are going into production.
And now autonomous agents are starting to enter production workflows too.

That means more surface area. More automation. More operational uncertainty. More ways for things to go wrong.

Bennett compared this to Terraform: Infrastructure as code created enormous efficiency gains. But it also created new ways to make very big mistakes very quickly.

Before Terraform, most people could not delete all their production resources with a single command. After Terraform, that became technically possible if the system was designed badly enough.

Agentic AI follows a similar pattern. With great automation comes great responsibility.

Agents can help engineers move faster, query tools, summarize context, and reduce toil. But they can also amplify weak engineering practices, poor boundaries, bad assumptions, and unclear operational ownership. That is not the end of reliability work. That is reliability work entering a new phase.

2. Agents can reduce toil, but context is the ceiling

One of the strongest parts of the conversation was Bennett’s explanation of where agents can help in incident response. A lot of SRE work involves moving across tools.

You may need to query Prometheus, Dynatrace, logs, traces, cloud consoles, ticketing systems, documentation, runbooks, dashboards, and architecture diagrams.

The problem is not always that the engineer lacks judgment.

Sometimes the problem is that the information is scattered across too many tools, each with its own query language and interface. Bennett gave a simple example: an engineer might be very good at PromQL and very fast when Prometheus is the source of truth.
But if the same engineer has to work in a different observability platform with a different query language, their response time can suffer. That is an obvious place where agents can help.

The engineer may not need to know every query language perfectly. They need to know what they are looking for and how to reason about the system. The agent can help translate that intent into the right tool calls, queries, and summaries.

That could reduce MTTR. It could reduce toil. It could help engineers move faster during incidents.

But Bennett also made the limitation clear: You are only as good as the context you have. This is where he introduced two useful concepts:

* Context mining
* Context distillation

Context mining means proactively finding the information that might be useful in a given operational situation.

Context distillation means taking large amounts of information (runbooks, Confluence pages, diagrams, documentation, prior incidents) and reducing it into the minimum useful context an LLM or agent can use.

That sounds powerful. But there is a catch. Sometimes the context simply is not there.

Many of the largest and most complex organizations still run legacy systems where knowledge lives in people’s heads, stale documentation, tribal memory, and unwritten assumptions.

There may not be a clean process for turning that into usable context. That matters because agents do not magically understand your system. They work with the context they are given. If the context is missing, outdated, or wrong, the agent’s usefulness maxes out early.

3. Agentic systems are not just LLM demos

A basic LLM workflow is relatively easy to demo:

You give it a prompt.
You connect a few tools.
You add some APIs.
You get a useful answer.

That is impressive, but it is not the same thing as running an agentic system in a meaningful production environment.

Bennett made a useful analogy here: running your own infrastructure versus using a hyperscaler.

Cloud providers removed a lot of undifferentiated heavy lifting. Most companies do not want to spend half their time racking servers, managing data centers, and dealing with low-level infrastructure when they are trying to serve customers.

Agentic systems create similar questions:

* What parts of the work should be handled by the system?
* What parts still need engineering discipline?
* And what has to exist around the model before it is safe and useful?

That surrounding structure is where the real work begins. Bennett called this harness engineering. Once you move beyond an LLM demo, you have to think about memory, learning, tool usage, identity, federation, security, evaluations, and guardrails.

That is a very different problem from “the model gave a good answer on my laptop.” SREs know why that distinction matters. “It works on my machine” is not an acceptable reliability strategy.

A runbook that recovers a thousand-node database cannot be non-deterministic, undocumented, and dependent on someone’s local setup. If it is part of the operational backbone, it needs to be reliable.

Agentic AI does not remove that requirement. It makes it more important.

Bonus: Agents expose weak engineering practices

Agentic AI not only introduces new problems but also reveals old ones.

* Weak APIs.
* Brittle runbooks.
* Missing context.
* Poor evals.
* Unclear tool boundaries.
* Operational shortcuts.

Systems that were designed assuming careful human use may behave very differently when AI agents start using them. That is why this conversation matters for SRE.

Agentic AI is not only a productivity story.
It is a reliability story.

It forces teams to ask whether their existing practices are strong enough for a world where more actions can be generated, recommended, or executed by autonomous systems.

The silver lining for reliability work

Agentic AI does not remove the need for reliability thinking. It raises the bar for it. The tools will change. The workflows will change. Some tasks will absolutely be automated or reshaped.

But the hardest parts of reliability are still the hard parts:

* understanding the system
* knowing the trade-offs
* building reliable operational processes
* making good judgment calls under uncertainty, and
* owning the outcome when something changes in production

That is why SRE does not disappear in an agentic AI world.

It becomes one of the disciplines that makes the agentic AI world survivable.

So if your team is already using AI around incidents, observability, runbooks, infrastructure, or production workflows, the question is not whether the future is coming. The future is already in the workflow.

The real question is whether your reliability practices are ready for it.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit read.srepath.com
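
For readers who want something concrete to go with the "unclear tool boundaries" and guardrails points in the notes above, here is a minimal Python sketch of one possible boundary: a gate that lets an agent run read-only tools freely, routes destructive tools through a human approval hook, and denies everything else. This is not something described in the episode. `ToolGate`, `query_metrics`, `delete_resources`, and the `approve` hook are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ToolGate:
    """Hypothetical guardrail deciding whether an agent may run a tool."""
    read_only: Set[str]                      # tools the agent may call freely
    needs_approval: Set[str]                 # tools that require a human yes
    approve: Callable[[str, dict], bool]     # human-in-the-loop hook (assumed)
    audit_log: List[Tuple[str, str]] = field(default_factory=list)

    def call(self, tool: str, args: dict,
             registry: Dict[str, Callable[..., object]]) -> object:
        if tool in self.read_only:
            decision = "allowed"
        elif tool in self.needs_approval and self.approve(tool, args):
            decision = "approved"
        else:
            # Unknown tools and unapproved destructive tools are both denied.
            decision = "denied"
        self.audit_log.append((tool, decision))
        if decision == "denied":
            raise PermissionError(f"agent may not call {tool!r}")
        return registry[tool](**args)


# Stand-in tools; a real harness would wrap actual API clients here.
registry = {
    "query_metrics": lambda query: f"ran {query}",
    "delete_resources": lambda scope: f"deleted {scope}",
}

gate = ToolGate(
    read_only={"query_metrics"},
    needs_approval={"delete_resources"},
    approve=lambda tool, args: False,  # nobody is on hand to approve
)

result = gate.call("query_metrics", {"query": "up"}, registry)
```

With this setup, a call to `delete_resources` raises `PermissionError` and is still recorded in `gate.audit_log`. Denying by default and logging every decision is only one design choice; the point is that the boundary lives in the harness around the model, not in the model itself.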
