EPISODE · Dec 31, 2025 · 1H 22M
Stop Delegating AI Decisions: How Spec Kit Makes AI Agents Safe in Microsoft Entra and Microsoft 365
from M365.FM - Modern work, security, and productivity with Microsoft 365 · host Mirko Peters - Founder of m365.fm, m365.show and m365con.net
(00:00:00) The AI Governance Dilemma
(00:00:38) The Pitfalls of Unchecked AI-Powered Development
(00:03:16) The Spec Kit Solution: Binding Intent to Executable Rules
(00:05:38) The Mechanics of Privileged Creep
(00:17:42) Consent Sprawl: When Convenience Becomes a Threat
(00:23:00) Conditional Access Erosion: The Silent Threat
(00:28:44) Measuring and Improving Identity Governance
(00:34:13) Implementing Constitutional Governance with Spec Kit
(00:34:56) The Power of Executable Governance
(00:40:11) Identity Policies as Compilers

In this episode of m365.fm, Mirko Peters looks at what really happens when teams let AI agents make technical decisions in live Microsoft Entra and Microsoft 365 environments. AI agents are increasingly wired directly into internal APIs, developer workflows, and infrastructure, where they write code, call services, and change configurations at scale. The problem: agents optimize for task completion, not for long‑term safety, governance, or architectural intent. This episode explains why “letting the agent figure it out” quickly becomes a reliability and security risk once you leave the lab and enter production.

WHY AI AGENTS BEHAVE DIFFERENTLY IN REAL SYSTEMS
In theory, agentic systems sound efficient: describe the outcome, let the agent plan and execute. In practice, production reality is messy. Agents chain unexpected API calls, pick unsafe defaults, and generate changes that engineers struggle to reproduce or fully understand later. A small prompt can lead to a large system change, touching identity, permissions, and data paths you never intended to expose. Debugging this behavior is significantly harder than debugging human‑written code, especially when logs, prompts, and context windows interact in non‑obvious ways.

NON‑DETERMINISM IS AN ENGINEERING PROBLEM, NOT JUST A RESEARCH QUIRK
Many teams underestimate how non‑deterministic behavior impacts operations, audits, and incident response. The same agent prompt can produce different code, different API calls, or different side effects across runs. That makes root‑cause analysis, reproducible fixes, and compliance evidence difficult or impossible. This episode argues that determinism still matters deeply in modern systems: you need clear boundaries where behavior is predictable, testable, and reviewable, even if an LLM is involved somewhere in the pipeline.

SECURITY, PERMISSIONS, AND ACCIDENTAL CHAOS
Security risk multiplies when AI agents are treated like “junior engineers” instead of untrusted automation. In practice, agents tend to request broader permissions than necessary, store secrets unsafely, or create undocumented endpoints and shortcuts. They may bypass established workflows, skip approvals, or write code that quietly weakens existing controls. The episode breaks down why traditional security assumptions break once agents can act, and why you must design your systems as if agents are external, untrusted callers, no matter how smart they appear.

WHAT SPEC KIT DOES: ENFORCING ARCHITECTURAL INTENT
Spec Kit is introduced as a way to make architectural intent explicit and enforceable before agents touch real systems. Instead of letting an agent “decide” how to integrate with Microsoft Graph or internal APIs, Spec Kit defines allowed actions, constraints, patterns, and security expectations up front. Agents then operate inside this contract, not outside it. That shift turns AI from an autonomous decision‑maker into a constrained executor of well‑defined, testable specifications, keeping architecture, security, and compliance in control.

BEST PRACTICES FOR BUILDING AI AGENTS SAFELY
The episode offers concrete guidance for teams working with AI agents in Microsoft‑centric and cloud environments: treat agents like untrusted external services, use strict permission scopes and role separation, and log and audit every agent action. Keep humans in the loop for high‑impact or irreversible operations, and never allow agents to directly deploy or modify production systems without controlled pipelines. Tools like GitHub, Microsoft Entra, and modern AI APIs can absolutely accelerate development, but only when paired with clear boundaries, strong review processes, and explicit architecture.

WHAT YOU WILL LEARN
Why AI agents behave unpredictably once connected to real infrastructure and internal APIs.
How non‑determinism and opaque reasoning make debugging and compliance significantly harder.
Why traditional identity, permission, and security models break if agents are treated as trusted teammates.
How Spec Kit can encode architectural intent so agents execute within safe, predefined patterns.
Practical patterns to limit blast radius, enforce least privilege, and keep humans in the loop.

WHO THIS EPISODE IS FOR
Software engineers and platform teams working with LLMs and AI agents.
Security engineers, identity teams, and architects responsible for Microsoft Entra and Microsoft 365.
CTOs, tech leads, and product owners evaluating agentic systems for real workloads.
Anyone building AI‑powered developer tools or automation on top of internal APIs.

ABOUT THE HOST
Mirko Peters is a Microsoft 365 expert, architect, and host of m365.fm. He works with organizations from small businesses to large enterprises on Microsoft 365 architecture, security, AI integration, governance design, and system architecture. His work focuses on designing context‑driven systems that reduce complexity, enable autonomous execution, and create scalable performance across modern enterprises.

Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-fm-modern-work-security-and-productivity-with-microsoft-365--6704921/support