EPISODE · Aug 6, 2025 · 21 MIN
Monitoring Data Pipelines in Microsoft Fabric
from M365.FM - Modern work, security, and productivity with Microsoft 365 · host Mirko Peters - Founder of m365.fm, m365.show and m365con.net
Dashboards don’t usually break in one dramatic moment; they quietly drift out of date while everyone assumes the numbers are still right. In this episode, we start from that uncomfortable reality and walk through how most Microsoft Fabric environments have rich telemetry available, but almost no intentional monitoring design to turn it into early warning signals. You’ll hear how pipelines can fail, stall, or degrade for days before anyone notices, and why “the refresh is red” is often the first and only alert business users ever see.

We begin with the core problem: Fabric teams tend to wire up a few basic success/failure checks, maybe a status email, and then rely on users to report broken reports. That leads to a reactive culture where data engineers spend mornings firefighting instead of improving reliability. We connect this to four dimensions Fabric already gives you (performance metrics, error logs, lineage, and recovery options) and show why treating them as separate features guarantees blind spots.

From there, we walk through what a deliberately designed monitoring system in Fabric actually looks like. You’ll see how to use metrics such as pipeline duration, throughput, queue times, and resource utilization to detect anomalies before SLAs are breached. We talk about turning vague failure messages into actionable error logging, so you can pinpoint which activity, dataset, or external dependency caused the problem instead of digging through generic “something went wrong” alerts.

Then we zoom out with data lineage. Instead of just knowing that a pipeline failed, you need to know which dashboards, departments, and decisions are now running on stale or incomplete data. We explore how Fabric’s lineage views help you map impact, prioritize fixes, and communicate clearly with stakeholders, so you stop discovering critical breaks from executive screenshots in your inbox.

Finally, we tie it all together with recovery. Monitoring has no value if every alert just leads to someone manually rerunning jobs in the portal. We discuss how to design automated recovery paths (retries with backoff, quarantines for bad data, and fallback datasets) so alerts trigger concrete actions instead of just notification fatigue. By the end, monitoring in Fabric is no longer a scattered set of charts and logs, but a connected safety net that prevents silent failures and lets your team ship faster with confidence.

WHAT YOU LEARN
Why most Microsoft Fabric monitoring setups only catch failures after business users are already affected.
How to use performance metrics (duration, throughput, queue times, resource usage) as early warning signals for pipeline health.
How to turn Fabric error logs into specific, actionable diagnostics instead of generic failure notifications.
How to use data lineage to see which reports, teams, and processes are impacted by an upstream issue.
How to design automated recovery paths so alerts lead directly to retries, quarantines, or fallbacks instead of manual firefighting.
How combining metrics, logs, lineage, and recovery into one monitoring design changes Fabric from reactive troubleshooting to proactive reliability.

CORE INSIGHT
The core insight of this episode is that reliable monitoring in Microsoft Fabric doesn’t come from adding more alerts; it comes from designing metrics, logs, lineage, and recovery as one system that works together. When you connect those four pillars, you stop discovering failures through broken dashboards and start preventing them before they hit your users.

WHO THIS IS FOR
Data engineers and Fabric admins responsible for keeping pipelines and dashboards healthy.
Analytics teams tired of being the last to know when a dataset is stale or a load has failed.
Architects designing end‑to‑end data platforms on Microsoft Fabric with clear SLAs.
Business stakeholders who depend on daily reports and want fewer surprises and more predictable data quality.

ABOUT THE HOST
Mirko Peters is a Microsoft 365 and data platform consultant and the host of M365.FM, focused on modern work, security, and analytics architectures in the Microsoft ecosystem. He helps organizations move from fragile, ad‑hoc reporting setups to robust platforms on Microsoft 365 and Fabric, where reliability and monitoring are designed from day one, not bolted on after outages. In M365.FM, Mirko turns messy real‑world reliability problems, like silent pipeline failures in Fabric, into practical monitoring patterns teams can actually implement.

Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-fm-modern-work-security-and-productivity-with-microsoft-365--6704921/support.
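
EXAMPLE SKETCH: RETRIES WITH BACKOFF
The episode description mentions automated recovery paths such as retries with backoff and fallback datasets. As a rough illustration of that idea only, here is a minimal Python sketch. It is not taken from the episode and it does not call any Microsoft Fabric API: `run_with_backoff`, its parameters, and the `job` and `fallback` callables are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_backoff(
    job: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    fallback: Optional[Callable[[], T]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a (hypothetical) pipeline job, retrying with exponential backoff.

    If every attempt fails and a fallback is given, return the fallback's
    result (for example, a last-known-good dataset) instead of raising.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return job()
        except Exception as exc:  # a real pipeline would catch only transient errors
            last_error = exc
            if attempt < max_attempts - 1:
                # Delay doubles each attempt (base, 2*base, 4*base, ...), capped at max_delay.
                sleep(min(base_delay * 2 ** attempt, max_delay))
    if fallback is not None:
        return fallback()
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable so the delay schedule can be checked without actually waiting. In a real setup, the alert that detects the failure would be what invokes a wrapper like this, so the alert leads to an action rather than only a notification.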