
EPISODE · Aug 15, 2025 · 16 MIN

Microsoft Fabric Notebooks for AI Model Training: How to Train on Lakehouse Data Without CSV Chaos

from M365.FM - Modern work, security, and productivity with Microsoft 365 · host Mirko Peters - Founder of m365.fm, m365.show and m365con.net

Most teams hit a wall when their “simple” AI experiment outgrows a laptop: fans spin, notebooks freeze, and multi-terabyte datasets turn every run into an overnight gamble. In this episode, we shift that entire workflow into Microsoft Fabric notebooks, where your model training sits right next to your Lakehouse data, so you can work at full scale without CSV exports, file splits, or memory errors. Starting from a real marketing churn scenario, we walk through what changes when your notebook talks directly to the Lakehouse and Spark, rather than your local machine, does the heavy lifting in the background.

You’ll see why “download and filter locally” is the hidden bottleneck in most AI projects and how direct Lakehouse access in Fabric kills the CSV chaos for good. We break down how queries run where the data lives, how Spark aggregates and joins massive tables before anything touches your Python or R session, and why that alone can save days of waiting and reruns. Instead of nursing fragile extracts, you work against a single, live source of truth that stays aligned with the rest of your data platform.

From there, we dive into feature engineering and model selection at scale. You’ll learn how Fabric’s notebook environment and built-in libraries let you shape hundreds of gigabytes, or even terabytes, of customer history into lean, meaningful features without overwhelming your hardware. We talk about handling high-cardinality fields, sparse data, and time-based patterns in a way that improves real-world prediction quality instead of just adding more columns and compute.

By the end, training models on “too big for Excel” datasets will feel less like a heroic stunt and more like a repeatable workflow.
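
To make the idea of shaping customer history into lean features before it reaches your session more concrete, here is a minimal sketch in plain Python so it runs anywhere. It assumes a hypothetical event log with customer_id, event_date, campaign_id (a stand-in for a high-cardinality field), and amount, plus a churn cutoff date; none of these names come from the episode. It collapses many event rows into one row per customer, hashes the high-cardinality field into a small, sparse set of buckets, and adds time-based features computed only from data before the cutoff.

```python
"""Per-customer churn features from raw events (illustrative sketch).

Hypothetical input: each event is a dict with customer_id, event_date,
campaign_id (stand-in for a high-cardinality field), and amount.
In a Fabric notebook this aggregation would run in Spark next to the
Lakehouse table; plain Python is used here so the sketch runs anywhere.
"""
import hashlib
from collections import defaultdict
from datetime import date

N_BUCKETS = 8  # hashing-trick width; illustrative, tune for real data


def bucket(value: str, n_buckets: int = N_BUCKETS) -> int:
    """Map a string to a stable bucket index (the hashing trick).

    md5 is used instead of built-in hash(), whose str output is salted
    per process and would make features differ between runs.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def customer_features(events, as_of: date, window_days: int = 30) -> dict:
    """Collapse event rows into one compact feature row per customer.

    Only events strictly before `as_of` are used, so the features cannot
    see the period that a churn label measured after `as_of` describes.
    """
    acc = defaultdict(lambda: {
        "n_events": 0,
        "n_recent": 0,
        "total_amount": 0.0,
        "last_event": None,
        "campaign_buckets": {},  # sparse: only non-zero buckets stored
    })
    for e in events:
        d = e["event_date"]
        if d >= as_of:
            continue  # drop rows at or after the cutoff (no future leakage)
        row = acc[e["customer_id"]]
        row["n_events"] += 1
        row["total_amount"] += e["amount"]
        if (as_of - d).days <= window_days:
            row["n_recent"] += 1  # time-based pattern: recent activity
        if row["last_event"] is None or d > row["last_event"]:
            row["last_event"] = d
        b = bucket(e["campaign_id"])
        row["campaign_buckets"][b] = row["campaign_buckets"].get(b, 0) + 1

    return {
        cid: {
            "n_events": row["n_events"],
            "n_recent": row["n_recent"],
            "total_amount": round(row["total_amount"], 2),
            "days_since_last": (as_of - row["last_event"]).days,
            "campaign_buckets": row["campaign_buckets"],
        }
        for cid, row in acc.items()
    }


if __name__ == "__main__":
    sample = [
        {"customer_id": "c1", "event_date": date(2025, 7, 20),
         "campaign_id": "summer-01", "amount": 10.0},
        {"customer_id": "c1", "event_date": date(2025, 5, 1),
         "campaign_id": "spring-99", "amount": 5.5},
        {"customer_id": "c2", "event_date": date(2025, 8, 5),
         "campaign_id": "summer-01", "amount": 3.0},
    ]
    feats = customer_features(sample, as_of=date(2025, 8, 1))
    print(feats["c1"]["days_since_last"])  # 12
    print("c2" in feats)  # False: its only event is after the cutoff
```

In a Fabric notebook, the same shape would more likely be a Spark `groupBy("customer_id").agg(...)` over the Lakehouse table, with the date filter applied first so Spark can skip irrelevant data where the table layout allows, and `pyspark.ml.feature.FeatureHasher` is one option for the hashing step. Only the compact per-customer result would then be collected (for example with `toPandas()`) for model training. Treat that mapping as an assumption about how you might port the sketch, not as the workflow shown in the episode.
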

You’ll walk away with a mental model for when to move workloads into Fabric notebooks, how to structure your Lakehouse for AI training, and which parts of your current pipeline to retire once the data, compute, and notebooks finally live in one place.

WHAT YOU LEARN
Why local notebooks and CSV exports break down once your datasets reach hundreds of gigabytes or more.
How Microsoft Fabric notebooks connect directly to Lakehouse data so training runs without manual extracts.
How running transformations where the data lives (with Spark) cuts processing time and reduces failed runs.
Practical patterns for feature engineering at scale without overfitting or wasting compute.
When to move from desktop workflows into Fabric and how to structure your data for large-scale AI training.

CORE INSIGHT
The core insight of this episode is that the real unlock for AI model training isn’t a bigger laptop; it’s bringing compute to the data. When your notebooks run inside Microsoft Fabric, directly against Lakehouse storage with Spark doing the heavy lifting, you stop spending energy on file juggling and hardware limits and start investing it in better features, better models, and faster iterations that actually move the needle for your business.

WHO THIS IS FOR
Data scientists and analysts who keep hitting hardware and memory limits with local notebooks.
Data engineers looking to modernize pipelines for AI workloads without building custom infrastructure from scratch.
Analytics and marketing teams who want to train serious models on full-fidelity customer data instead of sampled extracts.
Architects and platform owners evaluating when and how to standardize AI training on Microsoft Fabric.

ABOUT THE HOST
Mirko Peters is a Microsoft 365 and cloud consultant who works where data platforms, AI, and modern work meet.
He helps organizations move from fragile, laptop-bound analytics to robust architectures on Microsoft 365, Fabric, and Azure, where large-scale AI workloads run next to governed, trusted data. On M365.FM, Mirko turns deep technical setups, like Fabric notebooks and Lakehouse-based training, into practical patterns you can start applying in your own environment this week.

Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-fm-modern-work-security-and-productivity-with-microsoft-365--6704921/support



No transcript is available for this episode yet.


Frequently Asked Questions

How long is this episode of M365.FM - Modern work, security, and productivity with Microsoft 365?

This episode is 16 minutes long.

When was this M365.FM - Modern work, security, and productivity with Microsoft 365 episode published?

This episode was published on August 15, 2025.

What is this episode about?

Most teams hit a wall when their “simple” AI experiment outgrows a laptop—fans spin, notebooks freeze, and multi‑terabyte datasets turn every run into an overnight gamble. In this episode, we shift that entire workflow into Microsoft Fabric...

Is there a transcript available for this episode?

Not yet. There is no transcript for this episode at the moment; transcripts are produced on request.

Can I download this M365.FM - Modern work, security, and productivity with Microsoft 365 episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.