The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

Q: How many episodes does The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI have?

The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI currently has 50 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

Q: How often does The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI release new episodes?

New episodes have generally been released weekly. The most recent ones below were published on Jun 11, Jun 4, May 28 and May 21, 2026. Check the episode list for exact publication dates.

Q: Where can I listen to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI?

You can listen to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Q: Who hosts The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI?

The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI is created and hosted by Astronomer.

102

Managing a Customer Analytics Platform with Airflow at Skimlinks

Skimlinks runs a reporting platform that serves around 2,000 weekly publisher users, and the data infrastructure behind it runs on Airflow. In this episode, Julian Larralde, Director of Data Engineering at Skimlinks, walks through the stack, the migration from external task sensors to event-driven Assets, and a YAML-based DAG factory the team built to onboard new publishers without rewriting Python.

Key Takeaways:
00:00 Introduction.
00:45 What Skimlinks does and how it operates as an affiliate marketing network aggregator for publishers.
02:12 Julian's team and the data platform they own: a reporting portal that serves ~2,000 weekly publisher users.
03:07 The stack: real-time ingestion into BigQuery, Airflow as the orchestrator, raw / silver / gold layers, and Apache Druid as the serving database for sub-second BI queries.
04:50 Reusing the same data marts for ~100 internal customers across marketing, finance, operations, and account management.
06:25 Airflow as the single orchestrator: BigQuery operators for SQL business logic, plus raw file exports for the largest publishers.
08:08 Moving from external task sensors to datasets (now Assets) and what the migration actually solved.
09:18 Why sensor polling created scheduler load and worker overload, and how event-driven Assets fixed both.
10:15 The lineage view in the Airflow UI that came as a bonus after the Assets migration.
10:49 The vision for multi-tenant Airflow inside Skimlinks: replacing cron, Rundeck, and team-local Airflow instances with a shared platform.
14:31 Building a custom DAG factory with YAML configuration for onboarding new publishers.
17:33 Breaking a single Python class into single-responsibility components for the DataPipe project.
19:07 Adding a Pydantic layer so misconfigured YAML fails at DAG parse time instead of run time.
20:31 Using AI assistance to guide refactoring decisions and generate tests across the new class structure.
22:34 What Julian wants from Airflow next: asset watchers paired with data contracts.

Resources Mentioned:
Skimlinks - skimlinks.com
Apache Airflow - airflow.apache.org
Astronomer - astronomer.io
Google BigQuery - cloud.google.com/bigquery
Apache Druid - druid.apache.org
Pydantic - docs.pydantic.dev
Looker - cloud.google.com/looker

Thanks for listening to "The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI." If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow

Jun 11, 2026

22m
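
The Skimlinks episode describes a YAML-driven DAG factory with a Pydantic layer. Its purpose is to make a misconfigured publisher entry fail when the DAG file is parsed, not in the middle of a run. Below is a minimal sketch of that fail-fast idea. To stay dependency-free it swaps Pydantic for a standard-library dataclass with explicit checks, and it assumes the YAML has already been loaded into dicts. The field names (`publisher_id`, `schedule`, `export_format`) and the allowed formats are hypothetical, not Skimlinks' actual schema.

```python
from dataclasses import dataclass

# Hypothetical schema for one publisher entry in the YAML config.
REQUIRED_KEYS = {"publisher_id", "schedule"}
OPTIONAL_KEYS = {"export_format"}
ALLOWED_FORMATS = {"csv", "json", "parquet"}


class ConfigError(ValueError):
    """Raised for an invalid publisher config, ideally at DAG parse time."""


@dataclass(frozen=True)
class PublisherConfig:
    publisher_id: str
    schedule: str
    export_format: str = "csv"

    def __post_init__(self):
        # Validate eagerly so mistakes surface as soon as the config is loaded.
        if not isinstance(self.publisher_id, str) or not self.publisher_id.strip():
            raise ConfigError("publisher_id must be a non-empty string")
        if not isinstance(self.schedule, str):
            raise ConfigError("schedule must be a string")
        # Accept '@daily'-style presets or a 5-field cron expression.
        if not self.schedule.startswith("@") and len(self.schedule.split()) != 5:
            raise ConfigError(f"schedule {self.schedule!r} is not a preset or 5-field cron")
        if self.export_format not in ALLOWED_FORMATS:
            raise ConfigError(f"export_format {self.export_format!r} not in {sorted(ALLOWED_FORMATS)}")


def load_configs(raw_configs):
    """Convert parsed YAML dicts into validated configs, failing on the first bad entry."""
    configs = []
    for raw in raw_configs:
        missing = REQUIRED_KEYS - set(raw)
        unknown = set(raw) - REQUIRED_KEYS - OPTIONAL_KEYS
        if missing:
            raise ConfigError(f"missing keys: {sorted(missing)}")
        if unknown:
            raise ConfigError(f"unknown keys: {sorted(unknown)}")
        configs.append(PublisherConfig(**raw))
    return configs
```

Suppose `load_configs` runs at module level in the DAG factory file. Any `ConfigError` is then raised while Airflow parses the file, and Airflow reports it as a DAG import error rather than a task failure hours later.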

101

Building a custom Tableau provider for Airflow at JLR

JLR is the UK's largest automotive manufacturer, behind brands like Range Rover, Jaguar, Defender, and Discovery. In this episode, Najeeb Sulaiman, Senior Data Engineer at JLR, walks through how Airflow orchestrates data across manufacturing, supply chain, and finance. This includes a custom Tableau provider his team built (after the community version dropped personal access token, or PAT, authentication) and a CI/CD pipeline that validates DAGs before they reach production.

Key Takeaways:
00:00 Introduction.
00:48 What JLR makes: luxury vehicles under the Range Rover, Jaguar, Defender, and Discovery brands.
01:42 Najeeb's team in the Data and AI Office, supporting manufacturing, supply chain, finance, and commerce analytics.
03:25 Airflow as the central nervous system of the JLR data stack: the orchestrator that connects every source and downstream system.
05:01 How JLR uses Tableau, and the two modes for getting data in: live connection and scheduled extract refresh.
06:24 Why scheduled Tableau refreshes go stale: they aren't aware of when the data pipeline actually finished.
08:09 First attempt at solving it: Python scripts calling the Tableau REST API directly.
08:47 Why the script approach didn't scale across teams: code duplication and version drift.
10:00 Trying the community Airflow Tableau provider and hitting the PAT authentication roadblock.
12:21 Building a custom provider on top of the community one to keep PAT auth.
13:30 Treating CI/CD as a deployment gate for Airflow DAGs at JLR's scale.
15:23 What the CI/CD pipeline actually catches: top-level code making external calls, import errors, and Airflow 3 compatibility.
17:47 How the gate blocks broken DAGs from reaching production.
18:30 What Najeeb wants from Airflow next: native integration testing, better OpenTelemetry support, and built-in lineage.

Resources Mentioned:
JLR - jaguarlandrover.com
Apache Airflow - airflow.apache.org
Astronomer - astronomer.io
Tableau - tableau.com
Tableau REST API - help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm
Airflow Tableau provider (community) - airflow.apache.org/docs/apache-airflow-providers-tableau

Thanks for listening to "The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI." If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow

Jun 4, 2026

21m
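
JLR's CI/CD gate catches, among other things, top-level DAG code that makes external calls. Such code would run every time the scheduler parses the file. One way to approximate that check is a static scan with Python's standard `ast` module, sketched below. The denylist of call prefixes is a hypothetical example, and this is not JLR's implementation. The scan only catches syntax errors and calls written with a module prefix, so a bare `get()` after `from requests import get` slips through.

```python
import ast

# Hypothetical denylist: calls treated as "external" if made at import time.
EXTERNAL_CALL_PREFIXES = ("requests.", "urllib.request.", "boto3.", "psycopg2.connect")


def _dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


class _TopLevelCallFinder(ast.NodeVisitor):
    def __init__(self):
        self.hits = []

    # Function and lambda bodies only run when a task executes, so skip them.
    def visit_FunctionDef(self, node):
        pass

    visit_AsyncFunctionDef = visit_FunctionDef
    visit_Lambda = visit_FunctionDef

    def visit_Call(self, node):
        name = _dotted_name(node.func)
        if name and name.startswith(EXTERNAL_CALL_PREFIXES):
            self.hits.append((node.lineno, name))
        self.generic_visit(node)  # keep looking inside arguments


def check_dag_source(source, filename="<dag>"):
    """Return a list of problems; an empty list means the file passes the gate."""
    try:
        tree = ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return [f"{filename}:{exc.lineno}: syntax error: {exc.msg}"]
    finder = _TopLevelCallFinder()
    finder.visit(tree)
    return [f"{filename}:{line}: top-level external call {name}()" for line, name in finder.hits]
```

A static scan cannot detect import errors such as missing packages. To catch those, a gate would typically also load the files in an environment with Airflow installed, for example through Airflow's `DagBag`, and fail when its `import_errors` mapping is not empty.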

100

Orchestrating 2,000 Airflow pipelines at Luiza Labs with Mateus Ferreira

Running Airflow at the scale of a national retailer means more than just scheduling. It means giving non-engineers a path to ship DAGs and classifying thousands of runs to know which ones need attention. In this episode, Mateus Ferreira, Senior Data Engineer at Luiza Labs (the technology arm of Magazine Luiza, one of Brazil's largest retailers), joins Marc to talk about the patterns his team uses to run 2,000+ Airflow pipelines across more than four petabytes of data.

Key Takeaways:
00:00 Introduction.
01:11 Mateus introduces himself and Luiza Labs, the technology arm of Magazine Luiza (Magalu), one of Brazil's largest retailers, founded in 1957. Magalu has 1,000+ physical stores and multi-region operations, and its data team has to handle the variability that comes with all of it.
04:33 Lu Brain, Magalu's AI initiative built around their character Lu, and how AI fits into the data work.
06:47 The data reliability engineering channel, where AI summarizes Airflow errors with confidence scores and posts a suggested fix in chat.
08:30 How Airflow became the heart of orchestration: from Control-M in banking, to GCP, to consolidating on Cloud Composer to centralize roughly 2,000 pipelines.
14:23 The YAML wrapper that lets non-engineers ship DAGs. It reads namespace, tables, and Spark options, and it handles CDC, JDBC full, and JDBC incremental collection types with checkpoints. All changes go through data reliability engineering.
17:20 Why metadata is the most valuable asset in the AI era, and how the wrapper makes data lineage observable across 2,000 pipelines.
18:26 The Data Reliability Engineering team: a 10-person group that acts as the window to the company, handling maintenance, validation, corrections, and optimization for the business unit pipelines.
20:09 Operating at four petabytes of data.
21:24 Why they built custom Spark operators. Cost drove the move off the DataprocOperator. The custom operator exposes Spark driver and executor sizing as Airflow parameters and generates the Kubernetes manifest.
24:36 The monitoring dashboard built on the Airflow metadata DB: a timeline view that shows how many DAGs run each hour, used to spread scheduling across the day.
26:37 Classifying DAGs by their last five runs (success, partially correct, intermittent, total failure) as a reusable observability pattern.
29:57 How to reach Mateus, and a closing thought in Portuguese on appreciating the good old times while you are living them.

Resources Mentioned:
Apache Airflow (airflow.apache.org)
Magalu Cloud / MGC
Luiza Labs (luizalabs.com) and Magazine Luiza / Magalu
Astro Observe (https://www.astronomer.io/product)
Mateus Ferreira on LinkedIn (linkedin.com/in/mateusmferreira)

Thanks for listening to "The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI." If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow

May 28, 2026

32m
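
Luiza Labs sorts each DAG into one of four buckets based on its last five runs: success, partially correct, intermittent or total failure. The episode notes don't define the buckets, so the rules in this sketch are hypothetical. The sketch counts failures among the five most recent finished runs. The run states themselves could be pulled from the Airflow metadata database or the REST API.

```python
TERMINAL_STATES = {"success", "failed"}


def classify_dag(run_states, window=5):
    """Classify a DAG from its run states, ordered oldest first, newest last.

    Hypothetical rules (the episode does not define the buckets):
      success           - no failures in the window
      total failure     - every run in the window failed
      partially correct - exactly one failure
      intermittent      - two or more failures mixed with successes
    """
    # Ignore queued/running runs and keep only the most recent finished ones.
    finished = [s for s in run_states if s in TERMINAL_STATES][-window:]
    if not finished:
        raise ValueError("no finished runs to classify")
    failures = finished.count("failed")
    if failures == 0:
        return "success"
    if failures == len(finished):
        return "total failure"
    if failures == 1:
        return "partially correct"
    return "intermittent"
```

A dashboard like the one described at 24:36 could then group DAGs by bucket so the team can work on the "total failure" and "intermittent" lists first.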

99

Enhancing DAGs for Data Processing with William Orgertrice III at Cargill

In the data engineering world, the difference between a pipeline that works and one that's truly production-ready often comes down to a handful of deliberate decisions. William Orgertrice III, Data Engineer at Cargill, joins us to share the DAG design and monitoring practices he presented at Airflow Summit 2025 and how his team is rolling out Airflow across 60+ internal teams as part of Cargill's new Minerva data platform.

Key Takeaways:
00:00 Introduction.
01:45 Cargill is one of the largest privately owned companies in the US, operating across 70 countries and serving 125+ markets.
03:45 William's team on the Cargill Data Platform supports 60+ internal teams, providing data products that drive decisions across finance, inventory and operations.
05:10 Cargill chose Airflow as a core component of its new Minerva data platform to replace older ETL tooling with a more supportable, observable stack.
06:26 Native SLA sensors and dependency management were specific features that made Airflow the right fit for Cargill's batch ingestion pipelines.
09:00 Cargill is running Airflow through Astronomer as their managed solution, with some teams already in production.
13:22 Every task in a DAG should have a single, documented purpose; one task doing everything makes troubleshooting significantly harder.
14:40 A DAG that never enters a failed state but keeps running indefinitely will spend compute budget without alerting anyone.
15:25 In shared Airflow environments, embedding contact information and owner tags in DAGs ensures the right team is reached when something breaks upstream.
21:00 William flags connection testing as a friction point in pipeline development: verifying a connection string before building the full job would reduce iteration time.

Resources Mentioned:
Cargill | Website https://www.cargill.com/food-beverage
Airflow Community on Slack https://airflow.apache.org/community/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow

May 21, 2026

26m
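
Two of William's points lend themselves to an automated policy check. First, a DAG that hangs instead of failing burns compute silently. Second, shared environments need owner and contact details on every DAG. Airflow's real settings for the first point are `dagrun_timeout` on the DAG and `execution_timeout` on tasks. The sketch below checks a plain-dict description of a DAG against hypothetical house rules. The `owner:`/`contact:` tag convention and the six-hour cap are assumptions, not Cargill's standards.

```python
from datetime import timedelta

# Hypothetical house rules for a shared Airflow environment.
MAX_TASK_TIMEOUT = timedelta(hours=6)
REQUIRED_TAG_PREFIXES = ("owner:", "contact:")


def lint_dag_spec(spec):
    """Check a plain-dict description of a DAG; return a list of violations."""
    problems = []
    tags = spec.get("tags", [])
    for prefix in REQUIRED_TAG_PREFIXES:
        if not any(tag.startswith(prefix) for tag in tags):
            problems.append(f"missing a '{prefix}' tag")
    if spec.get("dagrun_timeout") is None:
        problems.append("no dagrun_timeout: a stuck run could hold resources indefinitely")
    for task_id, task in spec.get("tasks", {}).items():
        timeout = task.get("execution_timeout")
        if timeout is None:
            problems.append(f"task {task_id}: no execution_timeout")
        elif timeout > MAX_TASK_TIMEOUT:
            problems.append(f"task {task_id}: execution_timeout {timeout} exceeds {MAX_TASK_TIMEOUT}")
    return problems
```

The same rules could run in CI over generated DAG specs. They could also be adapted into an Airflow cluster policy (`dag_policy` / `task_policy` in `airflow_local_settings.py`) that flags non-compliant DAGs when they are parsed.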

98

Getting Into Data Engineering with Shrividya Hegde, Data and AI Engineer

In this episode, we take a step back from implementation-specific topics to explore what it actually takes to build a career in data engineering, and how AI is reshaping that path.

Shrividya Hegde, a data and AI engineer and an Airflow champion in Astronomer’s Champions program, joins us to discuss getting into data engineering, contributing to open source and why good data engineering should make AI output trustworthy rather than confidently wrong.

Key Takeaways:
00:00 Introduction.
04:08 Build fundamentals before chasing trending tools: understanding what a tool does, why it exists and what problem it solves has to come first.
07:19 Data engineering fundamentals mean SQL query performance under joins and aggregations, how data moves between pipelines, DAG failure recovery and idempotency, not just writing queries.
08:10 The most common mistake newer data engineers make is skipping fundamentals to chase trends; it is a sequencing problem, not a talent problem.
13:15 AI creates more opportunity for data engineers because AI output quality is directly determined by the quality of the data pipeline feeding it, and confidently wrong output is harder to catch than obviously wrong output.
15:06 Airflow's supporting operators make AI outputs production-ready; orchestration is what converts experimental AI into something reliable.
17:14 AI-generated DAGs help newer engineers understand underlying concepts rather than just producing working code.
23:12 The Airflow open source community is more welcoming than most people expect for a project of its size; raising issues and reviewing PRs are viable entry points for first contributions.

Resources Mentioned:
Shrividya Hegde https://www.linkedin.com/in/shrividya-hegde-shri-91562365/
Astronomer | LinkedIn https://www.linkedin.com/company/astronomer/
Astronomer | Website https://www.astronomer.io
Women in Data | Website https://womenindata.mn.co/landing
Apache Airflow Slack https://airflow.apache.org/
Shrividya's Medium writing https://medium.com/@shrihegde
Shrividya's Substack writing https://substack.com/@shrividyahegde

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

May 14, 2026

27m
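
Idempotency is one of the fundamentals listed at 07:19. It means a task can be retried or backfilled without changing the result. A common pattern is for each run to replace the slice of data it owns instead of appending to it. Here is a minimal sketch with the standard-library `sqlite3` module. It assumes a hypothetical `sales` table whose slice for each run is identified by a `run_date` column.

```python
import sqlite3


def make_db():
    """Create an in-memory table standing in for a warehouse table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (run_date TEXT, order_id INTEGER, amount REAL)")
    return conn


def load_day(conn, run_date, rows):
    """Idempotently load one day's rows by replacing that day's slice in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO sales (run_date, order_id, amount) VALUES (?, ?, ?)",
            [(run_date, order_id, amount) for order_id, amount in rows],
        )


def row_count(conn, run_date):
    """Count rows currently stored for one run_date."""
    return conn.execute(
        "SELECT COUNT(*) FROM sales WHERE run_date = ?", (run_date,)
    ).fetchone()[0]
```

In Airflow, `run_date` would usually come from the run's logical date. A retry or backfill of the same run then rewrites the same slice instead of duplicating rows.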

97

Orchestrating DBT With Cosmos and Airflow with Filip Kunčar at ShipMonk Product Development

We explore how a third-party logistics platform built its entire data orchestration layer on Airflow, and what that makes possible for developer teams and merchant-facing products alike.

Filip Kunčar, Platform Director at ShipMonk Product Development, discusses migrating from a closed-source tool to Airflow, orchestrating dbt with both Cosmos and the BashOperator, and using Airflow to power customer-facing data delivery.

Key Takeaways:
00:00 Introduction.
01:07 ShipMonk is a third-party logistics company guaranteeing two-day delivery across the US. The data platform team's mission is to lower cognitive load for developers working with data.
05:13 ShipMonk migrated to Airflow in 2022, moving away from a closed-source UI-based tool, driven by the need for a code-first approach, open source extensibility and broad cloud provider support.
10:02 The team uses Cosmos for developer-facing visibility and lineage, and the BashOperator for internal pipelines where runtime performance matters.
12:20 Switching from Cosmos to the BashOperator for a frequently running pipeline reduced runtime from over 15 minutes to three minutes.
13:14 Because the full dbt chain runs inside Airflow, a configurable downstream DAG can deliver processed data directly to each merchant's preferred destination, with secrets management and SLA tracking already handled.
15:03 Per-team alerting is hooked to each DAG by owner and severity, so teams can react to SLA breaches immediately.
18:09 ShipMonk uses Airflow in three ways for AI: authoring DAGs faster with skills, orchestrating AI workloads in Lambda and containers, and using Astronomer's skills repo to simplify Airflow version upgrades.

Resources Mentioned:
Filip Kunčar https://www.linkedin.com/in/filipkuncar/
ShipMonk Product Development https://www.linkedin.com/company/shipmonk-product-development/
ShipMonk | Website http://www.shipmonk.com
Astronomer Cosmos http://www.astronomer.io/cosmos
Astronomer AI Skills Repo http://www.github.com/astronomer/airflow-llm-providers-demo
Datadog http://www.datadoghq.com

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

May 7, 2026

24m
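
ShipMonk hooks per-team alerting to each DAG by owner and severity. Below is a minimal sketch of that routing logic. The routing table, destination strings and severity names are all hypothetical. In Airflow, a function like this would typically be called from an `on_failure_callback`. The owner would come from the task, and the severity from a DAG-level convention such as a tag.

```python
# Hypothetical routing table: (owner, severity) -> alert destination.
ROUTES = {
    ("fulfillment-data", "critical"): "pagerduty:fulfillment-oncall",
    ("fulfillment-data", "warning"): "slack:#fulfillment-data-alerts",
    ("billing-data", "critical"): "pagerduty:billing-oncall",
}
DEFAULT_ROUTE = "slack:#data-platform-alerts"


def route_alert(owner, severity):
    """Pick where an alert goes; unknown combinations fall back to the platform team."""
    return ROUTES.get((owner, severity), DEFAULT_ROUTE)


def build_alert(dag_id, owner, severity, reason):
    """Build a routed alert message for a failed or SLA-breaching DAG."""
    return {
        "destination": route_alert(owner, severity),
        "text": f"[{severity.upper()}] {dag_id} ({owner}): {reason}",
    }
```

The fallback route means a DAG with a missing or mistyped owner still alerts someone, rather than failing silently.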

96

Building Airflow CTL with Buğra Öztürk at Mollie

Buğra Öztürk, Senior Data Engineer at Mollie and Committer and PMC member on the Apache Airflow project, joins us to walk through Airflow CTL: what it is, how it differs from the existing Airflow CLI and where it is headed under AIP-94.

Key Takeaways:
00:00 Introduction.
03:10 Buğra has contributed to Airflow since 2022, from docs changes up to Committer and PMC member, a path he hopes will inspire others to start small and contribute.
04:05 Airflow CTL solves secure user interaction by abstracting database credentials behind the public core API.
05:13 Airflow CLI and Airflow CTL are complementary: the CLI handles administration and database management, while CTL handles secure user interactions via the API.
07:08 Airflow CTL authenticates via the API, acquires a JWT token and stores it securely in the OS keyring. It runs on the user's machine and never requires direct database access.
08:21 Concrete use cases include local DAG development without the UI and CI/CD automation using headless mode with short-lived JWT tokens.
10:08 AIP-94 describes the long-term vision: decoupling all remote commands from the Airflow CLI and routing them through Airflow CTL.
13:12 Airflow CTL is currently at 0.x and already being used in CI and deployment automations. The move to 1.0 with full CLI parity is the next milestone under AIP-94.
16:09 Multi-team deployment becoming generally available in a future Airflow release is Buğra's most-anticipated upcoming feature beyond Airflow CTL.

Resources Mentioned:
Buğra Öztürk https://www.linkedin.com/in/bugraozturk93/
Mollie https://www.linkedin.com/company/mollie/
Mollie | Website https://www.mollie.com/
Apache Airflow CTL https://airflow.apache.org/
AIP-94 discussion thread (Apache mailing list) https://lists.apache.org/thread/d2o1pr78wxdp1wozq519stp0pkcv6k6c
Apache Airflow GitHub https://www.github.com/apache/airflow

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Apr 30, 2026

19m
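
Airflow CTL keeps a JWT in the OS keyring, and its CI/CD use case relies on short-lived tokens. A client holding such a token usually checks the `exp` claim before reuse and re-authenticates once the token has expired or is about to. The sketch below shows only that expiry check, using the standard library. It is not Airflow CTL's code, and the 30-second leeway is an arbitrary choice. It deliberately does not verify the signature, which remains the API server's job. From Python, the third-party `keyring` package (`set_password` / `get_password`) is one common way to reach the OS keyring.

```python
import base64
import json
import time


def _b64url_decode(segment):
    """Decode one base64url JWT segment, restoring the stripped '=' padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def token_expired(token, now=None, leeway_s=30):
    """Return True if the JWT should be treated as expired.

    Only reads the payload's `exp` claim; the signature is NOT verified.
    A token without `exp` is treated as expired (hypothetical, conservative policy).
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    claims = json.loads(_b64url_decode(parts[1]))
    exp = claims.get("exp")
    if exp is None:
        return True
    now = time.time() if now is None else now
    # Refresh slightly early so a token doesn't expire mid-request.
    return now >= exp - leeway_s
```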

95

Introducing Airflow’s Common AI Provider with Pavan Kumar Gopidesu and Kaxil Naik

In this episode, we explore the newly released Apache Airflow common AI provider: what problem it solves, how it was built and what's coming next.

Kaxil Naik, Senior Director of Engineering at Astronomer and Apache Airflow PMC member, and Pavan Kumar Gopidesu, Lead Data Engineer at Experian and Apache Airflow PMC member, join us to walk through the provider's first release and the technical decisions behind it.

Key Takeaways:
00:00 Introduction.
04:05 The common AI provider was born from a real production problem.
07:10 Airflow already had the primitives needed for durable agent execution, making it the natural foundation for AI orchestration.
09:15 The LLM schema compare operator uses Apache DataFusion to fetch source schemas.
11:07 Apache DataFusion was chosen for its speed.
13:09 Hook tool sets expose Airflow's provider hooks to agents with an allowed methods list that blocks destructive operations.
15:20 Passing durable=True to an LLM operator caches tool calls and LLM outputs mid-task.
18:13 The provider offers three abstraction levels.
21:20 The provider currently requires Airflow 3; the team is open to adding Airflow 2.11 support if demand is high enough.
24:10 MCP server configs can be stored as Airflow connections.

Resources Mentioned:
Kaxil Naik https://www.linkedin.com/in/kaxil/
Pavan Kumar Gopidesu https://www.linkedin.com/in/pavan-kumar-gopidesu/
Astronomer | LinkedIn https://www.linkedin.com/company/astronomer/
Astronomer | Website https://www.astronomer.io
Experian https://www.linkedin.com/company/experian/
Apache Airflow https://www.linkedin.com/company/apache-airflow
Apache Airflow common AI provider docs https://airflow.apache.org/docs/apache-airflow-providers-common-ai/stable/commits.html
Apache DataFusion https://datafusion.apache.org/
Pydantic AI https://pydantic.dev/docs/ai/overview/
Apache Airflow Slack provider docs https://airflow.apache.org/docs/apache-airflow-providers-slack/stable/index.html
Introducing the Common AI Provider: LLM and AI Agent Support for Apache Airflow https://airflow.apache.org/blog/common-ai-provider/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#Automation #Airflow #MachineLearning

Apr 23, 2026

28m
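
The common AI provider's hook tool sets expose provider hooks to agents through an allowed methods list, which blocks destructive operations. The idea can be sketched in plain Python as a wrapper that only dispatches allow-listed method names. This is a hypothetical illustration, not the provider's API. `FakeDbHook` is a stand-in whose method names mirror common database hook methods.

```python
class AllowlistedTools:
    """Expose only named methods of a hook-like object as agent tools (hypothetical sketch)."""

    def __init__(self, hook, allowed_methods):
        # Fail fast if the allow-list names something the hook cannot do.
        missing = [m for m in allowed_methods if not callable(getattr(hook, m, None))]
        if missing:
            raise ValueError(f"hook has no callable(s): {missing}")
        self._hook = hook
        self._allowed = frozenset(allowed_methods)

    def list_tools(self):
        """Names an agent is allowed to call."""
        return sorted(self._allowed)

    def call(self, method, *args, **kwargs):
        """Dispatch a tool call, refusing anything outside the allow-list."""
        if method not in self._allowed:
            raise PermissionError(f"{method!r} is not an allowed tool")
        return getattr(self._hook, method)(*args, **kwargs)


class FakeDbHook:
    """Stand-in for a database hook (hypothetical)."""

    def get_records(self, sql):
        return [("ok",)] if sql.lstrip().upper().startswith("SELECT") else []

    def run(self, sql):
        raise RuntimeError("would execute arbitrary SQL")
```

An agent given `AllowlistedTools(FakeDbHook(), ["get_records"])` can read, but any attempt to call `run` is rejected before it reaches the hook.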

94

Building AI Debugging Agents Into Airflow DAGs at Jeppesen ForeFlight with Samantha Blaney Cuevas

Aviation data pipelines run on strict 28-day publication cycles, and the margin for error is zero. In this episode, we're joined by Samantha Blaney Cuevas, Software Engineer at Jeppesen ForeFlight, to explore how her team orchestrates a complex, time-sensitive data pipeline with Airflow and where AI is starting to fit into that picture.

Key Takeaways:
00:00 Introduction.
04:05 Airflow orchestrates almost all business logic and data transformations across the cycle, with custom timetables built to track busy and slow periods programmatically.
06:10 Cycle-aware sensing tasks handle irregular source deliveries, including duplicates and early or late arrivals, without disrupting the pipeline.
08:07 The two main AI use cases are pipeline debugging and cycle awareness, both designed to reduce the manual overhead of monitoring a complex DAG dependency graph.
09:03 The Data Port agent is a two-task DAG that routes Slack pipeline alerts to either a predefined command list or an AI token, depending on whether the fix is already known.
13:10 AI is still in development at Jeppesen ForeFlight; the team is focused on token efficiency and scoping how much autonomy to give agents across different environments.
15:04 Airflow setup and MCP configuration were straightforward; the harder design work was deciding which environments agents could access across QA, staging and production.
17:06 Airflow's skills repo and agent tooling are helping onboard new developers and extend pipeline awareness to analysts who work alongside engineers on the cycle.
19:10 Samantha would like to see single-task retries with different parameters in Airflow: resetting one task without clearing the full pipeline run.
21:05 A future AI use case under consideration is live DAG editing and re-upload within Airflow to make one-off fixes without halting pipeline progress.

Resources Mentioned:
Samantha Blaney Cuevas https://www.linkedin.com/in/samantha-blaney/
Jeppesen ForeFlight | LinkedIn https://www.linkedin.com/company/jeppesen-foreflight/
Jeppesen ForeFlight | Website http://www.foreflight.com
Astronomer Airflow Skills Repo http://www.github.com/astronomer/airflow-llm-providers-demo
Apache Airflow https://airflow.apache.org/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow

Apr 16, 2026

22m
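
Jeppesen ForeFlight's pipeline follows a 28-day publication cycle, which the team tracks with custom timetables. The date arithmetic such a timetable needs can be sketched as below. The anchor date is hypothetical. A real Airflow timetable would wrap this logic in a `Timetable` subclass that implements `next_dagrun_info` and `infer_manual_data_interval`.

```python
from datetime import date, timedelta

CYCLE = timedelta(days=28)
# Hypothetical reference date on which a publication cycle is known to start.
ANCHOR = date(2026, 1, 1)


def cycle_bounds(day, anchor=ANCHOR):
    """Return (start, end_exclusive, index) of the 28-day cycle containing `day`."""
    # Floor division keeps the maths right for days before the anchor too.
    index = (day - anchor).days // CYCLE.days
    start = anchor + index * CYCLE
    return start, start + CYCLE, index


def days_until_next_cycle(day, anchor=ANCHOR):
    """Days from `day` until the next cycle starts (28 on a cycle's first day)."""
    _start, end, _index = cycle_bounds(day, anchor)
    return (end - day).days
```

Knowing where a date falls within its cycle is what lets tasks behave differently in busy and slow periods, as described at 04:05.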

93

Introducing Airflow 3.2

We introduce Airflow 3.2 and its updates for teams that build and operate data pipelines.

Astronomer’s Head of Customer Education, Marc Lamberti, and Senior Manager of Developer Relations, Kenten Danas, break down what’s new, from asset partitioning to Async Python tasks and DAG versioning. They explore how these updates improve scheduling, performance and observability in production workflows.

Key Takeaways:
00:00 Introduction.
02:10 Airflow 3 architecture separates workers from the metadata database.
03:05 Plugin versioning and UI-based backfills simplify operations.
06:20 Asset partitioning enables granular, partition-level scheduling.
07:15 Triggering DAGs on partitions instead of full datasets.
11:05 Deferrable operators reduce worker slot usage.
12:00 Async operators reduce database pressure and overhead.
14:10 Async improves throughput, not single task speed.
22:20 Inlets and outlets improve asset lineage visibility.
23:00 DAG version markers show changes directly in the UI.

Resources Mentioned:
Marc Lamberti https://www.linkedin.com/in/marclamberti/
Apache Airflow https://airflow.apache.org/
Astronomer | LinkedIn https://www.linkedin.com/company/astronomer/
Astronomer | Website https://www.astronomer.io/
3.2 Webinar https://www.astronomer.io/events/webinars/introducing-airflow-3-2-video
Asset Partitioning Guide https://www.astronomer.io/docs/learn/airflow-partitioned-runs
Asynchronous Processes Guide https://www.astronomer.io/docs/learn/deferrable-operators
Release Notes https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html#airflow-3-2-0-2026-04-07
Provider Registry https://airflow.apache.org/registry/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Apr 9, 2026

26m
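
The 14:10 point is that async improves throughput, not the speed of a single task. Plain `asyncio` can show this. In the sketch below, each simulated I/O wait (a hypothetical stand-in for an API call) takes the same time whether the waits run sequentially or concurrently. Only the total wall-clock time shrinks when they overlap.

```python
import asyncio
import time


async def fake_io_call(delay_s):
    """Stand-in for a network or API wait (hypothetical workload)."""
    await asyncio.sleep(delay_s)
    return delay_s


async def run_sequential(n, delay_s):
    # Each wait starts only after the previous one finishes.
    return [await fake_io_call(delay_s) for _ in range(n)]


async def run_concurrent(n, delay_s):
    # All waits are in flight at once on a single event loop.
    return await asyncio.gather(*(fake_io_call(delay_s) for _ in range(n)))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

With 10 calls of 0.05 s each, the sequential version takes about 0.5 s and the concurrent one about 0.05 s. Each individual call still takes 0.05 s either way.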

92

Reflections on a Decade of Data Engineering at Seattle Data Guy

Lessons from the past decade of data engineering reveal how much the ecosystem has changed and what has stayed surprisingly consistent.

In this episode, Benjamin Rogojan, Owner and Data Consultant at Seattle Data Guy, joins us to reflect on how the data engineering landscape has evolved alongside Apache Airflow. We explore when Airflow makes sense as an orchestrator, why batch processing is still dominant and how AI is reshaping the workflows and responsibilities of modern data engineers.

Key Takeaways:
00:00 Introduction.
03:00 Airflow becomes valuable when workflows involve many pipelines, teams and dependencies.
05:00 Data engineers are still focused on making data accessible and aligning work with business needs.
05:30 Batch pipelines remain the most common approach even as real-time use cases grow.
07:45 Many “real-time” requests are actually event-driven batch workflows.
09:00 Airflow replaced many custom-built pipeline systems with built-in dependency management.
11:00 Modern orchestration tools often build on Airflow concepts or differentiate from them.
14:00 AI can assist with writing SQL and pipelines but still requires experienced engineers.
15:30 Organizations are collecting increasingly granular data, creating more engineering demand.
19:00 The data stack has shifted rapidly from Hadoop-era systems to modern cloud platforms.

Resources Mentioned:
Benjamin Rogojan https://www.linkedin.com/in/benjaminrogojan/
Seattle Data Guy https://www.linkedin.com/company/seattle-data-guy/
Apache Airflow https://airflow.apache.org
Airflow Summit / Airflow Conference https://airflowsummit.org
Snowflake https://www.snowflake.com
HubSpot Data Sharing / APIs https://developers.hubspot.com
MLflow https://mlflow.org

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow

Apr 3, 2026

26m
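
Built-in dependency management (09:00) is a core part of what Airflow replaced in custom pipeline systems. Given which jobs depend on which, the system must work out what can run now and what must wait. The standard library's `graphlib` can sketch that resolution. The pipeline names are hypothetical, and Airflow's scheduler does far more than this (retries, trigger rules, concurrency limits).

```python
from graphlib import TopologicalSorter

# Hypothetical pipelines mapped to the pipelines they depend on.
PIPELINES = {
    "load_hubspot": set(),
    "load_orders": set(),
    "stg_customers": {"load_hubspot"},
    "stg_orders": {"load_orders"},
    "mart_revenue": {"stg_customers", "stg_orders"},
}


def run_in_waves(graph):
    """Return batches of pipelines; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # a real runner would mark done only after success
    return waves
```

For the sample graph, `run_in_waves(PIPELINES)` returns three waves: the two load jobs, then the two staging jobs, then `mart_revenue`.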

91

Managing Data Quality and Governance With Airflow at Credit Karma with Ashir Alam

Data quality is not optional when you manage credit data at scale.

In this episode, Ashir Alam, Senior Data Engineer at Credit Karma, joins us to share how his team acts as the gatekeeper for credit data ingestion, how they standardize data quality with Airflow and DAG Factory and how they scale safely across thousands of DAGs. We explore how governance, PII protection and orchestration come together inside a modern data platform.

Key Takeaways:
00:00 Introduction.
01:00 Overview of Credit Karma’s products and financial data ecosystem.
02:00 The team acts as gatekeepers for ingesting data from TransUnion and Equifax.
03:00 Why PII handling and controlled downstream access led to adopting Airflow.
04:00 BigQuery as the warehouse and Airflow as the primary orchestrator.
05:00 Why data quality and governance are critical in financial systems.
07:00 Why Airflow was selected: ease of use and unified ETL plus data quality.
09:00 Introduction to DAG Factory and YAML-based DAG generation.
10:00 GitHub executor creates PR-driven DAG workflows with CI checks.
12:00 BigQuery operators, structured checks and custom Slack and PagerDuty alerts.
13:00 Failed checks stop ETL pipelines and trigger notifications.
17:00 Scaling DAG Factory across thousands of DAGs and runtime vs compile-time concerns.
19:00 Future improvements: better defaults, retries and GenAI workflows in Airflow.

Resources Mentioned:
Ashir Alam https://www.linkedin.com/in/ashir-alam/
Credit Karma https://www.linkedin.com/company/intuit-credit-karma/
Apache Airflow https://airflow.apache.org/
DAG Factory https://github.com/astronomer/dag-factory
BigQuery (Google Cloud) https://cloud.google.com/bigquery
GitHub https://github.com/
Slack https://slack.com/
PagerDuty https://www.pagerduty.com/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow

Mar 26, 2026

22m
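
At Credit Karma, failed checks stop ETL pipelines and trigger notifications. Here is a minimal sketch of that gate: run named checks, send one alert listing the failures, then raise. In an Airflow task, the raised exception fails the task. Downstream ETL tasks with the default `all_success` trigger rule then do not run. The check names and the `notify` callable are hypothetical stand-ins for BigQuery checks and Slack or PagerDuty alerts.

```python
class DataQualityError(Exception):
    """Raised when one or more data quality checks fail."""


def run_checks(rows, checks, notify):
    """Run every check, notify once on failures, and stop the pipeline if any failed.

    `checks` maps a check name to a predicate over all rows; `notify` is any
    callable that accepts a message string (hypothetical alert hook).
    """
    failed = [name for name, predicate in checks.items() if not predicate(rows)]
    if failed:
        notify(f"data quality checks failed: {', '.join(failed)}")
        raise DataQualityError(failed)
    return True


# Hypothetical checks over rows represented as dicts.
CHECKS = {
    "non_empty": lambda rows: len(rows) > 0,
    "no_null_ids": lambda rows: all(r.get("id") is not None for r in rows),
    "ids_unique": lambda rows: len({r.get("id") for r in rows}) == len(rows),
}
```

Running every check before raising means one alert reports all failures at once, rather than one alert per rerun.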

90

Open Source Airflow Contributions and Performance Improvements at G-Research with Christos Bisias

Modern Airflow isn’t just orchestration. It's a contribution.

In this episode, we explore how open source investment drives real performance gains and deeper observability.

We’re joined by Christos Bisias, Open Source Software Engineer, Apache Airflow at G-Research, to discuss how his team uses Airflow for large-scale data transformations, contributes upstream and improves scheduler throughput and OpenTelemetry support. From trace-level observability to CI-enforced metrics governance and a major scheduler optimization, this conversation spans strategy, engineering and community impact.

Key Takeaways:
00:00 Introduction.
01:20 How G-Research applies machine learning and big data to predict financial market movements.
02:15 Contributing to open source is a business decision.
03:10 Maintaining a fork is costly.
04:30 OpenTelemetry collects metrics, logs and traces to provide deep system visibility.
06:10 Custom spans help identify bottlenecks inside tasks and enable performance optimization.
08:05 OpenTelemetry integration works properly in Airflow 3.0 and above.
10:00 A YAML-based metrics registry with CI enforcement ensures consistency between docs and exported metrics.
12:10 Scheduler throughput improved significantly by applying concurrency limits earlier in the database query.
15:20 Future Task SDK changes may enable language-agnostic DAG authoring beyond Python.

Resources Mentioned:
Christos Bisias https://www.linkedin.com/in/xbis/
G-Research https://www.linkedin.com/company/g-research/
Apache Airflow https://airflow.apache.org/
OpenTelemetry https://opentelemetry.io/
Prometheus https://prometheus.io/
Grafana https://grafana.com/
Jaeger https://www.jaegertracing.io/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow

Mar 19, 2026

17m

89

Automating Threat Intelligence Using Airflow with Karan Alang

In this episode, Karan Alang, Principal Software Engineer at Versa Networks, joins the conversation to discuss how Airflow can be used to automate threat intelligence in modern cybersecurity environments. He explains the growing scale of cloud computing, the profitability of hacking and the shortage of SOC analysts. Karan also outlines a novel architecture that combines Airflow, XDR, graph databases and LLMs to orchestrate automated threat detection and response.
Key Takeaways:
00:00 Introduction.
05:00 Organizations face massive log volumes and a shortage of SOC analysts.
07:00 The solution integrates Airflow, XDR, Neo4j graph databases and LLMs into one architecture.
08:00 MITRE ATT&CK provides a global framework for mapping tactics and techniques.
11:00 Airflow acts as the orchestration backbone for ingestion, graph transformation and LLM workflows.
13:00 Graph databases provide a full relationship view of attackers’ systems and entities.
14:00 LLMs automate mapping activity to MITRE ATT&CK and assign explainable risk scores.
17:00 Traditional signature-based detection allows lateral movement and exfiltration before teams can react.
18:00 End-to-end automation is essential to mitigating modern cybersecurity threats.
20:00 Future opportunities include deeper LLM integration as first-class citizens within Airflow.
Resources Mentioned:
Karan Alang: https://www.linkedin.com/in/karan-alang-4173437
Versa Networks | LinkedIn: https://www.linkedin.com/company/versa-networks
Versa Networks | Website: https://versa-networks.com
Google Cloud Composer (Managed Airflow on GCP): https://cloud.google.com/composer
Microsoft Defender XDR: https://www.microsoft.com/es-es/security/business/siem-and-xdr/microsoft-defender-xdr
Neo4j (Graph Database): https://neo4j.com
MITRE ATT&CK Framework: https://attack.mitre.org
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Mar 12, 2026

22m

88

Using Plugins To Customize Airflow at Ponder Labs with Egor Tarasenko

In this episode, we explore how teams scale Apache Airflow in complex environments and what it takes to make orchestration work across many stakeholders. We look at real-world challenges around visibility, ownership and predictability as data platforms grow.
Egor Tarasenko, Data and AI Engineer at Ponder Labs, joins us to share how Ponder Labs customizes Airflow for education organizations using plugins, event-driven architectures and AI-powered tooling. He explains how his team supports large charter school networks and why structure, consistency and extensibility become critical at scale.
Key Takeaways:
00:00 Introduction.
01:21 Ponder Labs helps education organizations bring data from many systems together so it becomes useful for teachers, school leaders and administrators.
03:10 Airflow serves as the backbone for orchestrating ingestion, transformation and reverse ETL across client data platforms.
05:43 Everything is triggered from Airflow to maintain dependency, visibility and a single operational picture.
09:05 Managing hundreds of DAGs requires a focus on structure, visibility and consistency across teams.
09:51 Treating DAGs like APIs helps teams scale without needing deep knowledge of upstream logic.
12:00 Custom plugins like schedule insights help predict DAG run times across layered dependencies.
15:00 AI-powered Airflow chat enables non-technical stakeholders to understand DAG ownership, dependencies and cluster activity.
22:06 Migrating plugins to Airflow 3 improves developer experience through cleaner APIs and faster extensibility.
Resources Mentioned:
Egor Tarasenko: https://www.linkedin.com/in/egorseno/
Apache Airflow: https://airflow.apache.org
dbt: https://www.getdbt.com
Astronomer Astro Platform: https://www.astronomer.io
Egor Tarasenko on Substack: https://egortarasenko.substack.com
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Mar 5, 2026

27m

87

Scaling Airflow at Wix for Analytics and AI with Ethan Shalev

Modern data orchestration at scale demands reliability, speed and thoughtful adoption of new tooling. As organizations grow, keeping pipelines efficient while supporting more teams becomes a critical challenge.
In this episode, we’re joined by Ethan Shalev, Data Engineer at Wix, to discuss how Wix operates Airflow at massive scale, migrates to Airflow 3 and uses AI to accelerate development.
Key Takeaways:
00:00 Introduction.
02:13 Wix structures data engineering across multiple product-focused organizations.
03:40 Migrating nearly 8,000 DAGs to Airflow 3 requires careful planning.
04:31 Migration creates an opportunity to remove long-standing legacy Airflow code.
05:32 Internal playbooks and Cursor rules standardize and speed up DAG migrations.
07:39 Airflow 3 introduces backfills, DAG versioning and asset-aware scheduling.
09:16 Deferrable operators reduce scheduler congestion in large Airflow environments.
12:54 AI-generated code still requires review and strong testing practices.
14:52 Moving to managed Airflow reduces operational burden on internal platform teams.
15:57 Improving multi-tenancy and UI personalization remains a key Airflow need.
Resources Mentioned:
Ethan Shalev: https://www.linkedin.com/in/eshalev/
Wix | LinkedIn: https://www.linkedin.com/company/wix-com/
Wix | Website: https://www.wix.com/
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
Trino: https://trino.io/
Apache Iceberg: https://iceberg.apache.org/
Cursor: https://cursor.sh/
Airflow Summit: https://airflowsummit.org/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Feb 26, 2026

18m

86

Using Airflow To Orchestrate Billions of Events at Addi with Carlos Daniel Puerto Niño

Strong data orchestration is as much about culture and visibility as it is about technology. As data platforms scale, teams need systems that reduce cognitive load while increasing reliability and observability.
In this episode, Carlos Daniel Puerto Niño, Senior Analytics Engineer and Data Analyst at Addi, joins us to share how Addi uses Airflow to support batch orchestration, manage organizational complexity and improve monitoring across its data platform.
Key Takeaways:
00:00 Introduction.
01:25 Changes in company strategy increase data platform complexity over time.
04:00 Centralized data teams help manage organizational and technical change.
06:08 Scalable architectures support growing data volumes and use cases.
09:10 Adopting orchestration tools introduces operational and maintenance challenges.
14:43 Abstraction layers lower technical barriers for onboarding new team members.
15:36 Modularity and visibility improve the reliability of data pipelines.
18:14 Integrated monitoring supports faster incident response and resolution.
22:19 Limited access to orchestration metadata constrains proactive analysis.
Resources Mentioned:
Carlos Daniel Puerto Niño: https://www.linkedin.com/in/carlospuertoni%C3%B1o/
Addi | LinkedIn: https://www.linkedin.com/company/addicol/
Addi | Website: https://www.addi.com
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
Databricks: https://www.databricks.com/
dbt: https://www.getdbt.com/
Grafana: https://grafana.com/
Slack: https://slack.com/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Feb 19, 2026

24m

85

Building Event-Driven Data Pipelines With Airflow 3 at Astrafy with Andrea Bombino

Real-time data expectations are reshaping how modern data teams think about orchestration and dependencies. As event-driven architectures become more common, teams need to rethink how pipelines react to data changes rather than to schedules.
In this episode, Andrea Bombino, Co-Founder and Head of Analytics Engineering at Astrafy, joins us to discuss how event-driven scheduling in Airflow is evolving and how Astrafy applies it to deliver faster, more responsive data pipelines.
Key Takeaways:
00:00 Introduction.
02:02 Astrafy’s role in guiding clients across the modern data stack.
03:15 Strong DAG dependencies create challenges for time-based scheduling.
04:48 Event-driven pipelines respond to increasing real-time data demands.
05:30 Airflow 3 introduces native support for event-driven orchestration.
06:27 Sensor-based workflows reveal scalability and efficiency limitations.
11:32 Event-driven assets improve efficiency and pipeline elegance.
14:45 Governance and cross-instance coordination emerge as ongoing challenges.
Resources Mentioned:
Andrea Bombino: https://www.linkedin.com/in/andrea-bombino/
Astrafy | LinkedIn: https://www.linkedin.com/company/astrafy/
Astrafy | Website: https://www.astrafy.io
Apache Airflow: https://airflow.apache.org/
Google Cloud: https://cloud.google.com/
Google Pub/Sub: https://cloud.google.com/pubsub
Google BigQuery: https://cloud.google.com/bigquery
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Feb 12, 2026

18m

84

Uphold’s Approach to Orchestrating Modern Data Workflows with Jaime Oliveira

A strong data-driven mindset underpins how fintech teams scale analytics, infrastructure and decision-making across the business.
In this episode, Jaime Oliveira, Lead Data Engineer at Uphold, joins us to discuss how Uphold structures its data organization and orchestration strategy. Jaime shares how the team uses Airflow and dbt to support analytics, reporting and data activation while evolving its approach as the stack grows.
Key Takeaways:
00:00 Introduction.
01:23 A data-driven mindset supports product development and business decisions.
02:55 Diverse ingestion pipelines enable scalable analytics.
04:18 A single orchestration platform simplifies analytics workflows.
05:17 Early experience with orchestration tools shapes engineering practices.
08:16 Analytics orchestration works best when aligned with transformation workflows.
09:25 Infrastructure choices involve tradeoffs in testing, visibility and overhead.
16:39 More collaborative workflow tools could improve accessibility and autonomy.
Resources Mentioned:
Jaime Oliveira: https://www.linkedin.com/in/jaime-oliveira-b075855a/
Uphold | LinkedIn: https://www.linkedin.com/company/upholdinc/
Uphold | Website: https://uphold.com
Apache Airflow: https://airflow.apache.org
dbt: https://www.getdbt.com
Snowflake: https://www.snowflake.com
Kubernetes: https://kubernetes.io
Astronomer Cosmos: https://astronomer.github.io/astronomer-cosmos
Cosmos e-book: https://www.astronomer.io/ebooks/orchestrating-dbt-with-airflow-using-cosmos/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Feb 5, 2026

18m

83

Modern Airflow Best Practices for Scalable Data Pipelines with Bhavani Ravi

Building reliable data pipelines at scale requires more than writing code. It depends on thoughtful design, infrastructure trade-offs and an understanding of how orchestration platforms evolve over time.
In this episode, we examine Airflow best practices shaped by real-world implementation. Bhavani Ravi, Independent Software Consultant and Apache Airflow Champion, shares lessons on pipeline design, architectural decisions and the evolution of the Airflow ecosystem in modern data environments.
Key Takeaways:
00:00 Introduction.
01:30 Independent consulting supports effective Airflow adoption.
02:38 Early challenges shaped modern Airflow practices.
03:21 Airflow setup has become significantly simpler.
04:30 New features expanded workflow capabilities.
06:03 Frequent releases support long-term sustainability.
07:34 Community and providers strengthen the ecosystem.
10:03 Pipeline design should come before coding.
10:55 Decoupling logic requires careful trade-offs.
13:30 Plugins extend Airflow into new use cases.
Resources Mentioned:
Bhavani Ravi: https://www.linkedin.com/in/bhavanicodes/
Apache Airflow: https://airflow.apache.org/
Kubernetes: https://kubernetes.io/
Microsoft Fabric: https://learn.microsoft.com/en-us/fabric/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Jan 29, 2026

17m

82

Inside Conviva’s Decision To Power Its Data Platform With Airflow with Han Zhang

Conviva operates at a massive scale, delivering outcome-based intelligence for digital businesses through real-time and batch data processing. As new use cases emerged, the team needed a way to extend a streaming-first architecture without rebuilding core systems.
In this episode, Han Zhang joins us to explain how Conviva uses Apache Airflow as the orchestration backbone for its batch workloads, how the control plane is designed and what trade-offs shaped their platform decisions.
Key Takeaways:
00:00 Introduction.
01:17 Large-scale data platforms require low-latency processing capabilities.
02:08 Batch workloads can complement streaming pipelines for additional use cases.
03:45 An orchestration framework can act as the core coordination layer.
06:12 Batch processing enables workloads that streaming alone cannot support.
08:50 Ecosystem maturity and observability are key orchestration considerations.
10:15 Built-in run history and logs make failures easier to diagnose.
14:20 Platform users can monitor workflows without managing orchestration logic.
17:08 Identity, secrets and scheduling present ongoing optimization challenges.
19:59 Configuration history and change visibility improve operational reliability.
Resources Mentioned:
Han Zhang: https://www.linkedin.com/in/zhanghan177
Conviva | Website: http://www.conviva.com
Apache Airflow: https://airflow.apache.org/
Celery: https://docs.celeryq.dev/
Temporal: https://temporal.io/
Kubernetes: https://kubernetes.io/
LDAP: https://ldap.com/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Jan 22, 2026

21m

81

Why Airflow Became the Scheduling Backbone at Condé Nast Technology Lab with Arun Karthik

Data platforms are moving from batch-first pipelines to near real-time systems where orchestration, observability, scalability and governance all have to work together.
In this episode, Arun Karthik, Director, Data Solutions Engineering at Condé Nast Technology Lab, joins us to share how data engineering evolves from relational databases and ETL into distributed processing, modern orchestration with Apache Airflow and managed Airflow with Astronomer.
Key Takeaways:
00:00 Introduction.
02:13 Early data systems rely heavily on relational databases and batch-oriented processing models.
07:01 Scheduling requirements evolve beyond fixed time windows as dependencies increase.
10:14 Ease of use and developer experience influence adoption of orchestration frameworks.
13:22 Operating open source orchestration tools requires ongoing engineering effort.
14:45 Managed services help teams reduce infrastructure and maintenance responsibilities.
17:27 Observability improves confidence in pipeline execution and system health.
19:12 Governance considerations grow in importance as data platforms mature.
20:46 Building data systems requires balancing speed, reliability and long-term sustainability.
Resources Mentioned:
Arun Karthik: https://www.linkedin.com/in/earunkarthik/
Condé Nast Technology Lab | LinkedIn: https://www.linkedin.com/company/conde-nast-technology-lab/
Condé Nast Technology Lab | Website: https://www.condenast.com/
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
Apache Spark: https://spark.apache.org/
Apache Hadoop: https://hadoop.apache.org/
Jenkins: https://www.jenkins.io/
dbt Labs: https://www.getdbt.com/product/what-is-dbt
Amazon Web Services: https://aws.amazon.com/free/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Jan 15, 2026

24m

80

The Role of Airflow in Building Smarter ML Pipelines at Vivian Health with Max Calehuff

The integration of data orchestration and machine learning is critical to operational efficiency in healthcare tech. Vivian Health leverages Airflow to power both its ETL pipelines and ML workflows while maintaining strict compliance standards.
Max Calehuff, Lead Data Engineer at Vivian Health, joins us to discuss how his team uses Airflow for MLOps, regulatory compliance and large-scale data orchestration. He also shares insights into upgrading to Airflow 3 and the importance of balancing flexibility with security in a healthcare environment.
Key Takeaways:
00:00 Introduction.
04:21 The role of Airflow in managing ETL pipelines and ML retraining.
06:23 Using AWS SageMaker for ML training and deployment.
07:47 Why Airflow’s versatility makes it ideal for MLOps.
10:50 The importance of documentation and best practices for engineering teams.
13:44 Automating anonymization of user data for compliance.
15:30 The benefits of remote execution in Airflow 3 for regulated industries.
18:16 Quality-of-life improvements and desired features in future Airflow versions.
Resources Mentioned:
Max Calehuff: https://www.linkedin.com/in/maxwell-calehuff/
Vivian Health | LinkedIn: https://www.linkedin.com/company/vivianhealth/
Vivian Health | Website: https://www.vivian.com
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
AWS SageMaker: https://aws.amazon.com/sagemaker/
dbt Labs: https://www.getdbt.com/
Cosmos: https://github.com/astronomer/astronomer-cosmos
Split: https://www.split.io/
Snowflake: https://www.snowflake.com/en/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Dec 11, 2025

19m

79

Scaling Airflow to 11,000 DAGs Across Three Regions at Intercom with András Gombosi and Paul Vickers

The evolution of Intercom’s data infrastructure reveals how a well-built orchestration system can scale to serve global needs. With thousands of DAGs powering analytics, AI and customer operations, the team’s approach combines technical depth with organizational insight.
In this episode, András Gombosi, Senior Engineering Manager of Data Infra and Analytics Engineering, and Paul Vickers, Principal Engineer, both at Intercom, share how they built one of the largest Airflow deployments in production and enabled self-serve data platforms across teams.
Key Takeaways:
00:00 Introduction.
04:24 Community input encourages confident adoption of a common platform.
08:50 Self-serve workflows require consistent guardrails and review.
09:25 Internal infrastructure support accelerates scalable deployments.
13:26 Batch LLM processing benefits from a configuration-driven design.
15:20 Standardized development environments enable effective AI-assisted work.
19:58 Applied AI enhances internal analysis and operational enablement.
27:27 Strong test coverage and staged upgrades protect stability.
30:36 Proactive observability and on-call ownership improve outcomes.
Resources Mentioned:
András Gombosi: https://www.linkedin.com/in/andrasgombosi/
Paul Vickers: https://www.linkedin.com/in/paul-vickers-a22b76a3/
Intercom | LinkedIn: https://www.linkedin.com/company/intercom/
Intercom | Website: https://www.intercom.com
Apache Airflow: https://airflow.apache.org/
dbt Labs: https://www.getdbt.com/
Snowflake Cortex AI: https://www.snowflake.com/en/product/features/cortex/
Datadog: https://www.datadoghq.com/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Dec 4, 2025

34m

78

How Covestro Turns Airflow Into a Simulation Toolbox with Anja Mackenzie

Building scalable, reproducible workflows for scientific computing often requires bridging the gap between research flexibility and enterprise reliability.
In this episode, Anja MacKenzie, Expert for Cheminformatics at Covestro, explains how her team uses Airflow and Kubernetes to create a shared, self-service platform for computational chemistry.
Key Takeaways:
00:00 Introduction.
06:19 Custom scripts made sharing and reuse difficult.
09:29 Workflows are manually triggered with user traceability.
10:38 Customization supports varied compute requirements.
12:48 Persistent volumes allow tasks to share large amounts of data.
14:25 Custom operators separate logic from infrastructure.
16:43 Modified triggers connect dependent workflows.
18:36 UI plugins enable file uploads and secure access.
Resources Mentioned:
Anja MacKenzie: https://www.linkedin.com/in/anja-mackenzie/
Covestro | LinkedIn: https://www.linkedin.com/company/covestro/
Covestro | Website: https://www.covestro.com
Apache Airflow: https://airflow.apache.org/
Kubernetes: https://kubernetes.io/
Airflow KubernetesPodOperator: https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
Astronomer: https://www.astronomer.io/
Airflow Academy by Marc Lamberti: https://www.udemy.com/user/lockgfg/
Airflow Documentation: https://airflow.apache.org/docs/
Airflow Plugins: https://airflow.apache.org/docs/apache-airflow/1.10.9/plugins.html
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow

Nov 20, 2025

23m

77

Building Secure Financial Data Platforms at AgileEngine with Valentyn Druzhynin

The use of Apache Airflow in financial services demands a balance between innovation and compliance. AgileEngine’s approach to orchestration shows how secure, auditable workflows can scale even within the constraints of regulatory environments.
In this episode, Valentyn Druzhynin, Senior Data Engineer at AgileEngine, discusses how his team leverages Airflow for ETF calculations, data validation and workflow reliability within tightly controlled release cycles.
Key Takeaways:
00:00 Introduction.
03:24 The orchestrator ensures secure and auditable workflows.
05:13 Validations before and after computation prevent errors.
08:24 Release freezes shape prioritization and delivery plans.
11:14 Migration plans must respect managed service constraints.
13:04 Versioning, backfills and event triggers increase reliability.
15:08 UI and integration improvements simplify operations.
18:05 New contributors should start small and seek help.
Resources Mentioned:
Valentyn Druzhynin: https://www.linkedin.com/in/valentyn-druzhynin/
AgileEngine | LinkedIn: https://www.linkedin.com/company/agileengine/
AgileEngine | Website: https://agileengine.com/
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
AWS Managed Airflow: https://aws.amazon.com/managed-workflows-for-apache-airflow/
Google Cloud Composer (Managed Airflow): https://cloud.google.com/composer
Airflow Summit: https://airflowsummit.org/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Nov 13, 2025

21m

76

How Redica Transformed Their Data With Airflow and Snowflake with Shankar Mahindar

The life sciences industry relies on data accuracy, regulatory insight and quality intelligence. Building a unified system that keeps these elements aligned is no small feat.
In this episode, we welcome Shankar Mahindar, Senior Data Engineer II at Redica Systems. We discuss how the team restructures its data platform with Airflow to strengthen governance, reduce compliance risk and improve customer experience.
Key Takeaways:
00:00 Introduction.
01:53 A focused analytics platform reduces compliance risk in life sciences.
07:31 A centralized warehouse orchestrated by Airflow strengthens governance.
09:12 Managed orchestration keeps attention on analytics and outcomes.
10:32 A modern transformation stack enables scalable modeling and operations.
11:51 Event-driven pipelines improve data freshness and responsiveness.
14:13 Asset-oriented scheduling and versioning enhance reliability and change control.
16:53 Observability and SLAs build confidence in data quality and freshness.
21:04 Priorities include partitioned assets and streamlined developer tooling.
Resources Mentioned:
Shankar Mahindar: https://www.linkedin.com/in/shankar-mahindar-83a61b137/
Redica Systems | LinkedIn: https://www.linkedin.com/company/redicasystems/
Redica Systems | Website: https://redica.com
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
Snowflake: https://www.snowflake.com/
AWS: https://aws.amazon.com/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Nov 6, 2025

23m

75

How Airflow and AI Power Investigative Journalism at the Financial Times with Zdravko Hvarlingov

The Financial Times leverages Airflow and AI to uncover powerful stories hidden within vast, unstructured data.
In this episode, Zdravko Hvarlingov, Senior Software Engineer at the Financial Times, discusses building multi-tenant Airflow systems and AI-driven pipelines that surface stories that might otherwise be missed. Zdravko walks through entity extraction and fuzzy matching, linking the UK Register of Members’ Financial Interests with Companies House, and how this work cuts weeks of manual analysis down to minutes.
Key Takeaways:
00:00 Introduction.
02:12 What computational journalism means for day-to-day newsroom work.
05:22 Why a shared orchestration platform supports consistent, scalable workflows.
08:30 Tradeoffs of one centralized platform versus many separate instances.
11:52 Using pipelines to structure messy sources for faster analysis.
14:14 Turning recurring disclosures into usable data for investigations.
16:03 Applying lightweight ML and matching to reveal entities and links.
18:46 How automation reduces manual effort and shortens time to insight.
20:41 Practical improvements that make backfilling easier and pipelines more reliable.
Resources Mentioned:
Zdravko Hvarlingov: https://www.linkedin.com/in/zdravko-hvarlingov-3aa36016b/
Financial Times | LinkedIn: https://www.linkedin.com/company/financial-times/
Financial Times | Website: https://www.ft.com/
Apache Airflow: https://airflow.apache.org/
UK Register of Members’ Financial Interests: https://www.parliament.uk/mps-lords-and-offices/standards-and-financial-interests/parliamentary-commissioner-for-standards/registers-of-interests/register-of-members-financial-interests/
UK Companies House: https://www.gov.uk/government/organisations/companies-house
Doppler: https://www.doppler.com/
Kubernetes: https://kubernetes.io/
Airflow Kubernetes Executor: https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html
GitHub: https://github.com/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Oct 30, 2025

24m

74

Inside Vinted’s Code-Generated Airflow Pipelines with Oscar Ligthart and Rodrigo Loredo

The shift from monolithic to decentralized data workflows changes how teams build, connect and scale pipelines.
In this episode, we feature Oscar Ligthart, Lead Data Engineer, and Rodrigo Loredo, Lead Analytics Engineer, both at Vinted, as we unpack their YAML-driven abstraction that generates Airflow DAGs and standardizes cross-team orchestration.
Key Takeaways:
00:00 Introduction.
05:28 Challenges of decentralization.
06:45 YAML-based generator standardizes pipelines and dependencies.
12:28 Declarative assets and sensors align cross-DAG dependencies.
17:29 Task-level callbacks enable auto-recovery and clear ownership.
21:39 Standardized building blocks simplify upgrades and maintenance.
24:52 Platform focus frees domain work.
26:49 Container-only standardization prevents sprawl.
Resources Mentioned:
Oscar Ligthart: https://www.linkedin.com/in/oscar-ligthart/
Rodrigo Loredo: https://www.linkedin.com/in/rodrigo-loredo-410a16134/
Vinted | LinkedIn: https://www.linkedin.com/company/vinted/
Vinted | Website: https://www.vinted.com/
Apache Airflow: https://airflow.apache.org/
Kubernetes: https://kubernetes.io/
dbt: https://www.getdbt.com/
Google Cloud Vertex AI: https://cloud.google.com/vertex-ai
Airflow Datasets & Assets (concepts): https://www.astronomer.io/docs/learn/airflow-datasets
Airflow Summit: https://airflowsummit.org/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Oct 23, 2025

29m

73

Transforming Data Pipelines at XENA Intelligence with Naseem Shah

The shift from simple cron jobs to orchestrated AI-powered workflows is reshaping how startups scale. For a small team, these transitions come with unique challenges and big opportunities.

In this episode, Naseem Shah, Head of Engineering at Xena Intelligence, shares how he built data pipelines from scratch, adopted Apache Airflow and transformed Amazon review analysis with LLMs.

Key Takeaways:
00:00 Introduction.
03:28 The importance of building initial products that support growth and investment.
06:16 The process of adopting new tools to improve reliability and efficiency.
09:29 Approaches to learning complex technologies through practice and fundamentals.
13:57 Trade-offs small teams face when balancing performance and costs.
18:40 Using AI-driven approaches to generate insights from large datasets.
22:38 How unstructured data can be transformed into actionable information.
25:55 Moving from manual tasks to fully automated workflows.
28:05 Orchestration as a foundation for scaling advanced use cases.

Resources Mentioned:
Naseem Shah: https://www.linkedin.com/in/naseemshah/
Xena Intelligence | LinkedIn: https://www.linkedin.com/company/xena-intelligence/
Xena Intelligence | Website: https://xenaintelligence.com/
Apache Airflow: https://airflow.apache.org/
Google Cloud Composer: https://cloud.google.com/composer
Techstars: https://www.techstars.com/
Docker: https://www.docker.com/
AWS SQS: https://aws.amazon.com/sqs/
PostgreSQL: https://www.postgresql.org/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Oct 16, 2025

28m

72

Scaling Geospatial Workflows With Airflow at Overture Maps Foundation and Wherobots with Alex Iannicelli and Daniel Smith

Using Airflow to orchestrate geospatial data pipelines unlocks powerful efficiencies for data teams. The combination of scalable processing and visual observability streamlines workflows, reduces costs and improves iteration speed.

In this episode, Alex Iannicelli, Staff Software Engineer at Overture Maps Foundation, and Daniel Smith, Senior Solutions Architect at Wherobots, join us to discuss leveraging Apache Airflow and Apache Sedona to process massive geospatial datasets, build reproducible pipelines and orchestrate complex workflows across platforms.

Key Takeaways:
00:00 Introduction.
03:22 How merging multiple data sources supports comprehensive datasets.
04:20 The value of flexible configurations for running pipelines on different platforms.
06:35 Why orchestration tools are essential for handling continuous data streams.
09:45 The importance of observability for monitoring progress and troubleshooting issues.
11:30 Strategies for processing large, complex datasets efficiently.
13:27 Expanding orchestration beyond core pipelines to automate frequent tasks.
17:02 Advantages of using open-source operators to simplify integration and deployment.
20:32 Desired improvements in orchestration tools for usability and workflow management.

Resources Mentioned:
Alex Iannicelli: https://www.linkedin.com/in/atiannicelli/
Overture Maps Foundation | LinkedIn: https://www.linkedin.com/company/overture-maps-foundation/
Overture Maps Foundation | Website: https://overturemaps.org
Daniel Smith: https://www.linkedin.com/in/daniel-smith-analyst/
Wherobots | LinkedIn: https://www.linkedin.com/company/wherobots
Wherobots | Website: https://www.wherobots.com
Apache Airflow: https://airflow.apache.org/
Apache Sedona: https://sedona.apache.org/
GitHub repo: https://github.com/wherobots/airflow-providers-wherobots

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Oct 9, 2025

24m

71

Scaling Airflow for Enterprise Data Platforms at PepsiCo with Kunal Bhattacharya

PepsiCo’s data platform drives insights across finance, marketing and data science. Delivering stability, scalability and developer delight is central to its success, and engineering leadership plays a key role in making this possible.

In this episode, Kunal Bhattacharya, Senior Manager of Data Platform Engineering at PepsiCo, shares how his team manages Airflow at scale while ensuring security, performance and cost efficiency.

Key Takeaways:
00:00 Introduction.
02:31 Enabling developer delight by extending platform capabilities.
03:56 Role of Snowflake, dbt and Airflow in PepsiCo’s data stack.
06:10 Local developer environments built using official Airflow Helm charts.
07:13 Pre-staging and PR environments as testing playgrounds.
08:08 Automating labeling and resource allocation via DAG factories.
12:16 Cost optimization through pod labeling and Datadog insights.
14:01 Isolating dbt engines to improve performance across teams.
16:12 Wishlist for Airflow 3: Improved role-based grants and database modeling.

Resources Mentioned:
Kunal Bhattacharya: https://www.linkedin.com/in/kunaljubce/
PepsiCo | LinkedIn: https://www.linkedin.com/company/pepsico/
PepsiCo | Website: https://www.pepsico.com
Apache Airflow: https://airflow.apache.org/
Snowflake: https://www.snowflake.com
dbt: https://www.getdbt.com
Kubernetes: https://kubernetes.io
Great Expectations: https://greatexpectations.io
Monte Carlo: https://www.montecarlodata.com

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Oct 2, 2025

19m

70

Building a Unified Data Platform at Pattern with William Graham

The orchestration of data workflows at scale requires both flexibility and security. At Pattern, decoupling scheduling from orchestration has reshaped how data teams manage large-scale pipelines.

In this episode, we are joined by William Graham, Senior Data Engineer at Pattern, who explains how his team leverages Apache Airflow alongside their open-source tool Heimdall to streamline scheduling, orchestration and access management.

Key Takeaways:
00:00 Introduction.
02:44 Structure of Pattern’s data teams across acquisition, engineering and platform.
04:27 How Airflow became the central scheduler for batch jobs.
08:57 Credential management challenges that led to decoupling scheduling and orchestration.
12:21 Heimdall simplifies multi-application access through a unified interface.
13:15 Standardized operators in Airflow using Heimdall integration.
17:13 Open-source contributions and early adoption of Heimdall within Pattern.
21:01 Community support for Airflow and satisfaction with scheduling flexibility.

Resources Mentioned:
William Graham: https://www.linkedin.com/in/willgraham2/
Pattern | LinkedIn: https://www.linkedin.com/company/pattern-hq/
Pattern | Website: https://pattern.com
Apache Airflow: https://airflow.apache.org
Heimdall on GitHub: https://github.com/patterninc/heimdall
Netflix Genie: https://netflix.github.io/genie/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Sep 25, 2025

24m

69

How Astronomer Turns Proactive Monitoring Into Customer Success with Collin McNulty

The evolution of Airflow continues to shape data orchestration and monitoring strategies. Leveraging it beyond traditional ETL use cases opens powerful new possibilities for proactive support and internal operations.

In this episode, we are joined by Collin McNulty, Sr. Director of Global Support at Astronomer, who shares insights from his journey into data engineering and the lessons learned from leading Astronomer’s Customer Reliability Engineering (CRE) team.

Key Takeaways:
00:00 Introduction.
03:07 Lessons learned in adapting to major platform transitions.
05:18 How proactive monitoring improves reliability and customer experience.
08:10 Using automation to enhance internal support processes.
12:09 Why keeping systems current helps avoid unnecessary issues.
15:14 Approaches that strengthen system reliability and efficiency.
18:46 Best practices for simplifying complex orchestration dependencies.
23:24 Anticipated innovations that expand orchestration capabilities.

Resources Mentioned:
Collin McNulty: https://www.linkedin.com/in/collin-mcnulty/
Astronomer | LinkedIn: https://www.linkedin.com/company/astronomer/
Astronomer | Website: https://www.astronomer.io
Apache Airflow: https://airflow.apache.org/
Prometheus: https://prometheus.io/
Splunk: https://www.splunk.com/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Sep 18, 2025

25m

68

Overcoming Data Engineering Challenges at Daiichi Sankyo Europe GmbH with Evgenii Prusov

The shift to a unified data platform is reshaping how pharmaceutical companies manage and orchestrate data. Establishing standards across regions and teams ensures scalability and efficiency in handling large-scale analytics.

In this episode, Evgenii Prusov, Senior Data Platform Engineer at Daiichi Sankyo Europe GmbH, joins us to discuss building and scaling a centralized data platform with Airflow and Astronomer.

Key Takeaways:
00:00 Introduction.
02:49 Building a centralized data platform for 15 European countries.
05:19 Adopting SaaS to manage Airflow from day one.
07:01 Leveraging Airflow for data orchestration across products.
08:16 Teaching non-Python users how to work with Airflow is challenging.
12:25 Creating a global data community across Europe, the US and Japan.
14:04 Monthly calls help share knowledge and align regional teams.
15:47 Contributing to the open-source Airflow project as a way to deepen expertise.
16:32 Desire for more guidelines, debugging tutorials and testing best practices in Airflow.

Resources Mentioned:
Evgenii Prusov: https://www.linkedin.com/in/prusov/
Daiichi Sankyo Europe GmbH | LinkedIn: https://www.linkedin.com/company/daiichi-sankyo-europe-gmbh/
Daiichi Sankyo Europe GmbH | Website: https://www.daiichi-sankyo.eu
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
Snowflake: https://www.snowflake.com/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Sep 11, 2025

19m

67

Building a Data-Driven Beauty and Wellness Marketplace at StyleSeat with Paschal Onuorah

StyleSeat is revolutionizing how beauty and wellness professionals grow their businesses through data-driven tools. From streamlining scheduling to optimizing marketing, their platform empowers professionals to focus on their craft while expanding their client base.

In this episode, Paschal Onuorah, Senior Data Engineer at StyleSeat, shares how the company leverages Airflow, dbt, and Cosmos to drive marketplace intelligence, improve client connections and deliver measurable growth for professionals.

Key Takeaways:
00:00 Introduction.
05:44 The role of the data engineering team in driving business success.
08:52 Leveraging technology for real-time business intelligence.
10:52 Data-driven strategies for improving marketing outcomes.
13:05 How adopting the right tools can increase revenue growth.
14:25 Advantages of simplifying and integrating technical workflows.
18:45 Benefits of multi-environment configurations for development and production.
20:17 Foundational skills and best practices for learning Airflow effectively.
22:33 Opportunities for deeper tool integration and improved data visualization.

Resources Mentioned:
Paschal Onuorah: https://www.linkedin.com/in/onuorah-paschal/
StyleSeat | LinkedIn: https://www.linkedin.com/company/styleseat/
StyleSeat | Website: https://www.styleseat.com
Apache Airflow: https://airflow.apache.org/
dbt: https://www.getdbt.com/
Astronomer Cosmos: https://www.astronomer.io/cosmos/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Sep 4, 2025

23m

66

Building the Future of Airflow Execution at Astronomer with Ian Buss and Piotr Chomiak

The evolution of orchestration in Airflow continues with innovations that address both scalability and security. From improving executor reliability to enabling remote execution, these advancements reshape how organizations manage data pipelines.

In this episode, we’re joined by Ian Buss, Principal Software Engineer at Astronomer, and Piotr Chomiak, Principal Product Manager at Astronomer, who share insights into the Astro Executor and remote execution.

Key Takeaways:
00:00 Introduction.
04:13 How product leadership drives scalability for enterprise needs.
08:23 Architectural changes that improve reliability and remove bottlenecks.
10:15 Metrics that enhance visibility into system performance.
12:54 The role of remote execution in addressing security requirements.
15:56 Differences between open-source solutions and managed offerings.
19:04 Broad industry adoption and applicability of remote execution.
20:39 Future advancements in language support and multi-tenancy.

Resources Mentioned:
Ian Buss: https://www.linkedin.com/in/ian-buss/
Piotr Chomiak: https://www.linkedin.com/in/piotr-chomiak-b1955624/
Astronomer | Website: https://www.astronomer.io
Apache Airflow: https://airflow.apache.org/
Airflow Slack Community: https://airflow.apache.org/community/
Beyond Analytics conference: https://astronomer.io/beyond/dataflowcast

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Aug 28, 2025

22m

65

Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille

Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable.

In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.

Key Takeaways:
00:00 Introduction.
02:13 Overview of the company’s operations and global presence.
04:00 The tech stack and structure of the data engineering team.
04:24 Running nearly 2,000 DAGs in production using Airflow.
05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.
07:05 Details on the Kubernetes-based Airflow setup using Helm charts.
09:31 Transition from GitSync to NFS for DAG syncing due to performance issues.
14:11 Making every team member Airflow-literate through local installation.
17:56 Using custom libraries and plugins to extend Airflow functionality.

Resources Mentioned:
Sébastien Crocquevieille: https://www.linkedin.com/in/scroc/
Numberly | LinkedIn: https://www.linkedin.com/company/numberly/
Numberly | Website: https://numberly.com/
Apache Airflow: https://airflow.apache.org/
Grafana: https://grafana.com/
Apache Kafka: https://kafka.apache.org/
Helm Chart for Apache Airflow: https://airflow.apache.org/docs/helm-chart/stable/index.html
Kubernetes: https://kubernetes.io/
GitLab: https://about.gitlab.com/
KubernetesPodOperator – Airflow: https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
Beyond Analytics Conference: https://astronomer.io/beyond/dataflowcast

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Aug 21, 2025

24m

64

How Moniepoint Group Uses Airflow for Exposure Monitoring with Adeolu Adegboye

Managing financial data at scale requires precise orchestration and proactive monitoring to maintain operational efficiency.

In this episode, we are joined by Adeolu Adegboye, Data Engineer at Moniepoint Group, who shares how his team uses data pipelines and workflow automation to manage high volumes of transactions, ensure timely alerts and support diverse stakeholders across the business.

Key Takeaways:
(00:00) Introduction.
(02:48) The role of data engineering in supporting all business operations.
(04:17) Leveraging workflow orchestration to manage daily processes.
(05:20) Proactively monitoring for anomalies to prevent potential issues.
(08:12) Simplifying complex insights for non-technical teams.
(13:01) Improving efficiency through dynamic and parallel workflows.
(14:19) Optimizing system performance to handle large-scale operations.
(17:19) Exploring creative and innovative uses for workflow automation.

Resources Mentioned:
Adeolu Adegboye: https://www.linkedin.com/in/adeolu-adegboye/
Moniepoint Group | LinkedIn: https://www.linkedin.com/company/moniepoint-inc/
Moniepoint Group | Website: https://www.moniepoint.com
Apache Airflow: https://airflow.apache.org/
ClickHouse: https://clickhouse.com/
Grafana: https://grafana.com/
Beyond Analytics Conference: https://astronomer.io/beyond/dataflowcast

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Aug 14, 2025

21m

63

Inside Bosch’s Airflow 3 Revolution: Remote Execution with Jens Scheffler

The evolution of Airflow has reached a milestone with the introduction of remote execution in Airflow 3, enabling flexible orchestration across distributed environments.

In this episode, Jens Scheffler, Test Execution Cluster Technical Architect at Bosch, shares insights on how his team’s need for large-scale, cross-environment testing influenced the development of the Edge Executor and shaped this major release.

Key Takeaways:
(02:39) The role of remote execution in supporting large-scale testing needs.
(04:44) How community support contributed to the Edge Executor’s development.
(08:41) Navigating network and infrastructure limitations within secure environments.
(13:25) Transitioning from database-heavy processes to an API-driven model.
(14:16) How the new task SDK in Airflow 3 improves distributed task execution.
(16:54) What is required to set up and configure the Edge Executor.
(19:36) Managing multiple queues to optimize tasks across different environments.
(23:30) Examples of extreme distance use cases for edge execution.

Resources Mentioned:
Jens Scheffler: https://www.linkedin.com/in/jens-scheffler/
Bosch | LinkedIn: https://www.linkedin.com/company/bosch/
Bosch | Website: https://www.bosch.com/
Apache Airflow: https://airflow.apache.org/
Edge Executor (Edge3 Provider Package): https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html
Astronomer’s Astro Executor: https://www.astronomer.io/docs/astro/astro-executor/
Beyond Analytics Conference: https://astronomer.io/beyond/dataflowcast

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Aug 7, 2025

28m

62

Inside Modern Data Infrastructure at Massdriver with Cory O’Daniel and Jake Ferriero

Managing modern data platforms means navigating a web of complex infrastructure, competing team needs and evolving security standards. For data teams to truly thrive, infrastructure must become both accessible and compliant without sacrificing velocity or reliability.

In this episode, we’re joined by Cory O’Daniel, CEO and Co-Founder at Massdriver, and Jacob Ferriero, Senior Software Engineer at Astronomer, to unpack what it takes to make data platform engineering scalable, sustainable and secure. They share lessons from years of experience working with DevOps, ML teams and platform engineers and discuss how Airflow fits into the orchestration layer of today’s data stacks.

Key Takeaways:
(03:27) Making infrastructure accessible without deep ops knowledge.
(07:23) Distinct personas and responsibilities across data teams.
(09:53) Infrastructure hurdles specific to ML workloads.
(11:13) Compliance and governance shaping platform design.
(13:27) Tooling mismatches between teams cause friction.
(15:13) Airflow’s orchestration role within broader system architecture.
(22:10) Creating reusable infrastructure patterns for consistency.
(24:13) Enabling secure access without slowing down development.
(26:55) Opportunities to improve Airflow with event-driven and reliability tooling.

Resources Mentioned:
Cory O’Daniel: https://www.linkedin.com/in/coryodaniel/
Massdriver | LinkedIn: https://www.linkedin.com/company/massdriver/
Massdriver | Website: https://www.massdriver.cloud/
Jacob Ferriero: https://www.linkedin.com/in/jacob-ferriero/
Astronomer: https://www.linkedin.com/company/astronomer/
Apache Airflow: https://airflow.apache.org/
Prequel: https://www.prequel.co/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Jul 31, 2025

31m

61

The Future of Airflow Telemetry with Bolke de Bruin

Telemetry has the potential to guide the future of Airflow, but only if it’s implemented transparently and with community trust.

In this episode, we’re joined by Bolke de Bruin, Director at Metyis and a long-time Airflow PMC member. Bolke discusses how telemetry has been handled in the past, why it matters now and what it will take to get it right.

Key Takeaways:
(03:20) The role of foundations in establishing credibility and sustainability.
(04:52) Why data collection is critical to open-source project direction.
(07:24) Lessons learned from previous approaches to user data collection.
(10:23) The current state of telemetry in the project.
(10:53) Community trust as a prerequisite for technical implementation.
(12:54) The importance of managing sensitive data within trusted ecosystems.
(16:37) Ethical considerations in balancing participation and access.
(18:45) Forward-looking ideas for improving workflow design and usability.

Resources Mentioned:
Bolke de Bruin: https://www.linkedin.com/in/bolke/
Metyis | LinkedIn: https://www.linkedin.com/company/metyis/
Metyis | Website: http://www.metyis.com
Apache Airflow: https://airflow.apache.org/
Airflow Summit: https://airflowsummit.org/
Airflow Dev List: https://lists.apache.org/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Jul 17, 2025

21m

60

Transforming the Airflow UI for Cloudera’s Users with Shubham Raj

Contributing to open-source projects can be daunting, but it can also unlock unexpected innovation. This episode showcases how one engineer’s journey with Apache Airflow led to impactful UI enhancements and infrastructure solutions at scale.

Shubham Raj, Software Engineer II at Cloudera, shares how his team built a drag-and-drop DAG editor for non-coders, along with contributions that helped shape the Airflow 3.0 UI and introduced features like external XCom control and bulk APIs.

Key Takeaways:
(02:30) Day-to-day responsibilities building platforms that simplify orchestration.
(05:27) Factors that make onboarding into large open-source projects accessible.
(07:35) The value of improved user interfaces for task state visibility and control.
(09:49) Enabling faster debugging by exposing internal data through APIs.
(13:00) Balancing frontend design goals with backend functionality.
(14:19) Creating workflow editors that lower the barrier to entry.
(16:54) Supporting a variety of task types within a visual DAG builder.
(19:32) Common infrastructure challenges faced by orchestration users.
(20:37) Addressing dependency management across distributed environments.

Resources Mentioned:
Shubham Raj: https://www.linkedin.com/in/shubhamrajofficial/
Cloudera | LinkedIn: https://www.linkedin.com/company/cloudera/
Cloudera | Website: https://www.cloudera.com/
Apache Airflow: https://airflow.apache.org/
2023 Airflow Summit: https://airflowsummit.org/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Jul 10, 2025

22m

59

Streamlining Thousands of Data Pipelines at Lyft with Yunhao Qing

Managing data pipelines at scale is not just a technical challenge. It is also an organizational one. At Lyft, success means empowering dozens of teams to build with autonomy while enforcing governance and best practices across thousands of workflows.

In this episode, we speak with Yunhao Qing, Software Engineer at Lyft, about building a governed data-engineering platform powered by Airflow that balances flexibility, standardization and scale.

Key Takeaways:
(03:17) Supporting internal teams with a centralized orchestration platform.
(04:54) Migrating to a managed service to reduce infrastructure overhead.
(06:04) Embedding platform-level governance into custom components.
(08:02) Consolidating and regulating the creation of custom code.
(09:48) Identifying and correcting inefficient workflow patterns.
(11:17) Replacing manual workarounds with native platform features.
(14:32) Preparing teams for major version upgrades.
(16:03) Leveraging asset-based scheduling for smarter triggers.
(18:13) Envisioning GenAI and semantic search for future productivity.

Resources Mentioned:
Yunhao Qing: https://www.linkedin.com/in/yunhao-qing
Lyft | LinkedIn: https://www.linkedin.com/company/lyft/
Lyft | Website: https://www.lyft.com/
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
Kubernetes: https://kubernetes.io/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Jul 7, 2025

19m

58

Transforming Customer Education in Data Engineering at Astronomer with Marc Lamberti

Understanding the complexities of Apache Airflow can be daunting for newcomers and seasoned data engineers alike. But with the right guidance, mastering the tool becomes an achievable milestone.

In this episode, Marc Lamberti, Head of Customer Education at Astronomer, joins us to share his journey from Udemy instructor to driving education at Astronomer, and how he's helping over 100,000 learners demystify Airflow.

Key Takeaways:
(02:36) Early exposure to Airflow while addressing inefficiencies in data workflows.
(04:10) Common barriers to implementing open source tools in enterprise settings.
(06:18) The shift from part-time teaching to a full-time focus on Airflow education.
(07:53) A modular, guided approach to structuring educational content.
(09:57) The value of highlighting underused Airflow features for broader adoption.
(12:35) Certifications as a method to assess readiness and uncover knowledge gaps.
(13:25) Coverage of essential Airflow concepts in the Fundamentals exam.
(16:07) The DAG Authoring exam’s emphasis on practical, advanced features.
(20:08) A call for more visible integration of Airflow with AI workflows.

Resources Mentioned:
Marc Lamberti: https://www.linkedin.com/in/marclamberti/
Astronomer | LinkedIn: https://www.linkedin.com/company/astronomer/
Astronomer Academy: https://academy.astronomer.io/
Airflow Fundamentals Certification: https://www.astronomer.io/certification/
DAG Authoring Certification: https://academy.astronomer.io/plan/astronomer-certification-dag-authoring-for-apache-airflow-exam
The Complete Hands-On Introduction to Airflow: https://www.udemy.com/course/the-complete-hands-on-course-to-master-apache-airflow/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Jun 26, 2025

22m

57

Embracing Data Mesh and SQL Sensors for Scalable Workflows at lastminute.com with Alberto Crespi

The flexibility of Airflow plays a pivotal role in enabling decentralized data architectures and empowering cross-functional teams.

In this episode, we speak with Alberto Crespi, Data Architect at lastminute.com, who shares how his team scales Airflow across 12 teams while supporting both vertical and horizontal structures under a data mesh approach.

Key Takeaways:
(02:17) Defining responsibilities within data architecture teams.
(04:15) Consolidating multiple orchestrators into a single solution.
(07:00) Scaling Airflow environments with shared infrastructure and DevOps practices.
(10:59) Managing dependencies and readiness using SQL sensors.
(14:23) Enhancing visibility and response through Slack-integrated monitoring.
(19:28) Extending Airflow’s flexibility to run legacy systems.
(22:28) Integrating transformation tools into orchestrated pipelines.
(25:54) Enabling non-engineers to contribute to pipeline development.
(27:33) Fostering adoption through collaboration and communication.

Resources Mentioned:
Alberto Crespi: https://www.linkedin.com/in/crespialberto/
lastminute.com | Website: https://lastminute.com
Apache Airflow: https://airflow.apache.org/
dbt Labs: https://www.getdbt.com/
Astronomer Cosmos: https://github.com/astronomer/astronomer-cosmos
GitLab: https://about.gitlab.com/
Slack: https://slack.com/
Kubernetes: https://kubernetes.io/
Confluence: https://www.atlassian.com/software/confluence
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning

Jun 20, 2025

30m

56

The AI-Ready Pipeline: Reimagining Airflow at Veyer® Logistics with Anu Pabla

Innovation in orchestration is redefining how engineers approach both traditional ETL pipelines and emerging AI workloads. Understanding how to harness Airflow’s flexibility and observability is essential for teams navigating today’s evolving data landscape. In this episode, Anu Pabla, Principal Engineer at The ODP Corporation, joins us to discuss her journey from legacy orchestration patterns to AI-native pipelines and why she sees Airflow as the future of AI workload orchestration.

Key Takeaways:
(03:43) Engaging with external technology communities fosters innovation.
(05:05) Mentoring early-career engineers builds confidence in a complex tech landscape.
(07:51) Orchestration patterns continue to evolve with modern data needs.
(08:41) Managing AI workflows requires structured and flexible orchestration.
(10:35) High-quality, meaningful data remains foundational across use cases.
(15:08) Community-driven open source tools offer lasting value.
(16:59) Self-healing systems support both legacy and AI pipelines.
(20:20) Orchestration platforms can drive future AI-native workloads.

Resources Mentioned:
Anu Pabla: https://www.linkedin.com/in/atomicap/
The ODP Corporation | LinkedIn: https://www.linkedin.com/company/the-odp-corporation/
The ODP Corporation | Website: https://www.theodpcorp.com/homepage
Apache Airflow: https://airflow.apache.org/
LlamaIndex: https://www.llamaindex.ai/

Jun 12, 2025

23m

55

Streamlining AI and ML Operations at IBM with BJ Adesoji and Ryan Yackel

The orchestration layer is foundational to building robust AI- and ML-powered data pipelines, especially in complex hybrid enterprise environments. IBM’s partnership with Astronomer reflects a strategic alignment to simplify and scale Airflow-based workflows across industries. In this episode, we’re joined by IBM’s Senior Product Manager, BJ Adesoji, and GTM PM and Growth Leader, Ryan Yackel. We discuss how IBM customers are using Airflow in production, the challenges they face at scale and what the new IBM–Astronomer collaboration unlocks.

Key Takeaways:
(03:09) The growing importance of orchestration tools in enterprise environments.
(04:48) How organizations are expanding orchestration beyond traditional use cases.
(05:24) Common patterns across industries adopting orchestration platforms.
(07:16) Why orchestration is essential for supporting business-critical workloads.
(10:00) The role of orchestration in compliance and regulatory processes.
(13:02) Challenges enterprises face when managing orchestration infrastructure.
(14:58) Opportunities to simplify and centralize orchestration at scale.
(19:11) The value of integrating orchestration with broader data toolchains.
(20:54) How AI is shaping the future of orchestrated data workflows.

Resources Mentioned:
BJ Adesoji: https://www.linkedin.com/in/bj-soji/
Ryan Yackel: https://www.linkedin.com/in/ryanyackel/
IBM | LinkedIn: https://www.linkedin.com/company/databand-ai/
IBM Databand: https://www.ibm.com/products/databand
IBM DataStage: https://www.ibm.com/products/datastage
IBM watsonx.governance: https://www.ibm.com/products/watsonx-governance
IBM Knowledge Catalog: https://www.ibm.com/products/knowledge-catalog
Apache Airflow: https://airflow.apache.org/
watsonx Orchestrate: https://www.ibm.com/products/watsonx-orchestrate
Domino: https://domino.ai/
Astronomer: https://www.astronomer.io/
Snowflake: https://www.snowflake.com/en/
dbt Labs: https://www.getdbt.com/
Amazon SageMaker: https://aws.amazon.com/sagemaker/
Cloudera: https://www.cloudera.com/
MongoDB: https://www.mongodb.com/

Jun 5, 2025

24m

54

Inside the Custom Framework for Managing Airflow Code at Wix with Gil Reich

Efficient orchestration and maintainability are crucial for data engineering at scale. Gil Reich, Data Developer for Data Science at Wix, shares how his team reduced code duplication, standardized pipelines, and improved Airflow task orchestration using a Python-based framework built within the data science team.

In this episode, Gil explains how this internal framework simplifies DAG creation, improves documentation accuracy, and enables consistent task generation for machine learning pipelines. He also shares lessons from complex DAG optimization and maintaining testable code.

Key Takeaways:
(03:23) Code duplication creates long-term problems.
(08:16) Frameworks bring order to complex pipelines.
(09:41) Shared functions cut down repetitive code.
(17:18) Auto-generated docs stay accurate by design.
(22:40) On-demand DAGs support real-time workflows.
(25:08) Task-level sensors improve run efficiency.
(27:40) Combine local runs with automated tests.
(30:09) Clean code helps teams scale faster.

Resources Mentioned:
Gil Reich: https://www.linkedin.com/in/gilreich/
Wix | LinkedIn: https://www.linkedin.com/company/wix-com/
Wix | Website: https://www.wix.com/
DS DAG Framework: https://airflowsummit.org/slides/2024/92-refactoring-dags.pdf
Apache Airflow: https://airflow.apache.org/

May 29, 2025

31m

53

Modernizing Legacy Data Systems With Airflow at Procter & Gamble with Adonis Castillo Cordero

Legacy architecture and AI workloads pose unique challenges at scale, especially in a global enterprise with complex data systems. In this episode, we explore strategies to proactively monitor and optimize pipelines while minimizing downstream failures. Adonis Castillo Cordero, Senior Automation Manager at Procter & Gamble, joins us to share actionable best practices for dependency mapping, anomaly detection and architecture simplification using Apache Airflow.

Key Takeaways:
(03:13) Integrating legacy data systems into modern architecture.
(05:51) Designing workflows for real-time data processing.
(07:57) Mapping dependencies early to avoid pipeline failures.
(09:02) Building automated monitoring into orchestration frameworks.
(12:09) Detecting anomalies to prevent performance bottlenecks.
(15:24) Monitoring data quality to catch silent failures.
(17:02) Prioritizing responses based on impact severity.
(18:55) Simplifying dashboards to highlight critical metrics.

Resources Mentioned:
Adonis Castillo Cordero: https://www.linkedin.com/in/adoniscc/
Procter & Gamble | LinkedIn: https://www.linkedin.com/company/procter-and-gamble/
Procter & Gamble | Website: http://www.pg.com
Apache Airflow: https://airflow.apache.org/
OpenLineage: https://openlineage.io/
Azure Monitor: https://azure.microsoft.com/en-us/products/monitor/
AWS Lookout for Metrics: https://aws.amazon.com/lookout-for-metrics/
Monte Carlo: https://www.montecarlodata.com/
Great Expectations: https://greatexpectations.io/

May 22, 2025

22m