EPISODE · Aug 21, 2025 · 24 MIN

Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille

from The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI · host Astronomer

Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable.

In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.

Key Takeaways:

00:00 Introduction.
02:13 Overview of the company’s operations and global presence.
04:00 The tech stack and structure of the data engineering team.
04:24 Running nearly 2,000 DAGs in production using Airflow.
05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.
07:05 Details on the Kubernetes-based Airflow setup using Helm charts.
09:31 Transition from GitSync to NFS for DAG syncing due to performance issues.
14:11 Making every team member Airflow-literate through local installation.
17:56 Using custom libraries and plugins to extend Airflow functionality.

Resources Mentioned:

Sébastien Crocquevieille: https://www.linkedin.com/in/scroc/
Numberly | LinkedIn: https://www.linkedin.com/company/numberly/
Numberly | Website: https://numberly.com/
Apache Airflow: https://airflow.apache.org/
Grafana: https://grafana.com/
Apache Kafka: https://kafka.apache.org/
Helm Chart for Apache Airflow: https://airflow.apache.org/docs/helm-chart/stable/index.html
Kubernetes: https://kubernetes.io/
GitLab: https://about.gitlab.com/
KubernetesPodOperator – Airflow: https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
Beyond Analytics Conference: https://astronomer.io/beyond/dataflowcast

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

Frequently Asked Questions

How long is this episode of The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI?

This episode is 24 minutes long.

When was this episode of The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI published?

This episode was published on August 21, 2025.

What is this episode about?

Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable. In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using...
