EPISODE · Mar 26, 2026 · 22 MIN

Managing Data Quality and Governance With Airflow at Credit Karma with Ashir Alam

from The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI · host Astronomer

Data quality is not optional when you manage credit data at scale.

In this episode, Ashir Alam, Senior Data Engineer at Credit Karma, joins us to share how his team acts as the gatekeeper for credit data ingestion, how they standardize data quality with Airflow and DAG Factory and how they scale safely across thousands of DAGs. We explore how governance, PII protection and orchestration come together inside a modern data platform.

Key Takeaways:

00:00 Introduction.
01:00 Overview of Credit Karma’s products and financial data ecosystem.
02:00 The team acts as gatekeepers for ingesting data from TransUnion and Equifax.
03:00 Why PII handling and controlled downstream access led to adopting Airflow.
04:00 BigQuery as the warehouse and Airflow as the primary orchestrator.
05:00 Why data quality and governance are critical in financial systems.
07:00 Why Airflow was selected: ease of use and unified ETL plus data quality.
09:00 Introduction to DAG Factory and YAML-based DAG generation.
10:00 GitHub executor creates PR-driven DAG workflows with CI checks.
12:00 BigQuery operators, structured checks and custom Slack and PagerDuty alerts.
13:00 Failed checks stop ETL pipelines and trigger notifications.
17:00 Scaling DAG Factory across thousands of DAGs and runtime vs compile-time concerns.
19:00 Future improvements: better defaults, retries and GenAI workflows in Airflow.

Resources Mentioned:

Ashir Alam: https://www.linkedin.com/in/ashir-alam/
Credit Karma: https://www.linkedin.com/company/intuit-credit-karma/
Apache Airflow: https://airflow.apache.org/
DAG Factory: https://github.com/astronomer/dag-factory
BigQuery (Google Cloud): https://cloud.google.com/bigquery
GitHub: https://github.com/
Slack: https://slack.com/
PagerDuty: https://www.pagerduty.com/

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow
