Ensuring Data Quality at Petabyte Scale [Glassdoor]
Episode 92 of the Snacks Weekly on Data Science podcast, hosted by Pan Wu and titled "Ensuring Data Quality at Petabyte Scale [Glassdoor]", was published on June 30, 2025, and runs 11 minutes.
Summary
In this episode, we dive into how Glassdoor addresses the challenge of maintaining data quality at petabyte scale. By treating data as a product, the engineering team built a centralized, scalable platform that enables proactive validation, continuous monitoring, and cross-team collaboration. From data contracts and static code analysis to LLM-based logic checks and anomaly detection, we unpack the key practices behind their approach.
For more details, see their published tech blog post: https://medium.com/glassdoor-engineering/data-quality-at-petabyte-scale-building-trust-in-the-data-lifecycle-7052361307a4