PodParley PodParley

Ensuring Data Quality at Petabyte Scale [Glassdoor]

Episode 92 of the Snacks Weekly on Data Science podcast, hosted by Pan Wu, titled "Ensuring Data Quality at Petabyte Scale [Glassdoor]" was published on June 30, 2025 and runs 11 minutes.

June 30, 2025 ·11m · Snacks Weekly on Data Science

0:00 / 0:00

In this episode, we dive into how Glassdoor addresses the challenge of maintaining data quality at a petabyte scale. By treating data as a product, the engineering team built a centralized, scalable platform that enables proactive validation, continuous monitoring, and cross-team collaboration. From data contracts and static code analysis to LLM-based logic checks and anomaly detection, we unpack the key practices behind their approach.For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/glassdoor-engineering/data-quality-at-petabyte-scale-building-trust-in-the-data-lifecycle-7052361307a4

In this episode, we dive into how Glassdoor addresses the challenge of maintaining data quality at a petabyte scale. By treating data as a product, the engineering team built a centralized, scalable platform that enables proactive validation, continuous monitoring, and cross-team collaboration. From data contracts and static code analysis to LLM-based logic checks and anomaly detection, we unpack the key practices behind their approach.

For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/glassdoor-engineering/data-quality-at-petabyte-scale-building-trust-in-the-data-lifecycle-7052361307a4

URL copied to clipboard!