EPISODE · Mar 5, 2026 · 9 MIN

Garbage In, Garbage Out: Why Your Air Quality Models Are Only as Good as Your Data - OT37

from Air Quality Matters · host Simon Jones

This week, we tackle a question that goes to the heart of the performance gap in buildings: what if the problem isn't just poor construction or shoddy installation, but the data we're feeding into our models in the first place?

There's an old saying in computer science: garbage in, garbage out. If you feed a perfect model with bad assumptions, you get a perfect calculation of a fantasy. And that's exactly what's been happening in indoor air quality modeling for decades. We've been relying on scattered, outdated, inconsistent emission rate data (pulled from 1990s conference papers, paywalled journals, and PDF reports buried on the internet) and wondering why our buildings don't perform as predicted.

The paper is titled Pandora: An Open Access Database of Indoor Pollutant Emission Rates for Indoor Air Quality Modeling, published in the Journal of Building Engineering. It's the work of a huge international team, including Mark Adobati and colleagues from Annex 86, and it represents a massive effort to clean up the mess of data that indoor air quality modelers have been struggling with for years.

Key Topics Discussed:

The Data Problem: Why finding reliable emission rates for indoor pollutants has been a nightmare. The data is scattered across thousands of sources, often in the wrong units, measured under unusual conditions, and inconsistent from one source to the next.

What Pandora Is: An open access, web-based database systematically compiling nearly 10,000 specific emission rates from the scientific literature, categorizing 740 different pollution sources, from paints and carpets to cleaning products, furniture, and even human beings.

The Shocking Case Study: A simple child's bedroom modeled three different ways using data from Pandora. The total formaldehyde emission rate ranged from 342 micrograms per hour to over 6,000 micrograms per hour, a difference of roughly a factor of 20. If you designed ventilation based on the lower number, a trickle vent might be fine. Based on the higher number, you'd be installing industrial extraction.

Why the Huge Discrepancy: The database contains data going back to the 1980s, when building materials were dirty: paints full of solvents, glues full of formaldehyde. Regulations like the French VOC label and the German AGBB standard have forced manufacturers to clean up their act. If you use a statistical average of all data ever published, you're skewing your model with dirty data from 1995 and predicting a problem that might not exist anymore.

The Recommendation: Use the 25th percentile of the data for things like formaldehyde. This lower value is likely a much more accurate representation of modern, regulation-compliant materials. We might be systematically overestimating the chemical load from building materials if we rely on older datasets.

Pandora: An Open Access Database of Indoor Pollutant Emission Rates for Indoor Air Quality Modeling
https://doi.org/10.1016/j.jobe.2025.114216

Pandora Database: https://db-pandora.univ-lr.fr/

The One Take Podcast in Partnership with SafeTraces (https://www.safetraces.com/) and Inbiot (https://www.inbiot.es/?utm_campaign=simon&utm_source=airqualitymatters&utm_medium=podcast)

Do check them out in the links and on the Air Quality Matters Website (https://www.airqualitymatters.net/podcast)

Chapters

00:00:00 Introduction: The Data We Rely On
00:01:06 Garbage In, Garbage Out: The Input Data Problem
00:01:45 Introducing Pandora: A Massive Data Compilation Effort
00:02:27 The Scattered Data Nightmare: Why We Needed This
00:03:08 What's Inside: Construction Materials Dominate the Database
00:03:43 The Overlooked Sources: Cleaning Products and Human Pollution
00:04:34 The Case Study: A Child's Bedroom Reveals a Shocking Problem
00:05:41 The 20X Problem: Why Data Selection Method Matters Enormously
00:06:06 The Time Trap: Old Dirty Data Versus Modern Clean Materials
00:06:43 The Recommendation: Use the 25th Percentile for Modern Materials
00:07:03 The So What: We Might Be Solving Problems That Don't Exist Anymore
00:07:27 The New Risks: Recreational Chemicals and Activity-Based Pollution
00:08:17 The Living Project: Pandora Needs to Grow and Evolve
00:08:38 The Path Forward: From Guessing to Engineering Precision
00:08:59 Closing: Transparency and Understanding the Invisible Cloud
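
To make the case study and the 25th-percentile recommendation in the key topics above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in the paper: it assumes a well-mixed room at steady state with a formaldehyde-free supply, and it uses 100 µg/m³ (the WHO 30-minute guideline value for formaldehyde) as an illustrative design target. The function name, the 6,000 µg/h stand-in for "over 6,000", and the sample emission rates are all hypothetical. Only the 342 µg/h figure comes directly from the episode notes.

```python
import statistics


def required_airflow_m3_per_h(emission_ug_per_h, target_ug_per_m3,
                              supply_ug_per_m3=0.0):
    """Ventilation rate Q from the steady-state well-mixed balance
    C = C_supply + E / Q, rearranged to Q = E / (C - C_supply)."""
    headroom = target_ug_per_m3 - supply_ug_per_m3
    if headroom <= 0:
        raise ValueError("target must exceed the supply-air concentration")
    return emission_ug_per_h / headroom


# Case-study emission totals from the episode notes (µg/h).
LOW_EMISSION = 342.0
HIGH_EMISSION = 6000.0  # notes say "over 6,000"; 6000 is a stand-in

# Assumption: illustrative design target, not a value from the paper.
TARGET = 100.0  # µg/m³

q_low = required_airflow_m3_per_h(LOW_EMISSION, TARGET)
q_high = required_airflow_m3_per_h(HIGH_EMISSION, TARGET)
airflow_ratio = q_high / q_low  # same ratio as the two emission rates

# Hypothetical emission rates for one material, in µg/(m²·h): mostly
# recent low-emitting products plus two older high emitters.
sample_rates = [5.0, 8.0, 10.0, 12.0, 15.0, 20.0, 150.0, 400.0]

mean_rate = statistics.mean(sample_rates)
# quantiles(n=4) returns the three quartile cut points; the first is
# the 25th percentile.
p25_rate = statistics.quantiles(sample_rates, n=4)[0]

print(f"airflow for low estimate:  {q_low:.2f} m3/h")
print(f"airflow for high estimate: {q_high:.2f} m3/h")
print(f"ratio: {airflow_ratio:.1f}x")
print(f"mean rate: {mean_rate:.1f}, 25th percentile: {p25_rate:.1f}")
```

Because the required airflow scales linearly with the emission rate in this simple balance, any spread in the input data carries straight through to the ventilation design. In the invented sample, the two old high emitters pull the mean far above the 25th percentile, which is the effect the episode warns about.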



