When Clean Data Is Actually Dirty
We often treat data cleaning as a neutral step. Delete missing rows. Fill gaps with the mean. Move on.

But cleaning is not neutral. It is a modeling decision.

In this episode, we unpack the statistical consequences of deletion and simple imputation, and why what looks “clean” can fundamentally alter your estimand, distort variance, and bias inference.

We walk through:
The formal role of the missingness indicator
The difference between MCAR, MAR, and MNAR
Why complete-case analysis is rarely as safe as it seems
How mean imputation collapses variance and attenuates regression slopes
When multiple imputation and inverse probability weighting are appropriate
Why sensitivity analysis becomes essential under MNAR

If you cannot defend MCAR, deletion and mean imputation are high-risk defaults.

Cleaning is not preprocessing.
Cleaning is inference.
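To make the mean-imputation point concrete, here is a small simulation sketch. The setup is our own illustration, not taken from the episode: a fully observed predictor x, an outcome y = 2x + noise, and 30% of y values missing completely at random (MCAR). The function names `ols_slope` and `simulate` are hypothetical. Under these assumptions, complete-case analysis should recover a slope near 2, while filling the missing y values with the observed mean should shrink the variance of y and pull the slope of y on x toward zero, roughly in proportion to the fraction of y values actually observed.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs: cov(x, y) / var(x)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=10_000, beta=2.0, miss_rate=0.3, seed=0):
    """Compare complete-case analysis with mean imputation under an assumed MCAR mechanism."""
    rng = random.Random(seed)
    # Fully observed predictor and an outcome with true slope `beta`.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Missingness indicator: each y is observed with the same probability,
    # independent of x and y (this is what makes the mechanism MCAR).
    observed = [rng.random() >= miss_rate for _ in range(n)]

    # Complete-case analysis: keep only the rows where y was observed.
    x_cc = [xi for xi, r in zip(x, observed) if r]
    y_cc = [yi for yi, r in zip(y, observed) if r]

    # Mean imputation: replace every missing y with the observed mean.
    y_bar = statistics.fmean(y_cc)
    y_imp = [yi if r else y_bar for yi, r in zip(y, observed)]

    return {
        "var_observed": statistics.variance(y_cc),
        "var_imputed": statistics.variance(y_imp),
        "slope_complete_case": ols_slope(x_cc, y_cc),
        "slope_mean_imputed": ols_slope(x, y_imp),
        "frac_observed": len(y_cc) / n,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

The imputed rows all sit at one value, so they add no spread to y and no covariance with x. That is why both the variance and the slope shrink by roughly the observed fraction. Here complete-case analysis only looks safe because the simulation assumes MCAR. If missingness instead depended on x or y (MAR or MNAR), the complete-case slope could be biased as well.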