EPISODE · Oct 9, 2017 · 21 MIN
Episode 24: How to handle imbalanced datasets
from Data Science at Home · host Francesco Gadaleta <frag>
In machine learning, and in data science in general, it is very common to run into imbalanced datasets and skewed class distributions at some point. This is the typical case in which the number of observations belonging to one class is significantly lower than the number belonging to the other classes. It happens all the time and in several domains, from finance to healthcare to social media, just to name a few I have personally worked with. Think of a bank detecting fraudulent transactions among millions or billions of daily operations, or, in healthcare, the identification of rare disorders. In genetics, and also with clinical lab tests, this is the normal scenario: fortunately, very few patients are affected by a disorder, so there are very few cases with respect to the large pool of healthy (unaffected) patients. No algorithm takes the class distribution, or the number of observations in each class, into account unless it is explicitly designed to handle such situations. In this episode I speak about some effective techniques for handling imbalanced datasets and advise on the most appropriate method for a given dataset or problem. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit datascienceathome.substack.com