PodParley

Episode 24: How to handle imbalanced datasets

Episode 16 of the Data Science at Home podcast, hosted by Francesco Gadaleta, titled "Episode 24: How to handle imbalanced datasets" was published on October 9, 2017 and runs 21 minutes.



In machine learning, and in data science in general, it is very common to deal at some point with imbalanced datasets and class distributions. This is the typical case in which the number of observations belonging to one class is significantly lower than the number belonging to the other classes. It happens all the time, in many domains: finance, healthcare and social media, to name just a few I have personally worked with. Think of a bank detecting fraudulent transactions among millions or billions of daily operations, or, equivalently, of healthcare applications that identify rare disorders. In genetics, and also with clinical lab tests, this is the normal scenario: fortunately, very few patients are affected by a disorder, so there are very few positive cases compared with the large pool of healthy (unaffected) patients. No algorithm takes the class distribution, or the number of observations in each class, into account unless it is explicitly designed to handle such situations. In this episode I explain how to deal with these common and challenging scenarios, covering some effective techniques for handling imbalanced datasets and advising which method is most appropriate for a given dataset or problem.

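The description does not name the specific techniques covered in the episode. The Python sketch below illustrates the two points made above: a model that ignores the class distribution can look deceptively accurate, and an explicitly imbalance-aware method can correct for this. The method shown is inverse-frequency class weighting, a widely used approach that is not necessarily one discussed in the episode. The label data and the helper `balanced_class_weights` are hypothetical. The weighting formula `n_samples / (n_classes * count_c)` is the same "balanced" heuristic that scikit-learn uses for its `class_weight="balanced"` option.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: inverse-frequency weights n_samples / (n_classes * count_c).

    Rare classes get proportionally larger weights, so every class contributes
    the same total weight to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical fraud-detection labels: 990 legitimate (0) vs 10 fraudulent (1).
labels = [0] * 990 + [1] * 10
counts = Counter(labels)

# A "model" that always predicts the majority class ignores the distribution...
majority = counts.most_common(1)[0][0]
accuracy = sum(1 for y in labels if y == majority) / len(labels)
# ...and looks 99% accurate while never detecting a single fraud case.
recall_minority = sum(1 for y in labels if y == 1 and majority == 1) / counts[1]

# Weighting each sample by its class weight rebalances the two classes.
weights = balanced_class_weights(labels)
total_weight = {cls: weights[cls] * counts[cls] for cls in counts}

print(counts)  # Counter({0: 990, 1: 10})
print(f"majority-class accuracy: {accuracy:.2f}, minority recall: {recall_minority:.2f}")
print("class weights:", weights)
print("total weight per class:", total_weight)
```

With these weights, each fraudulent example counts about 100 times as much as a legitimate one, so both classes carry an equal total weight (500 each) in a weighted loss. Resampling approaches such as over-sampling the minority class or under-sampling the majority class are common alternatives with a similar goal.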
