EPISODE · May 25, 2026 · 8 MIN
Building Taxonomies with Large Language Models [Microsoft]
from Snacks Weekly on Data Science · host Pan Wu
In this episode, we look at how companies deal with large volumes of unstructured text and why traditional clustering methods often fall short at scale. We explore two LLM-powered approaches shared by data scientists from Microsoft: a bottom-up pipeline that builds structure from data using embeddings and clustering, and a top-down pipeline that starts with LLM-generated categories and refines them recursively into a hierarchy.For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/data-science-at-microsoft/from-chaos-to-clarity-building-taxonomies-from-unstructured-text-using-large-language-models-c1303db3adb1
What this episode covers
In this episode, we look at how companies deal with large volumes of unstructured text and why traditional clustering methods often fall short at scale. We explore two LLM-powered approaches shared by data scientists from Microsoft: a bottom-up pipeline that builds structure from data using embeddings and clustering, and a top-down pipeline that starts with LLM-generated categories and refines them recursively into a hierarchy.For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/data-science-at-microsoft/from-chaos-to-clarity-building-taxonomies-from-unstructured-text-using-large-language-models-c1303db3adb1
NOW PLAYING
Building Taxonomies with Large Language Models [Microsoft]
No transcript for this episode yet
Similar Episodes
Apr 22, 2025 ·32m
Feb 27, 2025 ·0m
Sep 20, 2024 ·57m
Aug 7, 2024 ·16m