626: Subword Tokenization with Byte-Pair Encoding
Word tokenization, character tokenization, and subword tokenization go head-to-head this week as Jon Krohn delivers a mini-bootcamp on the NLP-related process. Additional materials: www.superdatascience.com/626 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
This episode of the Super Data Science: ML & AI Podcast with Jon Krohn, titled "626: Subword Tokenization with Byte-Pair Encoding", is hosted by Jon Krohn. It was published on November 11, 2022 and runs 6 minutes.
November 11, 2022 · 6m · Super Data Science: ML & AI Podcast with Jon Krohn