EPISODE · Dec 6, 2024 · 8 MIN
Abstracts: NeurIPS 2024 with Weizhu Chen
from Microsoft Research Podcast · host Researchers across the Microsoft research community
Next-token prediction trains a language model on all tokens in a sequence. VP Weizhu Chen discusses his team’s 2024 NeurIPS paper on how distinguishing between useful and “noisy” tokens in pretraining can improve token efficiency and model performance.