#84- FineWeb, the best dataset to pre-train LLMs.
"#84- FineWeb, the best dataset to pre-train LLMs" is an episode of the Life with AI podcast, hosted by Filipe Lauar. It was published on June 13, 2024 and runs 12 minutes.
June 13, 2024 · 12m · Life with AI
Episode Description
Hey guys, in this episode I talk about FineWeb, the best open-source pre-training dataset to date. I explain how the dataset was created and share some results.
Link to the Hugging Face blog: https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
LinkedIn of the podcast: https://www.linkedin.com/company/life-with-ai