EPISODE · Jun 13, 2024 · 12 MIN
#84 - FineWeb, the best dataset to pre-train LLMs.
from Life with AI · host Filipe Lauar
Hey guys, in this episode I talk about FineWeb, the best open-source pre-training dataset to date. I explain how the dataset was created and share some of the results.
Link to the Hugging Face blog: https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
LinkedIn of the podcast: https://www.linkedin.com/company/life-with-ai