#84- FineWeb, the best dataset to pre-train LLMs.

Life with AI - A podcast by Filipe Lauar

Categories:

Hey guys, in this episode I talk about the FineWeb dataset, the best pre-training open source dataset to date. In the episode I explain how they created the dataset and I also share some results. Link to the huggingface blog: https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1 Instagram of the podcast: https://www.instagram.com/podcast.lifewithai Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai

Visit the podcast's native language site