EPISODE · Nov 25, 2022 · 1H 2M
Anton Teaches Packy AI | Ep 2 | Chinchilla
from "Age of Miracles" · host Packy McCormick | Turpentine
We're back! In Episode 2, Anton Teaches Packy about DeepMind's March 2022 paper, "Training Compute-Optimal Large Language Models," more commonly known as Chinchilla.
Prior to Chinchilla, the best way to improve the performance of LLMs was thought to be scaling up the size of the model. As a result, the largest models now have over 500 billion parameters. But there are only so many GPUs in the world, and throwing compute at the problem is expensive and energy-intensive. In this paper, DeepMind found that, for a given compute budget, the optimal way to scale an LLM is to scale model size (parameters) and training data (tokens) proportionally. Given the race for size, today's models are plenty big but need a lot more data.
In this conversation, we go deep on the paper itself, but we also zoom out to talk about the politics of AI, when AGI is going to hit, where to get more data, and why AI won't take our jobs. This one gets a lot more philosophical than our first episode as we explore the implications of Chinchilla and LLMs more generally.
If you enjoyed this conversation, subscribe for more. We're going to try to release one episode per week, and we want to make this the best way to get a deeper understanding of the mind-blowing progress happening in AI and what it means for everything we do as humans.
LINKS:
Training Compute-Optimal Large Language Models: https://arxiv.org/abs/2203.15556
chinchilla's wild implications: https://www.lesswrong.com/posts/6Fpvc...
Scaling Laws for Neural Language Models (Kaplan et al.): https://arxiv.org/abs/2001.08361
Send in a voice message: https://podcasters.spotify.com/pod/show/ageofmiracles/message