EPISODE · May 29, 2024 · 8 MIN
We aren't running out of training data, we are running out of open training data
from Interconnects · host Nathan Lambert
Data licensing deals, scaling, human inputs, and repeating trends in open vs. closed.

This is AI-generated audio with Python and 11Labs.

Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/the-data-wall

0:00 We aren't running out of training data, we are running out of open training data
2:51 Synthetic data: 1 trillion new tokens per day
4:18 Data licensing deals: High costs per token
6:33 Better tokens: Search and new frontiers

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe