Infrastructure for AI at Scale - With Benny Chen (Firewor...

What this episode covers

We talk a lot on this show about RL, agents, and the move between pre-training and post-training, but not enough about the layer everything actually runs on. Benny Chen, co-founder of Fireworks AI, one of the largest inference platforms around, walks us through what it takes to serve models at scale: sourcing GPUs, writing the kernels, the runtime, and the routing layer that lets a customer hit one endpoint and forget the rest.

We talk about why the real bottleneck is power, not chips, and why that favors Nvidia and Google; why MoE keeps winning even when dense models look better on paper; and why he'd rather run fungible capacity at 95% than specialized chips at 60%. We also cover quantization limits, where RL efficiency has to go next, and his case that AI is still under-hyped. Finally, we get into cross-region training, sparse autoencoders and why interpretability hasn't taken off in open source, whether open models can close the gap, and a frank read on Anthropic's go-to-market.

Timeline

00:00 — Intro: the part of AI nobody talks about
01:20 — What "infrastructure for AI" actually means: the layers, from GPUs up to routing
02:59 — Why not just buy your own GPUs and do it yourself?
05:17 — The scale Fireworks runs at
06:35 — Hardware inflation, GPU costs, and the real risk hiding in commit duration
10:14 — Nvidia vs AMD vs TPUs, and why power is the bottleneck
11:57 — Mixing GPU types and generations; fungibility vs. specialization
14:22 — Once you have the GPUs, what's the next layer to build?
17:04 — Dense vs. MoE, and why the hardware picks the winner
21:07 — Quantization: is FP4 the floor? TurboQuant and INT vs. FP
24:28 — How tied are the algorithms to the hardware?
25:12 — DeepSeek, DeepGEMM, and next-token prediction as reconstruction loss
28:50 — Why RL is still wildly inefficient compared to pre-training
30:08 — Speculative decoding, AI-generated kernels, and auto-research
34:00 — The AGI question: why text gets automated but vision may stay expensive
37:07 — Hype check: why Benny thinks AI is still under-hyped
41:28 — Training vs. inference at the infrastructure level
44:12 — Scaling across data centers: cross-region training with Cursor
45:40 — Sparse autoencoders, interpretability, and why open source is human-constrained
49:04 — Will open models catch up — on quality and on compute?
51:41 — Are we plateauing? Opus 4.7 vs. 4.6 and the coming data wars
54:41 — Physical limits, HBM, and whether chips keep getting faster
58:17 — The belief about inference everyone gets wrong
59:31 — Anthropic, mythos, and a frank take on go-to-market
1:04:41 — Wrap-up

Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.


Frequently Asked Questions

How long is this episode of The Information Bottleneck?

This episode is 1 hour and 5 minutes long.

When was this The Information Bottleneck episode published?

This episode was published on June 24, 2026.
