EPISODE · Aug 19, 2025 · 21 MIN

The server-side rendering equivalent for LLM inference workloads

from The Stack Overflow Podcast

Ryan is joined by Tuhin Srivastava, CEO and co-founder of Baseten, to explore the evolving landscape of AI infrastructure and inference workloads, how the shift from traditional machine learning models to large-scale neural networks has made GPU usage challenging, and the potential future of hardware-specific optimizations in AI.

Episode notes:

Baseten is an AI infrastructure platform giving you the tooling, expertise, and hardware needed to bring AI products to market fast.

Connect with Tuhin on LinkedIn or reach him at his email [email protected].

Shoutout to user Hitesh for winning a Populist badge for their answer to Cannot drop database because it is currently in use.

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Frequently Asked Questions

How long is this episode of The Stack Overflow Podcast?

This episode is 21 minutes long.

When was this episode of The Stack Overflow Podcast published?

This episode was published on August 19, 2025.

What is this episode about?

Ryan is joined by Tuhin Srivastava, CEO and co-founder of Baseten, to explore the evolving landscape of AI infrastructure and inference workloads, how the shift from traditional machine learning models to large-scale neural networks has made GPU...

Can I download this episode of The Stack Overflow Podcast?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.