EPISODE · Nov 5, 2024 · 1H 4M
Deep Dive into Inference Optimization for LLMs with Philip Kiely
from Software Huddle · host Software Huddle
Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI workloads. We go deep on inference optimization: choosing a model, the hype around compound AI, choosing an inference engine, and optimization techniques like quantization and speculative decoding, all the way down to your GPU choice.