EPISODE · Oct 6, 2023 · 39 MIN
LLM Inference Speed (Tech Deep Dive)
from Thinking Machines: AI & Philosophy · host Daniel Reid Cahn
In this tech talk, we dive deep into the technical specifics around LLM inference.

The big question is: Why are LLMs slow? How can they be faster? And might slow inference affect UX in the next generation of AI-powered software?

We jump into:
Is fast model inference the real moat for LLM companies?
What are the implications of slow model inference on the future of decentralized and edge model inference?
As demand rises, what will the latency/throughput tradeoff look like?
What innovations on the horizon might massively speed up model inference?