LLM Inference Speed (Tech Deep Dive)
An episode of the Thinking Machines: AI & Philosophy podcast, hosted by Daniel Reid Cahn, titled "LLM Inference Speed (Tech Deep Dive)" was published on October 6, 2023 and runs 39 minutes.
Episode Description
In this tech talk, we dive deep into the technical specifics around LLM inference.
The big question is: Why are LLMs slow? How can they be faster? And might slow inference affect UX in the next generation of AI-powered software?
We jump into:
- Is fast model inference the real moat for LLM companies?
- What are the implications of slow model inference on the future of decentralized and edge model inference?
- As demand rises, what will the latency/throughput tradeoff look like?
- What innovations on the horizon might massively speed up model inference?