EPISODE · Oct 27, 2024 · 15 MIN
How large language models work, a visual intro to transformers
from Youtube DeepDive · hosts Alan Shore and Denise
This episode explores the inner workings of large language models (LLMs) like ChatGPT, focusing on the transformer architecture. The speaker starts by defining what LLMs are and how they use pre-trained transformers to generate text. The main focus is the attention mechanism, which lets the representation of each word take in information from the other words around it, so the model can capture how words relate to one another and interpret them in context. The video takes a visual approach, using simple analogies to explain complex concepts. It also briefly covers embedding, which translates words into numerical vectors, and the softmax function, which converts a list of raw scores into a probability distribution (for example, over the possible next words).
Become a supporter of this podcast: https://www.spreaker.com/podcast/youtube-deepdive--6348983/support.
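The pipeline described above (embed words as vectors, let attention mix in context from earlier words, then use softmax to turn scores into a probability distribution over the next word) can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the model from the video: the tiny vocabulary, the random embedding table, the single attention head with no learned query/key/value matrices, and the reuse of the embedding table to score candidate next words are all simplifications chosen for brevity.

```python
import math
import random

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def causal_self_attention(vectors):
    """Single-head scaled dot-product attention with a causal mask.

    Simplification (assumption): real transformers learn separate query, key
    and value matrices; here each token's embedding is used directly as its
    query, key and value. Each position attends only to itself and earlier
    positions, as in GPT-style text generation.
    """
    d = len(vectors[0])
    outputs, weights_all = [], []
    for i, q in enumerate(vectors):
        visible = vectors[: i + 1]                   # causal mask: no peeking ahead
        scores = [dot(q, k) / math.sqrt(d) for k in visible]
        weights = softmax(scores)                    # how much to attend to each word
        weights_all.append(weights)
        # New vector for position i: attention-weighted blend of visible vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return outputs, weights_all

# Hypothetical toy vocabulary and random embedding table (d_model = 4).
random.seed(0)
vocab = ["the", "cat", "sat", "down"]
d_model = 4
embedding = {w: [random.gauss(0, 1) for _ in range(d_model)] for w in vocab}

tokens = ["the", "cat", "sat"]
x = [embedding[t] for t in tokens]        # embedding step: words -> vectors
context, attn = causal_self_attention(x)  # attention step: mix in context

# Next-word prediction: score every vocabulary word against the last context
# vector, then softmax the scores into a probability distribution.
# (Assumption: the embedding table doubles as the output layer.)
logits = [dot(context[-1], embedding[w]) for w in vocab]
probs = softmax(logits)
for w, p in zip(vocab, probs):
    print(f"{w}: {p:.3f}")
```

Running it prints one probability per vocabulary word; the values are meaningless because the embeddings are random, but they always sum to 1, which is the job softmax does both inside attention and at the output.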