How large language models work, a visual intro to transformers

from Youtube DeepDive · host Alan Shore and Denise

This episode explores the inner workings of large language models (LLMs) like ChatGPT, focusing on the transformer architecture. The speaker starts by defining what LLMs are and how they use pre-trained transformers to generate text. The main focus is the attention mechanism, which lets an LLM learn the relationships between words in a sentence and understand their context. The video takes a visual approach and uses simple analogies to explain complex concepts. It also briefly discusses the embedding process, which translates words into numerical representations, and the softmax function, which normalizes a model's raw scores into a probability distribution.

Become a supporter of this podcast: https://www.spreaker.com/podcast/youtube-deepdive--6348983/support
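For listeners who want to see the three ideas named above (embeddings, attention, softmax) in concrete form, here is a minimal Python sketch. It is not taken from the episode: the two-dimensional "embeddings" are made-up numbers, the function names `softmax` and `attend` are hypothetical, and real models learn separate query, key, and value projections that this sketch leaves out.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the largest score keeps exp() from overflowing
    # and does not change the resulting probabilities.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # One similarity score per key: dot product, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the value vectors.
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed


# Toy embeddings: each word becomes a short list of numbers (assumed values).
embeddings = {"the": [0.1, 0.0], "cat": [1.0, 0.2], "sat": [0.3, 0.9]}
words = list(embeddings)
vectors = [embeddings[w] for w in words]

# How much does "cat" attend to each word in the sentence, itself included?
weights, mixed = attend(embeddings["cat"], vectors, vectors)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.3f}")
```

The printed weights always sum to 1, which is the job softmax does in this picture: it converts arbitrary similarity scores into a distribution over the words being attended to.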

Frequently Asked Questions

How long is this episode of Youtube DeepDive?

This episode is 15 minutes long.

When was this Youtube DeepDive episode published?

This episode was published on October 27, 2024.

Is there a transcript available for this episode?

Yes, a full transcript is available for this episode. You can read the complete transcript on the episode page.

Can I download this Youtube DeepDive episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
