EPISODE · Nov 27, 2019 · 37 MIN
More powerful deep learning with transformers (Ep. 84) (Rebroadcast)
from Data Science at Home · host Francesco Gadaleta
Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture. Such architecture is built on top of another important concept already known to the community: self-attention.

In this episode I explain what these mechanisms are, how they work and why they are so powerful.

Don't forget to subscribe to our Newsletter or join the discussion on our Discord server.

References
Attention is all you need https://arxiv.org/abs/1706.03762
The illustrated transformer https://jalammar.github.io/illustrated-transformer
Self-attention for generative models http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit datascienceathome.substack.com
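For listeners who prefer to see the idea in code, here is a minimal sketch of single-head scaled dot-product self-attention as described in "Attention is all you need". It is not taken from the episode: the function names (`softmax`, `matmul`, `self_attention`) and the toy inputs are illustrative assumptions, and real transformers add multiple heads, masking and learned projection weights on top of this.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: softmax(Q K^T / sqrt(d_k)) V.

    x is a sequence of token vectors (one row per token); w_q, w_k and w_v
    are hypothetical projection matrices standing in for learned weights.
    """
    q = matmul(x, w_q)                      # queries, one row per token
    k = matmul(x, w_k)                      # keys
    v = matmul(x, w_v)                      # values
    d_k = len(k[0])                         # key dimension, used for scaling
    k_t = [list(col) for col in zip(*k)]    # transpose of K
    scores = matmul(q, k_t)                 # pairwise token similarities
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights      # weighted mix of values

# Toy example: three 2-dimensional tokens and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, attention = self_attention(tokens, identity, identity, identity)
```

Each row of `attention` says how much one token attends to every token in the sequence (including itself), and each row of `output` is the corresponding weighted average of the value vectors.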