More powerful deep learning with transformers (Ep. 84)
Episode 84 of the Data Science at Home podcast, hosted by Francesco Gadaleta and titled "More powerful deep learning with transformers (Ep. 84)", was published on October 27, 2019 and runs 37 minutes.
Episode Description
Some of the most powerful NLP models, like BERT and GPT-2, have one thing in common: they all use the transformer architecture. That architecture is built on top of another important concept already known to the community: self-attention. In this episode I explain what these mechanisms are, how they work, and why they are so powerful.
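For a concrete picture of the mechanism the episode discusses, here is a minimal NumPy sketch of the scaled dot-product self-attention from "Attention Is All You Need" (see References below). The weight matrices, dimensions, and the toy input are illustrative assumptions, not material from the episode itself.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q = X @ Wq  # queries: what each position is looking for
    K = X @ Wk  # keys: what each position offers
    V = X @ Wv  # values: what each position contributes
    d_k = K.shape[-1]
    # Pairwise similarity between positions, scaled to keep gradients stable
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each position attends to all positions with weights summing to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all value vectors
    return weights @ V

# Toy usage with made-up dimensions: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # shape (4, 8): one context-aware vector per token
```

The key point of the episode in one line: every output position is computed from every input position in a single step, which is what lets transformers capture long-range dependencies without recurrence.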
Don't forget to subscribe to our Newsletter or join the discussion on our Discord server.
References
- Attention Is All You Need: https://arxiv.org/abs/1706.03762
- The Illustrated Transformer: https://jalammar.github.io/illustrated-transformer
- Self-Attention for Generative Models: http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf