EPISODE · Sep 26, 2025 · 1H 16M
Attention is all you need (Vaswani et al 2017) - Weekend Classics
from Revise and Resubmit - The Mayukh Show · host Mayukh Mukhopadhyay
English Podcast starts at 00:00:00
Bengali Podcast starts at 00:31:44
Hindi Podcast starts at 00:59:01

Reference
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 6000–6010. https://dl.acm.org/doi/10.5555/3295222.3295349

YouTube channel link: https://www.youtube.com/@weekendresearcher
Connect on LinkedIn: https://www.linkedin.com/in/mayukhpsm/

🎙️ Welcome to Revise and Resubmit, and welcome to Weekend Classics! ✨

I am your host, and tonight I am holding a paper that changed how our machines learn to listen. 📚🤖

We used to stack recurrence. We used to roll convolutions. We stitched encoders to decoders and prayed attention would bridge the gap. Then this arrived like a clean sheet of paper and a bright new pen. It said something bold. It said something simple. It said, Attention Is All You Need. 🌟

Written by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, this classic appeared at the 31st International Conference on Neural Information Processing Systems. Proceedings published on 04 December 2017. Published by Curran Associates Inc., distributed by the Association for Computing Machinery. 📅🏛️

Here is the heartbeat. No recurrence. No convolutions. Just attention. Self-attention that sees the whole sequence at once. Multi-head attention that looks in many directions at the same time. Positional encodings that give order a voice. An encoder that learns context. A decoder that speaks it back. It is simple to read. It is powerful to run. It is fast to train. ⚡️🧠

Proof matters. Numbers sing. On WMT14 English to German, the model hits 28.4 BLEU, beating prior best results by over 2 BLEU. On English to French, a single model reaches 41.0 BLEU after 3.5 days on eight GPUs. Better quality. More parallelism. Less time. More progress. 🚀📈

This is how a field turns. A sharper tool. A clearer map. A shorter path from idea to impact. And suddenly translation, vision, audio, code, even protein folding start to sound like one language with many dialects. 🧩🌍

Thank you to the authors and to the publisher, Curran Associates Inc., with distribution by ACM. 🙏

If you enjoyed this Weekend Classics, subscribe on Spotify and on our YouTube channel Weekend Researcher. We are also on Amazon Prime Music and Apple Podcast. 🎧🔔📺

So tell me, if attention is all we need, what might we discover when we finally give it fully to the next problem that scares us most? 🤔✨