learning with yacine show Podcast - All Episodes

2

Why Self-Distillation Is Taking Over LLM Post-Training (w/ the Researchers Behind It)

I had an awesome time interviewing @IdanShenfeld and @jonashubotter from MIT and ETH Zurich about self-distillation.

this is a very promising post-training paradigm where the model acts as its own teacher by conditioning on environment feedback or demonstrations.

we cover the SDPO algo for reinforcement learning with rich feedback and SDFT for continual learning without forgetting, along with many applications.

we dig into how it works, why it’s simpler and faster than GRPO, and where this is already showing up in production systems.

table of contents:
0:00 - what is self-distillation
2:50 - idan (MIT) and jonas (ETH Zurich) introduction and motivation
18:40 - different perspectives on on-policy self-distillation (presentation)
36:00 - metacognition and specificity in self-distillation
37:24 - very long hard tasks and self-distillation
42:00 - continual learning with self-distillation (presentation)
1:16:50 - what is next in this research direction?
1:20:00 - is there any experience with subjective feedback?
1:22:50 - quality vs quantity of feedback?
1:26:40 - in what settings would self-distillation struggle vs GRPO?

my random thoughts on the paradigm

I think this is it. I think this will have the same impact as CoT had when it was first introduced, and given its strong performance against SFT I would be very surprised if it is not already woven into the main closed-source models.

it is also becoming clearer to me that the traditional, very rigid boundaries between pre-, mid-, and post-training are starting to collapse a bit, and that the reality is more of a mix that depends heavily on the expected performance of the model.

on a more meta note, I think having the model be its own teacher just makes so much sense. like, I was looking at my kids and I realized that each one of them has this bias of looking at the next one up in age for clues about how to do things.

they have this innate fascination with HOW the one that is a bit older is doing things, not even the eldest. and I don’t think it’s just a fun quirk of nature.

the “policy” of the next-in-line kid is technically very close to wherever the base kid is at the moment. yes, they can learn from me or my wife, but we are so much more advanced that it’s a bit hard for them to understand how we are doing things (even though our movements are more precise).

it’s much easier to copy the one that is a bit older, even with a slightly flawed policy, and to listen to them if they give feedback, because they can just understand them better.

it reminded me of the example idan gave: if a smaller model in robotics just listens to the big teacher model, it will hit points in the data it can’t recover from, because it is going to make mistakes that are just not possible for the larger model.

anyway, this paradigm has a beautiful kernel of truth that is fundamental in my view, and it is a very exciting angle to get up to speed with!

papers

the slides were super crisp, really cool of them to share!

if you are interested in digging more into the literature, here are a few papers that are worth checking out:

📌 Reinforcement Learning via Self-Distillation (SDPO): https://arxiv.org/abs/2601.20802
📌 Self-Distillation Enables Continual Learning (SDFT): https://arxiv.org/abs/2601.19897
📌 Aligning Language Models from User Interactions: https://arxiv.org/abs/2603.12273
📌 RL’s Razor: Why Online Reinforcement Learning Forgets Less: https://arxiv.org/abs/2509.04259
📌 Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs: https://arxiv.org/abs/2410.08020

enjoy my guys 🌹

Get full access to learning with yacine at www.yacinemahdid.com/subscribe

Apr 28, 2026

1h 31m
1

The Science of Learning Math (and Anything Else) with Justin Skycak

hey folks, first time posting this sort of interview as a podcast (shout out to the three anons on twitter that requested it).

now that I think of it, the format does make sense since the interviewee and I are mostly chatting and exchanging ideas!

will be posting more of these in the future!

in this episode I had the pleasure of spending not 1 but 2 days interviewing the learning legend justin skycak.

we talked about his quite impressive self-learning journey (3000h of math in high school) all the way to how he hand-curated the initial knowledge graph for math academy to make that process more efficient.

I like justin because I understand his inner desire for learning.

I had the exact same “awakening”, let’s say, in my tiny bedroom when I was a teenager. I realized while doing some homework that I loved learning and found an immense sense of peace within it.

this realization kind of got lost in the midst of winter depression for a while, until it was reawakened with a burning flare that has never died down since (more trauma dump here).

this understanding that there is a meta-learning skill is so fun, because then the whole world kind of opens up for you.

you encounter a hard problem you don’t even know how to get started on? → great, let’s use “learning”.
you got an opportunity that you are underprepared for? → great, let’s use “learning”.
you want to make something seemingly impossible happen? → great, let’s use “learning”.

what I like about justin’s story is that it is very concrete and touches on math, which has far-reaching usefulness in life.

so here is the very lively 3h discussion where we touch on these topics:
0:00:00 - intro
0:02:10 - justin background
0:05:45 - 3000h math self-study in high school
0:11:45 - what a day looked like for that 3000h stretch
0:16:10 - meta-learning vs pure math learning
0:21:50 - when did you get into cognitive neuro?
0:29:55 - how did the fundamental math help in your research projects
0:43:10 - what does the math academy learning system look like
0:47:34 - how did you guys build the 2000-topic knowledge graph
1:01:15 - would an LLM be useful as an interface to that knowledge graph for the students?
1:10:46 - how does the FIRe spaced repetition algorithm work?
1:17:34 - would the same knowledge graph structure work for physics? or other topics?
1:34:05 - how do you understand the subject vs the curriculum
1:35:50 - is there a connection between studying math and learning a sport?
1:42:00 - do you think doing and teaching math require different skills?
1:56:25 - could you get understanding without automaticity?
2:05:35 - do you see any upside of confusion in learning?
2:14:11 - learning math as an adult?
2:19:20 - how to fill the motivation gap after learning the fundamentals?
2:24:10 - how should teaching math for kids and adults balance fundamentals and creativity?
2:33:55 - is it ever too late to learn math seriously?
2:46:00 - mastery learning vs ultra learning
2:51:30 - top-down vs bottom-up
2:53:40 - mastery learning for domains without a hierarchical structure?
2:56:30 - neurodivergence / adhd for structured math learning?
3:06:20 - will amateur mathematicians augmented with technology be able to contribute to research?
3:14:37 - what are you most excited about right now in terms of learning

enjoy my guys! :)

Get full access to learning with yacine at www.yacinemahdid.com/subscribe

Apr 15, 2026

3h 18m


ABOUT THIS SHOW

a show about untangling the complexity of deep learning and the brain by interviewing smart researchers and builders in the field of intelligence! ideal for scholars, industry practitioners or toddlers. www.yacinemahdid.com

HOSTED BY

yacine mahdid
