EPISODE · Apr 18, 2024 · 13 MIN
#80- Layer pruning and Mixture of Depths.
from Life with AI · host Filipe Lauar
Hey guys, continuing the series of episodes about PEFT, in this episode I talk about inference optimization techniques for LLMs. First, layer pruning, where we remove a block of consecutive layers from the LLM with almost no loss in model performance. Then Mixture of Depths, a technique similar to Mixture of Experts, where a router chooses which tokens are processed by each layer of the LLM and which ones skip it.
Paper MoD: https://arxiv.org/pdf/2404.02258.pdf
Paper layer pruning: https://arxiv.org/pdf/2403.17887v1.pdf
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
LinkedIn of the podcast: https://www.linkedin.com/company/life-with-ai
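A minimal pure-Python sketch of the two techniques discussed in this episode, under stated assumptions. For layer pruning, it assumes the selection rule as we understand it from the linked layer-pruning paper: choose the n consecutive layers whose input and output hidden states have the smallest angular distance. The paper averages this distance over a dataset and then "heals" the pruned model with a short QLoRA fine-tune, and neither step is shown here. For Mixture of Depths, it assumes a linear router with a fixed capacity (top-k tokens per block). Here `block` is a hypothetical per-token function standing in for the transformer block, whereas in the real architecture the selected tokens also attend to each other. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def angular_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle between two hidden states, scaled to [0, 1] (arccos of cosine / pi)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / norms))  # clamp to guard against rounding
    return math.acos(cos) / math.pi


def best_block_to_prune(hidden: Sequence[Vector], n: int) -> int:
    """Layer pruning sketch: hidden[l] is the hidden state entering layer l, so a
    model with L layers gives L + 1 entries. Dropping layers l .. l+n-1 means the
    next layer receives hidden[l] instead of hidden[l + n]. Return the start l
    where those two states are closest, i.e. where the removed layers did least."""
    num_layers = len(hidden) - 1
    starts = range(num_layers - n + 1)
    return min(starts, key=lambda l: angular_distance(hidden[l], hidden[l + n]))


def mixture_of_depths_block(
    tokens: List[Vector],
    router_w: Vector,
    block: Callable[[Vector], Vector],
    capacity: float,
) -> List[Vector]:
    """Mixture of Depths sketch: a linear router scores every token, only the
    top-k tokens (k = capacity * sequence length) go through `block`."""
    scores = [sum(w * x for w, x in zip(router_w, t)) for t in tokens]
    k = max(1, int(capacity * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    out = []
    for i, t in enumerate(tokens):
        if i in chosen:
            # Residual update scaled by the router score, so during training
            # the router receives gradients through the block output.
            out.append([x + scores[i] * y for x, y in zip(t, block(t))])
        else:
            # Skipped token: passed through unchanged on the residual stream.
            out.append(list(t))
    return out
```

For example, with hidden = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.1, 1.0], [1.0, 1.0]] and n = 2, best_block_to_prune returns 1, because hidden[1] and hidden[3] are nearly parallel, so layers 1 and 2 barely change the representation. In mixture_of_depths_block, the capacity fixes how many tokens each layer processes, which keeps the compute per layer predictable no matter which tokens the router picks.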