#80- Layer pruning and Mixture of Depths.
This episode of the Life with AI podcast, hosted by Filipe Lauar, was published on April 18, 2024, and runs 13 minutes.
April 18, 2024 · 13m · Life with AI
Episode Description
Hey guys, this episode continues the series about PEFT. This time I talk about two inference optimization techniques for LLMs.
I talk about layer pruning, where we remove a block of consecutive layers from the LLM with almost no loss in model performance.
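A minimal Python sketch of the layer-selection step, assuming the angular-distance criterion used in the layer pruning paper linked below: to drop a block of n layers, compare the representation entering layer l with the one entering layer l + n, and remove the block where the two are most similar. The function names and toy vectors are hypothetical. A real run would average the distance over many tokens from a calibration set, and the paper then "heals" the pruned model with a small amount of QLoRA fine-tuning, which is not shown here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def angular_distance(a: Vector, b: Vector) -> float:
    """Angular distance in [0, 1]: arccos(cosine similarity) / pi."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("angular distance is undefined for zero vectors")
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against float rounding
    return math.acos(cos) / math.pi


def best_block_to_prune(hidden: List[Vector], n: int) -> int:
    """hidden[l] is the representation entering layer l (hidden[-1] is the
    final output). Return the start l of the n consecutive layers whose
    removal changes the representation least: argmin_l d(hidden[l], hidden[l + n])."""
    num_layers = len(hidden) - 1
    if not 1 <= n <= num_layers:
        raise ValueError("n must be between 1 and the number of layers")
    return min(range(num_layers - n + 1),
               key=lambda l: angular_distance(hidden[l], hidden[l + n]))


def prune(layers: list, start: int, n: int) -> list:
    """Drop layers[start : start + n], keeping the remaining layers in order."""
    return layers[:start] + layers[start + n:]
```

For example, with toy hidden states for a 4-layer model where layers 1 and 2 barely change the representation, `best_block_to_prune(hidden, 2)` picks start index 1, and `prune(["L0", "L1", "L2", "L3"], 1, 2)` keeps only `["L0", "L3"]`.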
I also talk about Mixture of Depths, a technique similar to Mixture of Experts, where a router chooses which tokens are processed by each layer of the LLM and which ones skip it.
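A toy Python sketch of the routing idea behind a single Mixture of Depths layer, assuming a linear router and a fixed capacity k: every token gets a scalar router score, only the top-k tokens go through the block (output = input + score × block(input)), and the rest pass through the residual connection untouched, so no compute is spent on them in that layer. `mod_layer`, the router weights and the stand-in `block` are hypothetical; in the real model the block is attention plus MLP, and attention only sees the selected tokens.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mod_layer(tokens: Sequence[Vector],
              router_w: Sequence[float],
              block: Callable[[Vector], Vector],
              capacity: int) -> List[Vector]:
    """One Mixture-of-Depths style layer (toy sketch).

    Each token gets a scalar score from a linear router; only the `capacity`
    highest-scoring tokens are processed by `block`, the others skip it."""
    scores = [sum(w * x for w, x in zip(router_w, t)) for t in tokens]
    # Indices of the top-k tokens by router score (k = capacity).
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:capacity])
    out = []
    for i, t in enumerate(tokens):
        if i in chosen:
            # Processed token: residual + router-weighted block output.
            out.append([x + scores[i] * y for x, y in zip(t, block(t))])
        else:
            # Skipped token: identity through the residual path.
            out.append(list(t))
    return out
```

Note that top-k over the whole sequence is not causal; the MoD paper discusses how to handle this during autoregressive sampling, which this sketch ignores.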
Mixture of Depths (MoD) paper: https://arxiv.org/pdf/2404.02258.pdf
Layer pruning paper: https://arxiv.org/pdf/2403.17887v1.pdf
Podcast Instagram: https://www.instagram.com/podcast.lifewithai
Podcast LinkedIn: https://www.linkedin.com/company/life-with-ai