
EPISODE · Feb 17, 2026 · 17 MIN

293: IndeLLM (ESM2) zero-shot scoring and Siamese transfer learning for in-frame indel prediction (MCC 0.77)

from Base by Base · host Gustavo Barra

Gracia Carmona O et al., Patterns, 7 (2026) 101425. doi:10.1016/j.patter.2025.101425 - IndeLLM uses protein language models (ESM2) to score in-frame indels, and a compact Siamese transfer-learning model achieves state-of-the-art pathogenicity prediction with MCC = 0.77.

Key terms: IndeLLM, protein language models, in-frame indels, Siamese network, ESM2.

Study Highlights: Using human protein sequences and ESM2 embeddings, the authors develop IndeLLM, a zero-shot scoring function that sums overlapping-region probabilities to correct the length bias of in-frame indels. They train a compact Siamese network with one hidden layer on PLM embeddings, using biologically guided embedding splitting, and achieve MCC = 0.77 on the test set. Per-residue probability differences mapped onto structures (FGFR1, GLMN) identify the local regions affected by indels and improve interpretability. The framework reduces false negatives for insertions and is released with Colab and GitHub tools for indel annotation and disease-variant analysis.

Conclusion: IndeLLM zero-shot scoring and a small Siamese transfer-learning model provide improved, interpretable indel pathogenicity prediction, with the Siamese model achieving MCC = 0.77.

Music: Enjoy the music based on this article at the end of the episode.

Article title: Leveraging protein language models and a scoring function for indel characterization and transfer learning
First author: Gracia Carmona O
Journal: Patterns, 7 (2026) 101425. doi:10.1016/j.patter.2025.101425
DOI: 10.1016/j.patter.2025.101425
Reference: Gracia Carmona O, Leipart V, Amdam GV, Orengo C, Fraternali F. Leveraging protein language models and a scoring function for indel characterization and transfer learning. Patterns. 7 (2026) 101425. https://doi.org/10.1016/j.patter.2025.101425

License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/

Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00
Official website: https://basebybase.com
On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics.
Episode link: https://basebybase.com/episodes/indellm-indel-siamese-model

QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-02-17.

QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Audited the scientific content conveyed in the transcript: indel biology, IndeLLM zero-shot scoring, Siamese Model 4, performance metrics, interpretability via structure-mapped probability changes, structural validation with AlphaFold, and broad applicability including non-human systems and accessible tooling.
- transcript topics: Indel biology: in-frame indels vs frameshift indels; Protein language models and length bias; Indel scoring: IndeLLM zero-shot (overlapping regions); Probability scoring math: sum vs log-sum; Siamese network (Model 4) and transfer learning; Performance metrics: MCC 0.65 (zero-shot) and 0.77 (Siamese); comparison to PROVEAN

QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 8
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0

Metadata Audited:
- article_doi
- article_title
- article_journal
- license

Factual Items Audited:
- IndeLLM zero-shot scoring uses overlapping regions to correct length bias
- Switch from log probability sums to sum of probabilities to reduce noise
- Model 4 Siamese network with embedding split...
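The description says IndeLLM sums probabilities only over the overlapping regions to avoid length bias, and reports performance as MCC (0.65 zero-shot, 0.77 for the Siamese model). The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. Per-residue probabilities are assumed to be already computed by a protein language model such as ESM2 (no model is loaded here). The helper names `overlap_pairs`, `overlap_score` and `mcc` are hypothetical, and the sign convention (mutant minus wild type) is also an assumption. See the paper and its GitHub release for the actual scoring function.

```python
import math
from typing import List, Sequence, Tuple


def overlap_pairs(wt_len: int, pos: int, indel_len: int, kind: str) -> List[Tuple[int, int]]:
    """Pair wild-type and mutant indices for residues present in both sequences.

    Hypothetical helper. For a deletion, wild-type residues pos..pos+indel_len-1
    are missing from the mutant. For an insertion, mutant residues
    pos..pos+indel_len-1 are new. Residues inside the indel are skipped, so both
    sequences contribute the same number of terms (the overlapping regions).
    """
    if indel_len < 1:
        raise ValueError("indel_len must be >= 1")
    if kind == "deletion":
        if not 0 <= pos <= wt_len - indel_len:
            raise ValueError("deletion position out of range")
        mut_len = wt_len - indel_len
        tail = [(i + indel_len, i) for i in range(pos, mut_len)]
    elif kind == "insertion":
        if not 0 <= pos <= wt_len:
            raise ValueError("insertion position out of range")
        tail = [(i, i + indel_len) for i in range(pos, wt_len)]
    else:
        raise ValueError(f"unknown indel kind: {kind!r}")
    head = [(i, i) for i in range(pos)]  # residues before the indel align 1:1
    return head + tail


def overlap_score(wt_probs: Sequence[float], mut_probs: Sequence[float],
                  pos: int, indel_len: int, kind: str) -> float:
    """Length-matched score: summed mutant minus summed wild-type probabilities.

    Assumption: inputs are per-residue probabilities (not log-probabilities)
    from a protein language model. Negative values mean the residues shared by
    both sequences look less likely in the mutant context.
    """
    pairs = overlap_pairs(len(wt_probs), pos, indel_len, kind)
    expected = len(wt_probs) + (indel_len if kind == "insertion" else -indel_len)
    if len(mut_probs) != expected:
        raise ValueError("mutant length does not match the indel description")
    wt_sum = sum(wt_probs[w] for w, _ in pairs)
    mut_sum = sum(mut_probs[m] for _, m in pairs)
    return mut_sum - wt_sum


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy probabilities invented for illustration, not data from the paper.
    wt = [0.9, 0.8, 0.7, 0.6, 0.5]
    mut = [0.85, 0.55, 0.45]  # wild-type residues 1 and 2 deleted
    naive = sum(mut) - sum(wt)  # full-length sums: dominated by missing terms
    fair = overlap_score(wt, mut, pos=1, indel_len=2, kind="deletion")
    print(f"naive={naive:.2f} overlap={fair:.2f}")
    # Hypothetical confusion-matrix counts, not the paper's results.
    print(f"MCC={mcc(tp=40, tn=45, fp=5, fn=10):.2f}")
```

With these toy numbers, the full-length difference looks strongly negative mainly because the mutant has two fewer terms, while the length-matched score compares only residues present in both sequences. MCC is used here because it stays informative when the pathogenic and benign classes are imbalanced.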





Frequently Asked Questions

How long is this episode of Base by Base?

This episode is 17 minutes long.

When was this Base by Base episode published?

This episode was published on February 17, 2026.

What is this episode about?

The episode covers Gracia Carmona O et al., Patterns, 7 (2026) 101425. doi:10.1016/j.patter.2025.101425. IndeLLM uses protein language models (ESM2) to score in-frame indels, and a compact Siamese transfer-learning model achieves state-of-the-art pathogenicity prediction with MCC = 0.77.

Can I download this Base by Base episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.