EPISODE · Jun 30, 2025 · 17 MIN
60: Epi-PRS: Genomic LLMs and imputed epigenomics boost polygenic prediction
from Base by Base · host Gustavo Barra
Zeng W et al., PNAS - This paper introduces Epi-PRS, a workflow that uses genomic large language models to impute cell-type-specific epigenomic features from diploid genotypes and trains nonlinear risk models to improve polygenic prediction from whole-genome sequencing (WGS). The method improves AUC for breast cancer and type 2 diabetes (T2D) in UK Biobank and shows gains from modeling regulatory context and rare variants.

Key terms: Epi-PRS, genomic LLM, polygenic risk score, epigenomics, whole-genome sequencing.

Study Highlights: The authors developed Epi-PRS, which uses a genomic LLM (Enformer) to predict epigenomic signals from phased maternal and paternal sequences, then applies local PCA plus gradient-boosted regression trees (GBRT) to predict disease risk. Simulation studies show that nonlinear models and epigenomic intermediates recover signal missed by linear PRS methods, especially when epigenetic effects or rare variants contribute. In UK Biobank case-control tests for breast cancer and T2D, Epi-PRS outperformed LDpred2 and PRS-CS using selected LD blocks. Tissue-specific feature selection and bin-level importance analyses linked top features to relevant regulatory marks in pancreas, liver, and blood.

Conclusion: Integrating LLM-imputed, cell-type-specific epigenomic features with nonlinear modeling improves polygenic risk prediction from WGS, particularly when regulatory effects and rare variants are important, and offers interpretable links to disease-relevant regulatory elements.

Music: Enjoy the music based on this article at the end of the episode.

Article title: Improving polygenic prediction from whole-genome sequencing data by leveraging predicted epigenomic features
First author: Zeng W
Journal: PNAS
DOI: 10.1073/pnas.2419202122
Reference: Zeng W., Guo H., Liu Q., Wong W.H. Improving polygenic prediction from whole-genome sequencing data by leveraging predicted epigenomic features. PNAS. 2025;122(24):e2419202122. doi:10.1073/pnas.2419202122

License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

Support: Base by Base, Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00
Official website: https://basebybase.com

On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics.

Episode link: https://basebybase.com/episodes/epi-prs-llm-imputed-epigenomics

QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2025-06-30.

QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: audited transcript segments covering the Epi-PRS concept, three-stage workflow, epigenomic feature inference via an LLM, dimension reduction, nonlinear modeling, simulation results, UKBB findings for breast cancer and T2D, tissue-enrichment interpretations, and discussion of limitations/future directions
- transcript topics: Epi-PRS concept and goals; personal genome construction (maternal/paternal genomes); epigenomic feature extraction with Enformer; local PCA dimension reduction; GBRT risk prediction; nonlinear vs linear PRS in simulations

QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 6
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0

Metadata Audited:
- article_doi
- article_title
- article_journal
- license

Factual Items Audited:
- Epi-PRS uses a genomic large language model to impute cell-type-specific epigenomic signals from personal diploid genomes as intermediaries between genotype and phenotype.
- Epi-PRS w...
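The first stage described above turns phased genotypes into two personal sequences (labelled maternal and paternal) that a sequence model such as Enformer can then score. The sketch below is illustrative only and is not the Epi-PRS code: it assumes biallelic SNVs only (no indels), VCF-style phased genotype strings such as "0|1", and 0-based positions inside a short reference window. The helper names `build_haplotypes` and `one_hot` are hypothetical. Note that a phased genotype fixes which alleles sit together on a haplotype, but does not by itself establish parent of origin, and Enformer itself reads a much longer window (196,608 bp) than this toy example.

```python
from typing import List, Tuple

# Hypothetical variant record: (0-based position in window, ref base, alt base, phased GT such as "0|1")
Variant = Tuple[int, str, str, str]


def build_haplotypes(ref_window: str, variants: List[Variant]) -> Tuple[str, str]:
    """Apply phased biallelic SNVs to a reference window; return (hap1, hap2)."""
    hap1 = list(ref_window)
    hap2 = list(ref_window)
    for pos, ref, alt, gt in variants:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("only SNVs are supported; indels would shift coordinates")
        if ref_window[pos] != ref:
            raise ValueError(f"reference mismatch at {pos}: {ref_window[pos]} != {ref}")
        if "|" not in gt:
            raise ValueError(f"unphased genotype {gt!r}; phased input ('a|b') is required")
        a1, a2 = gt.split("|")  # allele on haplotype 1 | allele on haplotype 2
        for allele, hap in ((a1, hap1), (a2, hap2)):
            if allele == "1":
                hap[pos] = alt
            elif allele != "0":
                raise ValueError(f"only biallelic 0/1 alleles are supported, got {allele!r}")
    return "".join(hap1), "".join(hap2)


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode A/C/G/T as 4-element rows; any other base (e.g. N) becomes all zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    variants = [(2, "G", "A", "0|1"), (5, "C", "T", "1|1")]  # one het, one hom-alt SNV
    maternal, paternal = build_haplotypes(ref, variants)
    print(maternal)  # ACGTATGTAC
    print(paternal)  # ACATATGTAC
    print(one_hot(maternal[:2]))  # [[1, 0, 0, 0], [0, 1, 0, 0]]
```

Each haplotype is encoded separately, so a heterozygous variant changes only one of the two sequences, which is what lets a model see allele-specific regulatory effects.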
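For context on the comparison described above: linear PRS methods such as LDpred2 and PRS-CS estimate one weight per variant, and each person's score is a weighted sum of allele dosages (0, 1 or 2). Discrimination is then summarized by the AUC, the probability that a randomly chosen case scores higher than a randomly chosen control. The minimal sketch below computes both on made-up numbers; the weights, genotypes and the helper names `linear_prs` and `auc` are hypothetical and are not taken from either package or from the paper. A nonlinear model such as the GBRT used in Epi-PRS would replace the weighted sum, while the AUC would be computed the same way.

```python
from typing import Sequence


def linear_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive PRS: sum over variants of weight * allele dosage (0, 1 or 2)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(case score > control score), counting ties as 1/2.

    Pairwise O(n_cases * n_controls) version for illustration; a rank-based
    formula is preferable for biobank-scale data.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    weights = [0.2, -0.1, 0.05]  # toy per-variant effect sizes
    genotypes = [[2, 0, 1], [1, 1, 0], [0, 2, 2], [2, 1, 2]]  # toy dosages, one row per person
    labels = [1, 0, 0, 1]  # 1 = case, 0 = control
    scores = [linear_prs(g, weights) for g in genotypes]
    print(auc(scores, labels))
```

The pairwise form is chosen here because it reads directly as the probabilistic definition of AUC; it gives the same value as the Mann-Whitney rank formulation.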