EPISODE · Jun 7, 2025 · 19 MIN
39: Scaling whole-genome polygenic scores with VIPRS
from Base by Base · host Gustavo Barra
Zabad S et al., The American Journal of Human Genetics - This episode covers Zabad et al.'s methods to scale summary-statistics-based polygenic risk score (PRS) inference to millions of variants. The authors introduce compressed LD storage, memory-efficient coordinate-ascent variational algorithms, and multi-level parallelism to cut storage, runtime, and RAM by orders of magnitude while retaining competitive prediction accuracy.

Key terms: polygenic risk scores, linkage disequilibrium, variational inference, LD compression, VIPRS.

Study Highlights: The authors design a compact LD-matrix format (CSR stored in Zarr with quantization) and algorithmic optimizations that reduce LD storage by over 50-fold. They reimplement coordinate-ascent variational updates in C/C++ using single-precision floats, triangular-LD updates, dequantize-on-the-fly, and two layers of parallelism to cut runtime and memory use by orders of magnitude. VIPRS v0.1 can run variational Bayesian regression on 1.1M HapMap3 variants in under a minute and converges genome wide on up to 18M variants in tens of minutes using <15 GB RAM. The paper also analyzes spectral causes of numerical instability in LD matrices and gives practical recommendations to improve stability and prediction accuracy.

Conclusion: The updated VIPRS toolkit enables fast, memory-efficient whole-genome PRS inference at biobank scale with competitive accuracy and provides storage formats and numerical safeguards to improve reproducibility and portability.

Music: Enjoy the music based on this article at the end of the episode.

Article title: Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms
First author: Zabad S
Journal: The American Journal of Human Genetics
DOI: 10.1016/j.ajhg.2025.05.002
Reference: Zabad S., Haryan C.A., Gravel S., Misra S., Li Y. (2025). Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms. The American Journal of Human Genetics 112, 1–19. https://doi.org/10.1016/j.ajhg.2025.05.002

License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/

Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00
Official website: https://basebybase.com
On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics.
Episode link: https://basebybase.com/episodes/viprs-whole-genome-prs

QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2025-06-07.

QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Audited the transcript's presentation of VIPRS architecture (LD storage, quantization, DQF, triangular LD), memory/performance benchmarks, parallelism, numerical stability guards, and cross-ancestry/cross-biobank findings against the original article.
- transcript topics: Polygenic risk scores and LD challenges; LD matrix compression via upper-triangular storage; CSR storage and Zarr cloud-native format; Quantization to int8/int16 and scale quantization; Dequantize-on-the-Fly (DQF) memory management; Coordinate ascent updates and OpenMP parallelism

QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 8
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0

Metadata Audited:
- article_doi
- article_title
- article_journal
- license

Factual Items Audited:
- LD matrix compression reduces storage by >50-fold...
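
The LD storage ideas named in the Study Highlights (upper-triangular CSR storage, integer quantization, and dequantize-on-the-fly) can be illustrated with a small sketch. This is not the VIPRS implementation: it uses plain Python lists and arrays rather than Zarr, and the function names, the int8 scale factor of 127, and the toy 3-variant LD matrix are all assumptions chosen for illustration.

```python
from array import array

INT8_MAX = 127  # assumed scale: correlations in [-1, 1] map to [-127, 127]


def quantize(r: float) -> int:
    """Map a correlation in [-1, 1] to a signed 8-bit integer code."""
    return max(-INT8_MAX, min(INT8_MAX, round(r * INT8_MAX)))


def dequantize(q: int) -> float:
    """Recover an approximate correlation from its int8 code."""
    return q / INT8_MAX


def to_upper_csr(ld, threshold=0.0):
    """Store the strict upper triangle of a symmetric LD matrix in CSR form.

    Returns (data, indices, indptr): int8 codes, column indices, row offsets.
    The unit diagonal is implicit and is not stored.
    """
    data, indices, indptr = array("b"), array("i"), array("i", [0])
    for i, row in enumerate(ld):
        for j in range(i + 1, len(row)):
            if abs(row[j]) > threshold:
                data.append(quantize(row[j]))
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr


def ld_times_vector(data, indices, indptr, beta):
    """Compute R @ beta from the upper triangle, dequantizing one entry at a time."""
    out = list(beta)  # the implicit unit diagonal contributes beta[i]
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            j, r = indices[k], dequantize(data[k])
            out[i] += r * beta[j]  # stored entry (i, j)
            out[j] += r * beta[i]  # mirrored entry (j, i), never stored
    return out


# Hypothetical 3-variant LD matrix and effect sizes, for illustration only.
ld = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, -0.3],
    [0.0, -0.3, 1.0],
]
data, indices, indptr = to_upper_csr(ld)
approx = ld_times_vector(data, indices, indptr, [0.1, -0.2, 0.05])
```

In this toy case only two int8 values are kept instead of nine floating-point entries, and the matrix-vector product reads each stored value once while serving both halves of the symmetric matrix. The full float matrix is never materialized, which is the general idea behind dequantizing on the fly.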