[Invited Lecture] Cross-protein transfer learning substantially improves zero-shot prediction of disease variant effects
Date: 2023-01-03 14:00 ~ 16:00
Speaker: Yun S. Song (UC Berkeley Dept. of Statistics & EECS)
Professor: School of Biological Sciences
Location: In person | Faculty Meeting Room (504-105)
Yun S. Song, Department of Statistics & EECS, UC Berkeley
Genetic variation in the human genome is a major determinant of individual disease risk,
but the vast majority of missense variants have unknown etiological effects. Various
computational strategies have been proposed to predict the effects of missense variants
across the human proteome, using many different predictive signals. Here, we present a
robust learning framework for leveraging functional assay data to construct
computational predictors of disease variant effects. We train cross-protein transfer (CPT)
models using deep mutational scanning data from only five proteins and achieve state-
of-the-art performance on unseen proteins across the human proteome. On human
disease variants annotated in ClinVar, our model CPT-1 improves specificity at 95%
sensitivity to 64%, from 31% for ESM-1v and 50% for EVE. Our framework combines
general protein sequence models with vertebrate sequence alignments and AlphaFold2
structures, and it is adaptable to the future inclusion of other sources of information. We
release predictions for all missense variants in 90% of human genes. Our results
establish the utility of functional assay data for learning general properties of variants
that can transfer to unseen proteins.
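
For readers unfamiliar with the headline metric, here is a minimal sketch of how "specificity at 95% sensitivity" is conventionally computed from variant scores and pathogenic/benign labels. It is not the speaker's evaluation code. The function name `specificity_at_sensitivity`, the convention that a higher score means more likely pathogenic, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Return (threshold, sensitivity, specificity) at the strictest threshold
    whose sensitivity is at least `target_sensitivity`.

    Assumptions of this sketch: higher score = more likely pathogenic;
    labels are 1 (pathogenic) or 0 (benign).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]  # scores of pathogenic variants
    neg = scores[labels == 0]  # scores of benign variants
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one pathogenic and one benign variant")

    # Number of pathogenic variants that must be called to reach the target.
    # The small tolerance guards against floating-point round-up.
    k = max(1, int(np.ceil(target_sensitivity * pos.size - 1e-9)))

    # Calling every variant with score >= the k-th highest pathogenic score
    # captures at least k pathogenic variants.
    threshold = np.sort(pos)[::-1][k - 1]

    sensitivity = float(np.mean(pos >= threshold))  # true positive rate
    specificity = float(np.mean(neg < threshold))   # true negative rate
    return threshold, sensitivity, specificity


if __name__ == "__main__":
    # Made-up toy scores, not data from the talk.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.5, 0.4, 0.3, 0.2]
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(specificity_at_sensitivity(toy_scores, toy_labels, 1.0))
```

Fixing sensitivity first and then reporting specificity puts predictors with differently scaled scores (such as CPT-1, ESM-1v, and EVE) on a common operating point.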