TY - JOUR
T1 - Predicting amino acid substitution probabilities using single nucleotide polymorphisms
AU - Rizzato, Francesca
AU - Rodriguez, Alex
AU - Biarnés, Xevi
AU - Laio, Alessandro
N1 - Funding Information: We would like to acknowledge fruitful discussions with Edoardo Sarti, Stefano Zamuner, and Michele Allegra. This work was supported by Associazione Italiana per la Ricerca sul Cancro 5 per mille (grant 12214 to A.R. and A.L.).
N1 - Publisher Copyright: © 2017 by the Genetics Society of America
PY - 2017/10
Y1 - 2017/10
AB - Fast genome sequencing offers invaluable opportunities for building updated and improved models of protein sequence evolution. We here show that Single Nucleotide Polymorphisms (SNPs) can be used to build a model capable of predicting the probability of substitution between amino acids in variants of the same protein in different species. The model is based on a substitution matrix inferred from the frequency of codon interchanges observed in a suitably selected subset of human SNPs, and predicts the substitution probabilities observed in alignments between Homo sapiens and related species at 85–100% of sequence identity better than any other approach we are aware of. The model gradually loses its predictive power at lower sequence identity. Our results suggest that SNPs can be employed, together with multiple sequence alignment data, to model protein sequence evolution. The SNP-based substitution matrix developed in this work can be exploited to better align protein sequences of related organisms, to refine the estimate of the evolutionary distance between protein variants from related species in phylogenetic trees and, in perspective, might become a useful tool for population analysis.
KW - Protein sequence alignment
KW - Protein sequence evolution
KW - SNP
KW - Substitution matrices
KW - Substitution rate variability
UR - http://www.scopus.com/inward/record.url?scp=85030710404&partnerID=8YFLogxK
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=pure_univeritat_ramon_llull&SrcAuth=WosAPI&KeyUT=WOS:000412232600018&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1534/genetics.117.300078
DO - 10.1534/genetics.117.300078
M3 - Article
C2 - 28754661
AN - SCOPUS:85030710404
SN - 0016-6731
VL - 207
SP - 643
EP - 652
JO - Genetics
JF - Genetics
IS - 2
ER -