
Prediction Of Deleterious Synonymous Variants In Human Genomes

Posted on: 2019-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yao
GTID: 2310330545461711
Subject: Biology
Abstract/Summary:
With the rapid development of sequencing technologies, more and more single nucleotide variants are being identified, including synonymous variants. Because synonymous variants do not alter protein sequences, they have been largely unstudied, yet mounting evidence suggests that they affect different aspects of organism function and play an important role in human disease. Distinguishing pathogenic synonymous single nucleotide variants (sSNVs) from neutral ones can significantly improve our ability to select functional genetic variants from genome sequencing projects, and therefore our understanding of disease etiology. Predictive tools built with bioinformatics methods can help find these potentially deleterious sSNVs quickly.

We surveyed deleterious sSNVs in the human genome reported in the literature whose pathogenic mechanisms are well understood. These mechanisms involve evolutionary conservation of mutation sites, disruption of splicing, synonymous codon usage, sequence features, changes in RNA stability, and increases or decreases in translation efficiency, among others. On the one hand, these mechanisms can serve as features for building prediction models; on the other hand, they provide a new perspective for the diagnosis and treatment of related diseases. At present, there are few studies on predicting the harmfulness of sSNVs, and recently developed methods have shortcomings, such as models built on small samples and incomplete feature sets.

To address these problems, we describe a feature-based computational method named IDSV (Identification of Deleterious Synonymous Variants) to detect deleterious sSNVs in human genomes. First, we obtain reliable sSNVs from dbDSM, VariSNP and ClinVar, and systematically investigate a total of 74 features across seven categories: splicing, conservation, codon usage, sequence, pre-mRNA folding energy, translation efficiency, and functional region annotation. Then, to remove redundant and irrelevant features and improve prediction performance, we apply feature selection by sequential backward selection. Based on the 10 optimized features, a random forest classifier is developed to identify deleterious sSNVs. Results on benchmark datasets show that the proposed method outperforms other state-of-the-art methods in identifying pathogenic sSNVs. Our results indicate that, besides splicing and conservation features, a new translation efficiency feature is also informative for identifying deleterious sSNVs. Although the functional region annotation and sequence features are only weakly informative on their own, they may help discriminate deleterious sSNVs from benign ones when combined with other features. In addition, we show that neither the source of the benign sSNVs in the training set nor an increase in their number has much effect on prediction performance, which further indicates that the model is robust.

We then analyze and discuss the shortcomings of current tools, and present an ensemble method, SVEL (Deleterious Synonymous Variants Prediction), for predicting the harmfulness of sSNVs. SVEL takes the outputs of six recently developed individual prediction tools (SilVA, TraP, PhD-SNPg, FATHMM-MKL, FATHMM-XF and DANN) as feature values, and adds 13 splicing and conservation features. SVEL was trained on the DDIG-SN training set, which excluded all variants that had previously been used to train the six individual predictors above. SVEL outperforms both the existing approaches and IDSV. To make this method easily accessible for research and clinical use, we will provide pre-computed SVEL scores for all possible human synonymous variants.

The experimental results show that IDSV and SVEL both provide favorable or at least comparable performance compared with other methods, and can help find potentially deleterious sSNVs quickly. Building online or locally installable tools would also make the methods widely usable by researchers and aid the prevention and treatment of sSNV-related diseases.
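
The sequential backward selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis scores feature subsets with a random forest over 74 real features, whereas this sketch assumes a simple nearest-centroid classifier with leave-one-out accuracy as a stand-in scorer, and a four-column toy dataset in which only column 2 carries signal. All function and variable names here are hypothetical.

```python
from statistics import mean


def centroid_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`.

    Stand-in scorer (assumption): the thesis uses a random forest instead.
    """
    correct = 0
    for i in range(len(X)):
        # Class centroids computed without the held-out sample i
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[c] for r in rows) for c in cols]

        def dist(label):
            return sum((X[i][c] - m) ** 2 for c, m in zip(cols, centroids[label]))

        pred = min(sorted(centroids), key=dist)
        correct += pred == y[i]
    return correct / len(X)


def sequential_backward_selection(X, y, n_keep, score=centroid_accuracy):
    """Start from all features; repeatedly drop the one whose removal
    leaves the best-scoring subset, until n_keep features remain."""
    selected = list(range(len(X[0])))
    while len(selected) > n_keep:
        candidates = [[c for c in selected if c != drop] for drop in selected]
        selected = max(candidates, key=lambda cols: score(X, y, cols))
    return selected


# Toy data (hypothetical): 20 samples, label alternates 0/1.
# Column 2 separates the classes; columns 0, 1, 3 depend only on i // 2,
# so both classes share exactly the same values there (pure noise).
y = [i % 2 for i in range(20)]
X = [
    [((i // 2) * 3 % 5) / 5, ((i // 2) * 7 % 4) / 4, 10.0 * y[i], ((i // 2) % 3) / 3]
    for i in range(20)
]

selected = sequential_backward_selection(X, y, n_keep=1)
print(selected)
```

On this toy data the informative column survives while the three noise columns are eliminated. With a real feature matrix, the `score` argument could be replaced by cross-validated performance of whichever classifier is being tuned.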
Keywords/Search Tags: Synonymous variants, Pathogenicity prediction, Feature selection, Random forest, Ensemble learning