Prediction Of Deleterious Synonymous Mutation Based On Undersampling Scheme

Posted on: 2022-10-12; Degree: Master; Type: Thesis
Country: China; Candidate: X Tang
GTID: 2480306542968019; Subject: Computer Science and Technology
Abstract/Summary:
The genetic code of living organisms is degenerate. A change in the base composition of a codon may leave the encoded amino acid unchanged, and such a change is called a synonymous mutation. Earlier studies generally held that synonymous mutations do not affect an organism's phenotype or the development of disease. However, as precision medicine has expanded worldwide, many experimental studies have shown that synonymous mutations also play a crucial role in disease risk. With the rapid growth of biological big data, a growing number of researchers are investigating the function of synonymous mutations. Computational prediction of the pathogenicity (or functional impact) of gene mutations is efficient and provides a basis for further experimental verification and discovery, and a variety of computational models have been developed to meet this need. Methods applicable to disease-causing synonymous mutations fall into two categories: general and specific. General methods cover all types of single-nucleotide mutations, including synonymous, non-synonymous, and other types, but their accuracy is not always satisfactory for a particular mutation type such as synonymous mutations. Specific methods target only synonymous mutations, but they also have significant problems: the data available for modeling are limited, and positive and negative samples are highly imbalanced. Moreover, no existing work has examined how deep learning models perform in predicting deleterious synonymous mutations. To address these problems, we improve and analyze deleterious synonymous mutation prediction in two studies:

(1) A deleterious synonymous mutation prediction method based on an undersampling scheme and random forest. First, datasets are collected from various data sources, and six undersampling strategies (NearMiss-1, NearMiss-2, NearMiss-3, Cluster Centroid, Close by and Random undersampling) are applied to handle the class imbalance. The performance metrics indicate that Cluster Centroid is the most effective strategy. Second, after comparing two normalization techniques and four traditional machine learning algorithms, we introduce a new predictor, usDSM (Undersampling Scheme based method for Deleterious Synonymous Mutation prediction). The usDSM model uses 14-dimensional biological features (functional scores and evolutionary conservation) with a random forest to predict the deleterious effects of human synonymous mutations, and it achieves substantially better overall performance than existing methods. Finally, we built a user-friendly online usDSM prediction server (http://usdsm.xialab.info/) to make the method easy for researchers to use and to promote further development of deleterious synonymous mutation prediction.

(2) Development and analysis of a pathogenic synonymous mutation prediction method based on undersampled data and deep learning. Building on the first study, we use the undersampled data to evaluate deep learning models combined with usDSM. In the feature engineering stage, DNA sequence features are added for the deep learning models. Five deep learning models (e.g., a convolutional neural network and a long short-term memory network) and two models that combine deep learning with usDSM are considered. Results on the training and test datasets show that, although deep learning excels in other fields, it brings no significant benefit in this study; instead, it increases the number of model parameters and the structural complexity. This may be because the training data and models used in this study are limited.

In summary, usDSM is more robust and more accurate than existing tools for pathogenic synonymous mutation prediction, and our analysis of deep learning methods trained on undersampled data shows that a more complex model structure does not improve performance on this task. These results will contribute to the future development of deleterious synonymous mutation prediction in the human genome. They not only improve understanding of disease onset and progression but also provide a reference for the design of precision medicine programs.
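The abstract names Cluster Centroid and Random undersampling among the strategies compared but does not say how they were implemented (libraries such as imbalanced-learn provide `ClusterCentroids` and `RandomUnderSampler`). The following is a minimal, dependency-free sketch. It assumes binary labels, with the neutral class as the majority (label 0), and a 1:1 target ratio. Under Cluster Centroid, the majority class is replaced by k-means centroids, where k is the size of the minority class. Random undersampling instead keeps a random majority subset of that size. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 100, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialise from k distinct samples
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        updated = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


def cluster_centroid_undersample(X: List[Vector], y: List[int], majority_label: int = 0,
                                 seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Replace the majority class by k-means centroids, k = size of the minority class.
    Assumption: label 0 (neutral) is the majority class; target ratio 1:1."""
    majority = [x for x, label in zip(X, y) if label == majority_label]
    minority = [(x, label) for x, label in zip(X, y) if label != majority_label]
    centroids = kmeans(majority, k=len(minority), seed=seed)
    X_res = centroids + [x for x, _ in minority]
    y_res = [majority_label] * len(centroids) + [label for _, label in minority]
    return X_res, y_res


def random_undersample(X: List[Vector], y: List[int], majority_label: int = 0,
                       seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Keep a random majority subset the same size as the minority class (same assumptions)."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    keep = sorted(rng.sample(majority_idx, len(minority_idx)) + minority_idx)
    return [X[i] for i in keep], [y[i] for i in keep]
```

Cluster Centroid summarizes the whole majority class rather than discarding most of it at random, which is one plausible reason it can outperform random undersampling. The returned majority rows are synthetic centroids, not real variants.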
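NearMiss-1, also listed in the abstract, keeps real majority samples rather than synthesizing centroids. The following minimal sketch uses the same hypothetical assumptions (majority label 0, 1:1 ratio) and an assumed default of `n_neighbors=3`. It keeps the majority samples whose mean distance to their nearest minority samples is smallest. NearMiss-2 differs by averaging over the farthest minority samples instead. NearMiss-3 differs by first restricting candidates to the nearest majority neighbours of each minority sample.

```python
import math
from typing import List, Tuple


def near_miss_1(X: List[List[float]], y: List[int], majority_label: int = 0,
                n_neighbors: int = 3) -> Tuple[List[List[float]], List[int]]:
    """NearMiss-1 (hypothetical helper): keep as many majority samples as there are
    minority samples, choosing those whose mean distance to their n_neighbors
    nearest minority samples is smallest."""
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    k = min(n_neighbors, len(minority_idx))

    def mean_distance_to_nearest_minority(i: int) -> float:
        distances = sorted(math.dist(X[i], X[j]) for j in minority_idx)
        return sum(distances[:k]) / k

    ranked = sorted(majority_idx, key=mean_distance_to_nearest_minority)
    keep = sorted(ranked[:len(minority_idx)] + minority_idx)  # preserve original order
    return [X[i] for i in keep], [y[i] for i in keep]
```

Because it keeps the majority samples closest to the minority class, NearMiss-1 concentrates training on the decision boundary. This can help or hurt depending on how noisy that boundary is.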
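The abstract says two normalization techniques were compared but does not name them. Min-max scaling and z-score standardization are assumed here as two common choices. The sketch's main point is that scaling parameters are learned on the training set and then reused unchanged on the test set, so no test information leaks into training. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Matrix = List[List[float]]


def fit_min_max(X: Matrix) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, learned from the training set only."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(X: Matrix, params: Tuple[List[float], List[float]]) -> Matrix:
    """Rescale relative to the training range: training values land in [0, 1],
    unseen values may fall outside it; constant features map to 0."""
    lows, highs = params
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
            for row in X]


def fit_z_score(X: Matrix) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population standard deviation from the training set."""
    columns = list(zip(*X))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def apply_z_score(X: Matrix, params: Tuple[List[float], List[float]]) -> Matrix:
    """Centre each feature and divide by its standard deviation; constant features map to 0."""
    means, stds = params
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)] for row in X]
```

With this split, a test value outside the training range is not clipped. For example, 7.0 for a feature trained on 1.0 to 5.0 maps to 1.5 under min-max scaling.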
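The abstract does not list which performance indicators were used to compare the undersampling strategies. On imbalanced data, accuracy alone can be misleading. This sketch therefore uses sensitivity, specificity and the Matthews correlation coefficient (MCC) as illustrative metrics, which is an assumption. Label 1 is taken as the deleterious (positive) class, and the function name is hypothetical.

```python
import math
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC; label 1 = deleterious is an assumption."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,  # 0 by convention when undefined
    }
```

For example, consider a predictor that labels every variant neutral on a dataset with one deleterious and nine neutral variants. It reaches 0.9 accuracy but has sensitivity 0 and MCC 0.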
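The second study adds DNA sequence features for the deep learning models, but the abstract specifies neither the encoding nor the window size. A common input for convolutional and LSTM networks is a one-hot matrix of a sequence window centred on the variant. The sketch below assumes that representation, with a hypothetical flank length and with the reference and alternative windows encoded separately. Positions are 0-based, and the function names are hypothetical.

```python
from typing import List, Tuple

BASES = "ACGT"


def one_hot(sequence: str) -> List[List[int]]:
    """Encode DNA as an L x 4 matrix (columns A, C, G, T); unknown bases such as N become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]


def ref_alt_windows(sequence: str, pos: int, alt: str, flank: int) -> Tuple[str, str]:
    """Return the reference window centred on 0-based `pos` and the same window carrying `alt`.
    The flank length is an assumed hyperparameter, not a value from the thesis."""
    start, end = pos - flank, pos + flank + 1
    if start < 0 or end > len(sequence):
        raise ValueError("window extends beyond the sequence")
    ref = sequence[start:end]
    alt_window = ref[:flank] + alt + ref[flank + 1:]
    return ref, alt_window
```

Applying `one_hot` to both windows yields two (2 × flank + 1) × 4 matrices that a CNN or LSTM could consume side by side.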
Keywords/Search Tags: Deleterious synonymous mutation, Undersampling scheme, Machine learning, Random forest, Deep learning