
Method Development For Predicting Protein Subcellular Localization Based On Deep Learning

Posted on: 2020-12-26 | Degree: Master | Type: Thesis
Country: China | Candidate: J Ma
GTID: 2480306500486674 | Subject: Control Engineering
Abstract/Summary:
With the rapid development of bioinformatics, proteomics has gradually entered the post-genome era. Protein subcellular localization prediction is a major research topic in proteomics and is of great significance for understanding the pathogenesis of some human diseases and for drug design. Because the accuracy of protein subcellular localization prediction has proven difficult to improve effectively, this thesis studies prediction methods that combine deep learning algorithms, and further explores the generation of protein sequences for specific localization sites. The main research contents are as follows.

First, traditional shallow machine learning algorithms cannot deeply mine the intrinsic feature information in protein sequence data. To address this, a protein subcellular localization prediction method based on a deep ensemble support vector machine (SVM) model is proposed. The method first uses a deep belief network (DBN) to extract deep features from protein sequence data, then introduces a Bagging-based ensemble learning strategy to build an ensemble of SVMs on these deep features, exploiting the differences among the individual models to improve generalization. Predictions on the DeepLoc protein dataset show that this method predicts better than a traditional SVM.

Second, because protein sequences are highly nonlinear and vary in length, a novel prediction method based on distributed encoding and a convolutional recurrent self-attention mechanism is proposed. Word vectors are first trained by unsupervised learning on sequence data from a protein database, and nonlinear features are then extracted by a convolutional neural network (CNN) and a long short-term memory (LSTM) network. To take the information in long sequences fully into account, a self-attention mechanism
is added to learn global features. In addition, to address resource consumption and training efficiency, a temporal convolutional network (TCN) is designed to replace the LSTM, which resolves the under-utilization of computing resources. Predictions on the DeepLoc protein dataset show that both strategies improve overall accuracy.

Third, because current protein databases contain a large amount of unlabeled data, a prediction method based on conditional adversarial network domain adaptation is studied. The source-domain and target-domain data are first defined; the network model then learns a feature representation common to the different domains; finally, the unlabeled target-domain sequences are predicted using the model trained on the source-domain data. Further, to avoid learning from scratch for sequences with different data distributions, a prediction method based on a language model is proposed. Prediction results on SWISS-PROT data demonstrate the effectiveness and advantages of both strategies.

Finally, to address the problem that protein sequences cannot be generated effectively for specific subcellular localization sites, a protein sequence generation algorithm based on a feedback generative adversarial network (GAN) is proposed. First, reinforcement learning (policy gradient) is used to overcome the difficulty of backpropagating gradients through the discrete sampling of amino acids during sequence generation. Then, to ensure the quality of the generated sequences, a real-time feedback generation method is proposed. Finally, sequence edit distance is used to demonstrate the validity of the generated sequences for each subcellular location.
Keywords/Search Tags: Protein subcellular localization prediction, Support vector machine, Convolutional neural network, Long short-term memory, Domain adaptation, Generative adversarial networks, Reinforcement learning
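The first method in the abstract combines deep features with a Bagging ensemble of SVMs. The thesis page gives no code, so the following is only a minimal standard-library sketch of the Bagging mechanics (bootstrap resampling plus majority vote). To keep it self-contained, the SVM base learner is swapped for a simple nearest-centroid classifier, and the input vectors merely stand in for DBN features; all function names are hypothetical.

```python
import random
from collections import Counter

def fit_nearest_centroid(X, y):
    """Stand-in base learner (NOT an SVM): store the mean feature vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_nearest_centroid(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], x)))

def bagging_fit(X, y, n_estimators=5, seed=0):
    """Train each base model on a bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_nearest_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Combine the ensemble by majority vote over the base models' predictions."""
    votes = Counter(predict_nearest_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Swapping the two nearest-centroid functions for an SVM trainer and predictor would give the structure the abstract describes; the diversity among ensemble members comes from each one seeing a different bootstrap sample.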
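The second method trains word vectors on unlabeled protein sequences. The abstract does not say how a sequence is split into "words"; a common choice in protein embedding work, assumed here, is overlapping amino-acid k-mers. The sketch below produces those tokens and the (center, context) pairs that a skip-gram style embedding model would train on; `k=3` and `window=1` are illustrative values, not the thesis's settings.

```python
def kmer_tokens(sequence, k=3):
    """Split an amino-acid string into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def skipgram_pairs(tokens, window=1):
    """(center, context) pairs within +/- window positions, as used to train skip-gram embeddings."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

For example, `kmer_tokens("MKTAY")` yields `["MKT", "KTA", "TAY"]`; a sequence shorter than `k` yields no tokens.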
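Self-attention, added in the second method to learn global features, lets every sequence position weigh every other position, so information from distant residues can reach each output. The thesis's exact attention variant is not given; this is a minimal sketch of scaled dot-product self-attention in pure Python, simplified so that queries, keys and values are the input feature vectors themselves (no learned projection matrices).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Scaled dot-product self-attention with Q = K = V = features (no learned projections).

    features: list of T vectors, each of length d (e.g. CNN/LSTM outputs per position).
    Returns T output vectors, each a softmax-weighted average of all input vectors.
    """
    d = len(features[0])
    outputs = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, features)) for i in range(d)])
    return outputs
```

Because each output is a weighted average over the whole sequence, its cost grows quadratically with sequence length, which is one reason long protein sequences are often truncated or pooled in practice.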
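The temporal convolutional network that replaces the LSTM is typically built from dilated causal 1-D convolutions, which process all time steps in parallel instead of one step at a time. The sketch below shows a single-channel dilated causal convolution and the receptive field of a stack of them; the kernel values and dilations are illustrative, and a full TCN would add multiple channels, nonlinearities and residual connections.

```python
def dilated_causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_i kernel[i] * x[t - i*dilation], with zero left-padding.

    Output at step t depends only on inputs at steps <= t (causal),
    and the output has the same length as the input.
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation
            if j >= 0:
                total += w * x[j]
        y.append(total)
    return y

def receptive_field(kernel_size, dilations):
    """Number of input steps seen by one output of a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Doubling the dilation per layer grows the receptive field exponentially with depth: `receptive_field(2, [1, 2, 4, 8])` returns 16.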
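The generation method uses policy gradient because sampling discrete amino acids blocks ordinary backpropagation: instead, a reward (for a GAN, typically the discriminator's score) scales the gradient of the log-probability of the sampled tokens. This is a minimal REINFORCE sketch under a strong simplifying assumption: the policy is a single categorical distribution over tokens shared by every position, rather than the sequence-conditioned generator a real model would use; the function names and learning rate are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_update(logits, actions, reward, baseline=0.0, lr=0.1):
    """One REINFORCE step for a position-independent categorical policy.

    Ascends (reward - baseline) * sum_t log p(actions[t]); the gradient of
    log softmax with respect to logit k is 1[k == a] - p_k.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    grad = [0.0] * len(logits)
    for a in actions:
        for k, p in enumerate(probs):
            grad[k] += advantage * ((1.0 if k == a else 0.0) - p)
    return [z + lr * g for z, g in zip(logits, grad)]
```

With logits `[0.0, 0.0, 0.0]`, sampled actions `[1, 1]` and reward `1.0`, the update raises the logit of token 1 and lowers the others; a reward below the baseline pushes in the opposite direction.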
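Generated sequences are validated with sequence edit distance. The abstract does not specify the variant or costs; this sketch assumes the standard Levenshtein distance (unit cost for insertion, deletion and substitution), plus a hypothetical helper that normalizes the distance from a generated sequence to its closest real sequence.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insertion/deletion/substitution costs (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]

def min_normalized_distance(generated, reference_set):
    """Hypothetical helper: smallest edit distance from a generated sequence to any real
    sequence, divided by the longer length (0.0 = identical, 1.0 = maximally different)."""
    return min(edit_distance(generated, ref) / max(len(generated), len(ref), 1)
               for ref in reference_set)
```

For example, `edit_distance("kitten", "sitting")` returns 3. A very small normalized distance suggests a generated sequence nearly copies a training sequence, while a very large one suggests it no longer resembles proteins from that location.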