
Protein Signal Sites Prediction Based On Deep Learning

Posted on: 2023-02-11  Degree: Master  Type: Thesis
Country: China  Candidate: J H Ye  Full Text: PDF
GTID: 2530307142474344  Subject: Bioinformatics
Abstract/Summary:
Post-translational modification (PTM) is the main way to regulate protein structure and function, and it occurs in almost all proteins. More than 400 types of PTM are currently known; they play a significant role in regulating signaling pathways and networks, gene expression, enzyme activity, and protein interactions in cells. Accurate identification of PTM sites is therefore of great significance for the study of protein function and structure. Because traditional experimental verification is costly in time and effort, it has gradually been replaced by methods based on machine learning. However, owing to the complexity of PTM patterns and the imbalance between positive and negative samples, the accuracy of current PTM site recognition algorithms based on traditional machine learning still needs to be improved.

This study explores the performance of deep learning in PTM site prediction. A deep learning prediction model for PTM sites, MME (Mixture Model Ensemble), was proposed and applied to predict protein O-glycosylation and ubiquitination sites. First, each PTM site was represented by an amino acid fragment centered on the target site, and two kinds of sequence features were extracted: the Composition of k-spaced Amino Acid Pairs (CKSAAP) and one-hot encoding. Then, three deep neural network models with different structures were designed for these two feature types. A three-layer fully connected neural network was built on the CKSAAP features, while a convolutional neural network and a bidirectional recurrent neural network with Long Short-Term Memory (LSTM) units were built on the one-hot encoding. Finally, the output probabilities of the three network models were integrated by summation, and the normalized integrated probability was used to predict the test samples.

For the prediction of O-glycosylation sites, three new data sets were constructed from the Steentoft data set with positive-to-negative sample ratios of 1:1, 1:5, and 1:10, and 20% of the samples were randomly selected as the test set. After MME training, the AUC, ACC, and MCC of 5-fold cross-validation on the training set were 0.9917, 0.9601, and 0.9209, respectively, significantly better than those of currently reported models. In the independent test on the balanced data set, the AUC, ACC, and MCC were 0.9890, 0.9651, and 0.9308, respectively, and the AUC was slightly higher than that of the reference model implemented in this study. On the 1:5 and 1:10 unbalanced data sets, the independent test AUC of MME decreased only slightly, to 0.9212 and 0.9266, respectively. In contrast, the independent prediction accuracy of the reference model decreased significantly, with AUC below 0.83 on the 1:5 data set and below 0.79 on the 1:10 data set. These results show that the MME model can accurately predict protein O-glycosylation sites, especially on unbalanced data sets.

For the prediction of ubiquitination sites, an improved one-hot encoding method was proposed, based on the balanced data set constructed by Chen et al., to address the problem of missing sequences. In 5-fold cross-validation on the training set, MME achieved an AUC of 0.8189, an ACC of 0.7462, and an MCC of 0.4930. In the independent test, the AUC was 0.7744, the ACC was 0.7104, and the MCC was 0.4222. Compared with the reference methods, the prediction accuracy of MME is clearly improved. Verified on data for two different types of PTM site, the deep learning model MME, which integrates multiple network structures, can accurately predict PTM sites and mitigates, to a certain extent, the problem of sample imbalance in PTM site prediction.
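The feature extraction and probability integration steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the maximum gap `k_max`, the all-zero encoding of non-standard characters, and normalization by dividing the summed probabilities by the number of models are all assumptions made for illustration. In particular, the all-zero rows are a plain one-hot convention, not the improved one-hot encoding proposed in the thesis, whose details the abstract does not give.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def cksaap(fragment, k_max=2):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Returns a flat list of 400 * (k_max + 1) frequencies. k_max is an
    assumed value; the abstract does not state the gaps that were used.
    """
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        n_windows = len(fragment) - k - 1  # residue pairs separated by k positions
        for i in range(max(n_windows, 0)):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in PAIR_INDEX:  # pairs containing a non-standard character are skipped
                counts[PAIR_INDEX[pair]] += 1
        features.extend(c / n_windows if n_windows > 0 else 0.0 for c in counts)
    return features


def one_hot(fragment):
    """One 20-dimensional row per position; unknown characters give an all-zero row."""
    rows = []
    for residue in fragment:
        row = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def ensemble_probability(model_probs):
    """Sum the positive-class probabilities of several models, then normalize.

    model_probs holds one list of per-sample probabilities per model.
    Dividing by the number of models is an assumed normalization.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    return [sum(p[j] for p in model_probs) / n_models for j in range(n_samples)]


if __name__ == "__main__":
    fragment = "MKTSAPSTLLA"  # hypothetical fragment centered on a candidate site
    print(len(cksaap(fragment)))   # length of the CKSAAP feature vector
    print(len(one_hot(fragment)))  # one row per residue
    # Hypothetical outputs of the three networks for two test samples
    print(ensemble_probability([[0.9, 0.2], [0.6, 0.4], [0.3, 0.0]]))
```

In this sketch the CKSAAP vector would feed the fully connected network and the one-hot matrix would feed the convolutional and recurrent networks; a sample would be called positive when the integrated probability exceeds a chosen threshold, for example 0.5.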
Keywords/Search Tags: Post-translational Modification, O-glycosylation, Ubiquitin, Fully connected neural network, Convolutional neural network, Bidirectional recurrent neural network