Research On The Prediction Of Protein Posttranslational Modification Sites Based On Deep Learning

Posted on: 2022-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: L L Song
GTID: 2480306770991029
Subject: Automation Technology
Abstract/Summary:
Post-translational modification (PTM) of proteins refers to chemical modification that occurs during protein formation after mRNA translation. PTM is a key mechanism for increasing biological diversity and affects nearly every aspect of normal cell biology and pathogenesis, including cell differentiation, protein degradation, signaling, and regulation. PTM has become an important topic in proteomics research, and the comprehensive and accurate identification of PTM sites remains both a focus and a difficulty of current work. To predict protein post-translational modification sites more effectively and improve prediction accuracy, this thesis uses deep learning. The main contributions are as follows:

1. A new prediction model for malonylation sites, Malsite-Deep, is proposed. First, seven feature extraction methods are used to extract feature information from protein sequences. Then the NearMiss-2 undersampling method is applied to handle the imbalanced data, and the update gate and reset gate of gated recurrent units (GRU) are used to select the optimal feature subset. Finally, the output of the GRU layer is fed into a deep neural network (DNN) to predict malonylation sites, and model performance is evaluated by 10-fold cross-validation and on independent test sets. Under 10-fold cross-validation, the AUC on the training dataset reaches 0.99, and the AUC values on the four independent test datasets are all above 0.95. These results suggest that Malsite-Deep facilitates the identification of protein malonylation sites.

2. A new prediction model for carbonylation sites, PreCar-Deep, is proposed. First, six feature extraction methods are used to obtain the original feature space from the protein sequences. Then the Group LASSO method is used to remove redundant information, and the Borderline-SMOTE oversampling method is used to balance the data, yielding a new feature space. Finally, the processed data are fed into the deep learning framework constructed in this thesis to predict carbonylation sites, and model performance is evaluated by 10-fold cross-validation and on independent test datasets. The AUC values on the four datasets all exceed 0.90. The experimental results show that PreCar-Deep outperforms existing models and helps identify protein carbonylation sites.

3. A new multi-type acylation site prediction model, PMPA-DeepTL, is proposed. First, nine feature extraction methods (AAC, ANBPB, DDE, EBGW, CT, MMI, Hydropathy index, AD and BLOSUM62) are used to convert protein sequences into numerical features, which are then fused. Second, because the positive and negative samples in the data are severely imbalanced, the SMOTETomek method, which combines oversampling and undersampling, is used to balance the data. Finally, a convolutional neural network (CNN) is used to classify succinylation sites. Starting from the model pre-trained on the succinylation site dataset, the parameters of the fully connected layer of the CNN are fine-tuned, and the model is transferred to other acylation site datasets for classification. In the comparison with other prediction models, the AUC values for succinylation sites on the independent test set are all above 0.9. Fine-tuning the pre-trained network also gives good accuracy on the other acylation datasets, indicating that PMPA-DeepTL helps identify a variety of protein acylation sites.
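The abstract names its feature encodings and resampling methods without giving implementation details. As a rough illustration only, and not the thesis's own code, the sketch below shows two of the named steps in plain Python: amino acid composition (AAC) encoding of a sequence window, and the NearMiss-2 selection rule, which keeps the majority-class samples whose mean distance to their k farthest minority-class samples is smallest. The function names `aac` and `near_miss_2`, the default `k=3`, and the example peptide windows are all assumptions made for this example.

```python
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(window):
    """Amino acid composition: fraction of each standard residue in the window.

    Non-standard characters (e.g. padding symbols) are ignored.
    """
    residues = [r for r in window.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def near_miss_2(majority, minority, n_keep, k=3):
    """Undersample `majority` with the NearMiss-2 rule.

    Each majority vector is scored by its mean Euclidean distance to its
    k farthest minority vectors; the n_keep lowest-scoring vectors are kept.
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    k = min(k, len(minority))

    def score(x):
        farthest = sorted((dist(x, m) for m in minority), reverse=True)[:k]
        return sum(farthest) / k

    return sorted(majority, key=score)[:n_keep]


if __name__ == "__main__":
    # Hypothetical peptide windows; real data would come from curated PTM sets.
    positives = [aac("MKKLAEKR"), aac("GKKSAEKL")]
    negatives = [aac("PPGGPPGG"), aac("MKRLAEKK"), aac("WWFFYYWW")]
    balanced_negatives = near_miss_2(negatives, positives, n_keep=len(positives))
    print(len(balanced_negatives), "negative windows kept")
```

In practice a library such as imbalanced-learn would normally be used for the resampling step; the hand-written version here only makes the selection rule explicit.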
Keywords/Search Tags: protein post-translational modification, deep learning, transfer learning, multi-information fusion, imbalanced algorithm