
Research on Prediction of Protein Post-translational Modification Sites Based on Multi-information Fusion

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: X W Cui
GTID: 2370330611488139
Subject: Mathematics
Abstract/Summary:
Post-translational modification (PTM) is a chemical modification of a protein that occurs after translation and plays a vital regulatory role in life processes. In-depth research on and identification of protein post-translational modification sites is of great significance for revealing the mechanisms of life activity, screening clinical markers of disease, and identifying drug targets. This thesis studies protein post-translational modification sites using machine learning and multi-information fusion. The research contents are as follows:

1. A new method for predicting protein S-sulfenylation sites, SulSite-GTB, is proposed. First, features are extracted from several kinds of protein feature information: amino acid composition (AAC), dipeptide composition (DC), encoding based on grouped weight (EBGW), K-nearest neighbors scores (KNN), position-specific amino acid propensity (PSAAP), position weighted amino acid composition (PWAAC), and pseudo position-specific scoring matrix (PsePSSM). The seven feature encodings are fused to obtain the feature search space. Second, the synthetic minority oversampling technique (SMOTE) is used to handle the class-imbalanced data, and the least absolute shrinkage and selection operator (LASSO) is used to remove redundant information and obtain the optimal feature subset. Finally, the optimal feature subset is input to a gradient boosting decision tree classifier to predict S-sulfenylation sites, and the model's prediction performance is evaluated with 5-fold cross-validation and an independent test dataset. The overall prediction accuracies are 92.86% and 88.53%, and the AUC values are 0.9706 and 0.9425, respectively. Comparison with other prediction methods shows that SulSite-GTB performs significantly better.

2. A new method for predicting protein malonylation sites, DeepMal, is proposed. First, enhanced amino acid composition (EAAC), enhanced grouped amino acid composition (EGAAC), dipeptide deviation from expected mean (DDE), K-nearest neighbors scores (KNN), and the BLOSUM62 matrix are used for feature extraction. Second, a linear convolutional neural network extracts features specific to malonylation sites, and max pooling then selects the relevant features and reduces the feature dimension. Finally, a multi-layer neural network classifies malonylation and non-malonylation sites. On the independent datasets for E. coli, H. sapiens, and M. musculus, the AUC values are 0.974, 0.956, and 0.944, and the accuracies are 96.5%, 95.5%, and 94.5%, respectively. Compared with other prediction models, the prediction accuracy is increased by 9.5%-18.5%, further indicating the effectiveness of DeepMal. The use of deep learning can improve the robustness of the DeepMal model for predicting malonylation sites and can facilitate the prediction of post-translational modification sites in other proteins.
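
To make the idea of multi-information fusion concrete, the following minimal Python sketch computes two of the encodings named above, amino acid composition (AAC) and dipeptide composition (DC), for a single peptide window and concatenates them into one feature vector. It is an illustration only, not the thesis implementation: the function names (`aac`, `dipeptide_composition`, `fuse_features`), the residue ordering, and the handling of non-standard characters are assumptions, since the abstract does not specify the window length or encoding conventions.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; the ordering is an assumption for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(peptide):
    """Amino acid composition: frequency of each standard residue (20 values)."""
    residues = [r for r in peptide.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Dipeptide composition: frequency of each adjacent residue pair (400 values)."""
    seq = peptide.upper()
    # Only pairs of two standard residues are counted; pairs that touch a
    # padding or unknown character (e.g. 'X' or '-') are skipped.
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    ]
    if not pairs:
        return [0.0] * (len(AMINO_ACIDS) ** 2)
    counts = Counter(pairs)
    return [counts[a + b] / len(pairs) for a, b in product(AMINO_ACIDS, repeat=2)]


def fuse_features(peptide):
    """Fuse the encodings by simple concatenation (20 + 400 = 420 values)."""
    return aac(peptide) + dipeptide_composition(peptide)


features = fuse_features("ACCA")
print(len(features))  # 420
```

In a full pipeline, the remaining encodings would be concatenated in the same way, and the fused vectors would then go through the class balancing, feature selection, and classification steps described in the abstract.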
Keywords/Search Tags: machine learning, protein post-translational modification sites, multi-information fusion, gradient boosting, deep learning