| Protein post-translational modifications (PTMs) are important mechanisms for regulating protein function. They diversify protein types, structures and functions, and play an irreplaceable role in biological processes and signaling pathways. Lysine succinylation has recently been identified as a new type of PTM that alters protein properties and significantly affects protein structure and function. Accurate identification of succinylation sites is therefore important for studying the cellular functions of proteins and the pathology of related diseases. Traditional experimental methods are costly, inefficient and procedurally complex, which makes it difficult to obtain a large amount of site modification information in a short time, and existing computational methods do not predict succinylation sites well, so computational methods with better prediction performance are needed. To address these issues, this paper uses machine learning algorithms to study succinylation site prediction. The main work is as follows: 1. A prediction model, pSuc-FFSEA, is established based on feature fusion and Stacking ensemble learning. Sequence features and physicochemical properties are extracted with EBGW, One-Hot, CBOW, CGR and AAF_DWT, LASSO is applied to select the optimal feature subset, and learning algorithms such as broad learning, SVM, LightGBM and logistic regression are used to construct a Stacking ensemble classifier. On a data set of succinylation sites collected from published literature, the model achieves a prediction accuracy of 77.73% and an AUC of 0.8501. Comparison with other advanced models shows that pSuc-FFSEA has stronger generalization performance. 2. A prediction model, pSuc-EDBAM, is established based on dense convolutional blocks with an attention mechanism module. One-Hot encoding is used to obtain feature maps of protein sequences, and a one-dimensional CNN generates low-level feature maps from them. During feature learning, dense convolutional blocks capture feature information at different levels, and a channel attention mechanism module is introduced to evaluate the importance of different features. Finally, a Softmax classifier predicts the succinylation sites. On the independent test set, the prediction accuracy reaches 74.25% with an AUC of 0.8201, and pSuc-EDBAM shows better prediction performance than other advanced models. A comparison of pSuc-FFSEA and pSuc-EDBAM on the same test set shows that pSuc-EDBAM has the advantage. 3. For the convenience of researchers, an online prediction platform for succinylation sites is developed on top of pSuc-EDBAM using Flask, a web application framework for Python. The platform is available at https://bioinfo.wugenqiang.top/pSuc-EDBAM/. It supports both single protein sequence prediction and batch prediction from a file, and the website includes a user guide, so researchers can conveniently predict potential succinylation sites. |
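
Both models described above take One-Hot encoded sequence information as input. The following is a minimal sketch of how a fixed-length window around each lysine (K) could be cut out and One-Hot encoded. The flank length of 15 residues, the padding symbol `X`, and the function names `lysine_windows` and `one_hot` are illustrative assumptions, not details given in the abstract.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def lysine_windows(sequence: str, flank: int = 15) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in the sequence."""
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    windows = []
    for pos, residue in enumerate(seq):
        if residue == "K":
            # In the padded string the residue sits at index pos + flank,
            # so the slice [pos, pos + 2*flank + 1) is centred on it.
            windows.append((pos + 1, padded[pos:pos + 2 * flank + 1]))
    return windows


def one_hot(window: str) -> List[List[int]]:
    """Encode a window as a len(window) x 21 binary matrix."""
    matrix = []
    for residue in window:
        row = [0] * len(ALPHABET)
        # Non-standard residues are mapped to the padding column.
        row[INDEX.get(residue, INDEX[PAD])] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYIAKQR", flank=3):
        encoded = one_hot(window)
        print(position, window, len(encoded), len(encoded[0]))
```

Each candidate site becomes a matrix with one row per window position and one column per symbol, which is the kind of input a one-dimensional CNN or a feature-fusion pipeline can consume. The window length would need to match whatever the trained model expects.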