| Proteins are biopolymers built from amino acids, and they play an important role in human development and in the repair and renewal of damaged cells. However, a precursor protein is not active and must undergo different types of post-translational modifications (PTMs) to become a functional mature protein. PTMs are a key mechanism for increasing the functional diversity of proteins. Traditional experimental methods for identifying PTM sites are usually time-consuming and laborious, so computational prediction methods are urgently needed. Among the many PTMs, β-hydroxybutyrylation is one of the most important modifications discovered recently, and no prediction methods for it have been reported so far. In this paper, machine learning and deep learning methods are applied to the prediction of β-hydroxybutyrylation sites for the first time, and high-accuracy prediction models are constructed. The main contributions are as follows: 1. A LightGBM classifier with XGBoost-based feature selection, named KbhbPred, is proposed to predict protein β-hydroxybutyrylation sites. First, three algorithms, EAAC, ZScale and BLOSUM62, are used to extract and fuse the sequence information, physicochemical properties and evolutionary information of proteins. Then, the XGBoost algorithm is used for feature selection, removing redundant information from the fused features to obtain the optimal feature subset. Finally, β-hydroxybutyrylation sites are predicted with a Light Gradient Boosting Machine (LightGBM) classifier. Under ten-fold cross-validation, the ACC, MCC and AUC values of KbhbPred on the training set are 0.8265, 0.6531 and 0.8995, respectively; on the independent test set they are 0.7613, 0.5290 and 0.8321, respectively. The experimental results show that the proposed method outperforms other prediction methods and is well suited to the prediction of
β-hydroxybutyrylation sites. 2. A deep forest classifier based on multi-feature fusion, named KbhbPred2.0, is proposed to predict protein β-hydroxybutyrylation sites. Six algorithms, BE, DPC, PseAAC, AAindex, GTPC and EGAAC, are used to extract features, and a convolutional neural network is used to extract complex sequence features. After these features are fused, the XGBoost algorithm is used to select the optimal feature subset, which is then used to train a deep forest classifier that classifies β-hydroxybutyrylation sites. Under ten-fold cross-validation, the ACC, MCC and AUC values of KbhbPred2.0 reach 0.8497, 0.6728 and 0.9193, respectively; on the independent test set they are 0.7940, 0.5883 and 0.8558, respectively. Compared with existing prediction methods, this method further improves the prediction of β-hydroxybutyrylation sites. |
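
Both models above are reported in terms of ACC, MCC and AUC. As a reading aid, the following is a minimal sketch of how these three metrics are conventionally computed for a binary site / non-site classifier: accuracy and the Matthews correlation coefficient from confusion-matrix counts, and the area under the ROC curve from its rank-based (pairwise) formulation. The function names, the 0.5 decision threshold and the toy labels and scores are illustrative assumptions, not taken from the paper, and the paper's own evaluation code may differ.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = modified site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC: fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """MCC: correlation between true and predicted labels, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """AUC: probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): three sites and three non-sites.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

counts = confusion_counts(y_true, y_pred)
print("ACC:", round(accuracy(*counts), 4))
print("MCC:", round(mcc(*counts), 4))
print("AUC:", round(auc(y_true, scores), 4))
```

ACC and MCC depend on the chosen decision threshold, whereas AUC is computed from the raw scores and is threshold-independent, which is one reason all three are commonly reported together.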