Classification Models For Predicting Protein β-hydroxybutyrylation Sites

Posted on:2024-01-27

Degree:Master

Type:Thesis

Country:China

Candidate:C B Fan

Full Text:PDF

GTID:2530307064455784

Subject:Computer technology

Abstract/Summary:

Proteins are biopolymers built from amino acids, and they play an important role in the development of the human body and in the repair and renewal of damaged cells. However, precursor proteins are not active; they must undergo various post-translational modifications (PTMs) to become functional mature proteins. PTMs are a key mechanism for increasing the functional diversity of proteins. Traditional wet-lab methods for identifying PTM sites are usually time-consuming and laborious, so there is an urgent need for computational prediction methods. Among the many PTMs, β-hydroxybutyrylation is one of the most important recently discovered modifications, and no prediction method for it had been reported. In this thesis, machine learning and deep learning methods are applied to the prediction of β-hydroxybutyrylation sites for the first time, and high-precision prediction models are constructed. The main contributions are as follows:

1. KbhbPred, a LightGBM classifier with XGBoost-based feature selection, is proposed to predict protein β-hydroxybutyrylation sites. First, three encoding algorithms, EAAC, ZScale and BLOSUM62, are used to extract sequence information, physicochemical property information and evolutionary information from the proteins, and these features are fused. Then, the XGBoost algorithm is used for feature selection to remove redundant information from the fused features and obtain the optimal feature subset. Finally, β-hydroxybutyrylation sites are predicted with a Light Gradient Boosting Machine (LightGBM) classifier. Under ten-fold cross-validation on the training set, KbhbPred achieves ACC, MCC and AUC values of 0.8265, 0.6531 and 0.8995, respectively; on the independent test set, the corresponding values are 0.7613, 0.5290 and 0.8321. The experimental results show that the proposed method outperforms the other prediction methods it was compared against and is better suited to predicting β-hydroxybutyrylation sites.

2. KbhbPred2.0, a deep forest classifier based on multi-feature fusion, is proposed to predict protein β-hydroxybutyrylation sites. Six algorithms, BE, DPC, PseAAC, AAindex, GTPC and EGAAC, are used to extract features, and a convolutional neural network is used to extract complex sequence features. After these features are fused, the XGBoost algorithm is used to screen out the optimal feature subset, which is then used to train a deep forest classifier to classify β-hydroxybutyrylation sites. Under ten-fold cross-validation, KbhbPred2.0 reaches ACC, MCC and AUC values of 0.8497, 0.6728 and 0.9193, respectively; on the independent test set, the values are 0.7940, 0.5883 and 0.8558. Compared with existing prediction methods, KbhbPred2.0 further improves the prediction of β-hydroxybutyrylation sites.
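
Item 1 names EAAC (enhanced amino acid composition) as the sequence-information encoding used by KbhbPred. A minimal Python sketch of the usual definition follows, assuming each sample is a fixed-length peptide window centred on the candidate lysine and padded with '-' near the protein termini. The sliding-window size of 5 is an assumption (a common default), not a value stated in the abstract.

```python
from collections import Counter

# Fixed order of the 20 standard amino acids so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac(peptide: str, window: int = 5) -> list[float]:
    """Enhanced amino acid composition (EAAC) of a fixed-length peptide.

    A window of `window` residues slides along the peptide one position at a
    time; for each window position the frequency of each of the 20 standard
    amino acids is recorded. Gap characters ('-') used to pad windows near the
    protein termini are removed before counting. The default window of 5 is an
    assumption, not a value taken from the thesis.
    """
    if window < 1 or len(peptide) < window:
        raise ValueError("window must be between 1 and len(peptide)")
    features: list[float] = []
    for start in range(len(peptide) - window + 1):
        segment = peptide[start:start + window].replace("-", "")
        counts = Counter(segment)
        size = len(segment)
        # A window made only of gaps contributes a block of 20 zeros.
        features.extend(counts[aa] / size if size else 0.0 for aa in AMINO_ACIDS)
    return features
```

For a 9-residue peptide and a window of 5 there are 5 window positions, so `len(eaac("ACDEFGHIK"))` is 100 (5 positions times 20 amino acids).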
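
Two of the KbhbPred2.0 encodings in item 2, BE (binary, i.e. one-hot, encoding) and DPC (dipeptide composition), can be sketched in a few lines of Python. This follows the common textbook definitions and is not the thesis's own code; encoding gaps and non-standard residues as all-zero positions in BE, and stripping gaps before counting pairs in DPC, are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered amino-acid pairs: AA, AC, ..., YY.
DIPEPTIDES = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def binary_encoding(peptide: str) -> list[int]:
    """BE: one 20-dimensional one-hot vector per position, concatenated.

    Gaps or non-standard residues are encoded as an all-zero vector
    (an assumption of this sketch).
    """
    features: list[int] = []
    for residue in peptide:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


def dipeptide_composition(peptide: str) -> list[float]:
    """DPC: normalised frequencies of the 400 adjacent residue pairs."""
    # Assumption: gaps are stripped first, so residues on either side of a
    # gap are treated as adjacent.
    sequence = peptide.replace("-", "")
    counts = [0] * len(DIPEPTIDES)
    for i in range(len(sequence) - 1):
        index = DIPEPTIDE_INDEX.get(sequence[i:i + 2])
        if index is not None:  # skip pairs containing non-standard residues
            counts[index] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

BE grows with the window length (20 values per position), while DPC always yields 400 values, which is one reason fused vectors become large enough to need feature selection.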
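
GTPC (grouped tripeptide composition) and EGAAC (enhanced grouped amino acid composition), also listed in item 2, both first map residues into physicochemical groups. The sketch below assumes the five-group split commonly used for these descriptors (for example in the iFeature toolkit): aliphatic, aromatic, positively charged, negatively charged and uncharged. It also assumes a window of 5 for EGAAC; the abstract states neither the grouping nor the window.

```python
from collections import Counter
from itertools import product

# Assumed five-group split (common convention, not confirmed by the thesis).
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
GROUP_NAMES = list(GROUPS)
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}
TRIPLETS = list(product(GROUP_NAMES, repeat=3))  # 5**3 = 125 group triplets
TRIPLET_INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def gtpc(peptide: str) -> list[float]:
    """GTPC: normalised frequencies of the 125 consecutive group triplets."""
    # Gaps and non-standard residues are dropped before forming triplets.
    groups = [RESIDUE_TO_GROUP[aa] for aa in peptide if aa in RESIDUE_TO_GROUP]
    counts = [0] * len(TRIPLETS)
    for i in range(len(groups) - 2):
        counts[TRIPLET_INDEX[tuple(groups[i:i + 3])]] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def egaac(peptide: str, window: int = 5) -> list[float]:
    """EGAAC: group composition inside each sliding window (grouped EAAC)."""
    if window < 1 or len(peptide) < window:
        raise ValueError("window must be between 1 and len(peptide)")
    features: list[float] = []
    for start in range(len(peptide) - window + 1):
        segment = [RESIDUE_TO_GROUP[aa] for aa in peptide[start:start + window]
                   if aa in RESIDUE_TO_GROUP]
        counts = Counter(segment)
        size = len(segment)
        # Five group frequencies per window position; all-gap windows give zeros.
        features.extend(counts[g] / size if size else 0.0 for g in GROUP_NAMES)
    return features
```

Grouping trades residue-level detail for compactness: GTPC needs 125 values where a plain tripeptide composition would need 8,000.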
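
Both models fuse several encodings and then use XGBoost to keep an optimal feature subset. One common way to do this, sketched below under assumptions, is to concatenate the encodings of each sample, rank the columns by the importance scores of a fitted gradient-boosting model, and keep the top k. The importance values are taken as input here; they could, for example, be read from the `feature_importances_` attribute of a fitted `xgboost.XGBClassifier`. The helper names and the choice of k are hypothetical, and the abstract does not say how the subset size was chosen.

```python
from typing import Sequence


def fuse(*feature_blocks: Sequence[float]) -> list[float]:
    """Serial fusion: concatenate the per-encoding vectors of one sample."""
    fused: list[float] = []
    for block in feature_blocks:
        fused.extend(block)
    return fused


def select_top_k(importances: Sequence[float], k: int) -> list[int]:
    """Column indices of the k most important features, in original order.

    Ties keep the lower column index first because Python's sort is stable,
    including with reverse=True.
    """
    if not 0 < k <= len(importances):
        raise ValueError("k must be between 1 and the number of features")
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def apply_selection(rows: Sequence[Sequence[float]],
                    columns: Sequence[int]) -> list[list[float]]:
    """Keep only the selected columns of every sample."""
    return [[row[c] for c in columns] for row in rows]
```

To keep cross-validation estimates honest, the importances would normally be computed on the training folds only, so the selected columns are never chosen using test-fold labels.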
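
Both KbhbPred and KbhbPred2.0 are evaluated by ten-fold cross-validation. The abstract does not say whether the folds were stratified; the sketch below assumes they were. It deals each class round-robin into ten folds, so every fold keeps roughly the same positive-to-negative ratio, and uses a fixed seed for reproducibility.

```python
import random
from typing import Iterator, Sequence


def stratified_k_fold(labels: Sequence[int], k: int = 10, seed: int = 0
                      ) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Samples of each class are shuffled and dealt round-robin into the folds,
    so every fold keeps roughly the overall class ratio.
    """
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    dealt = 0  # running counter keeps fold sizes balanced across classes
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:
            folds[dealt % k].append(i)
            dealt += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Each (train, test) pair would be used to fit the classifier (LightGBM or deep forest) on the training indices and score the held-out fold; the ten per-fold results are then combined, for example by averaging.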
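
The reported ACC, MCC and AUC values can be computed from predictions with the standard definitions, sketched below for 0/1 labels (1 = β-hydroxybutyrylation site). MCC is (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), set to 0 when the denominator is zero. AUC is computed here as the probability that a randomly chosen positive scores above a randomly chosen negative, which equals the area under the ROC curve. That the thesis used exactly these conventions is an assumption.

```python
import math
from typing import Sequence


def acc_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Accuracy and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
    return acc, mcc


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0  # ties count half
    return wins / (len(pos) * len(neg))
```

For example, `acc_and_mcc([1, 1, 0, 0], [1, 0, 0, 0])` gives an accuracy of 0.75 and an MCC of 2/sqrt(12), about 0.577. The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration; a rank-based formula scales better for large test sets.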

Keywords/Search Tags:

Protein β-hydroxybutyrylation, Machine learning, LightGBM, Deep forest, Feature selection

Related items

1	A Research On The Approaches Of Feature Selection Via Deep Learning And Its Application
2	A Study On Feature Extraction And Classification Algorithms For Protein Structural Class Prediction
3	Predicting Non-coding RNA-protein Interactions By Machine Learning
4	Predicting Carbonylation Sites Based On Machine Learning Methods
5	Research On Swarm Intelligence Feature Selection Algorithm For Protein Sequence Classification
6	Predicting Protein Protein Interactions And Its Active Sites Based On Data Mining Algorithm
7	Classification Of Non-classical Secreted Proteins Of Gram-positive Bacteria Based On Two-layer LightGBM-based Ensemble Model
8	Research On The Protein Modification Sites Based On Machine Learning
9	Application Of Machine Learning In Space Environment Feature Recognition And Analysis
10	A Machine Learning Model For Runoff Prediction Based On Feature Selection And Joint Time-Frequency Analysis