
Research On The Protein Modification Sites Based On Machine Learning

Posted on: 2022-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Liu
Full Text: PDF
GTID: 2480306548496974
Subject: Mathematics
Abstract/Summary:
With the completion of genetic engineering and the arrival of the post-genome era, life science research increasingly involves new fields such as functional proteomics. Post-translational modifications (PTMs) take part in most physiological activities in cells, including signal transduction, proliferation, development and differentiation. They occur widely across biological processes and are of great significance. With the development of biochips and high-throughput sequencing technology, a variety of genomic, transcriptomic, proteomic and metabolomic data have emerged, and traditional experimental methods cannot meet the needs of modern research. Machine learning, drawing on computing and on tools from mathematics and biology, can offset the cost and time demands of traditional experimental methods, and has become increasingly prominent in this research field. This thesis proposes two machine learning models for predicting post-translational modification sites:

1. The prediction model LightGBM-CroSite for protein crotonylation sites is proposed. First, five methods (BE, PWAA, EBGW, KNN and PsePSSM) are used to extract protein features from amino acid sequence information, physicochemical information and evolutionary information. Then elastic net is used to select the optimal feature subset. To avoid the adverse impact of sample imbalance on the prediction results, the SMOTE algorithm is used to balance the data. Finally, LightGBM is used to classify crotonylation sites and non-crotonylation sites. In the performance evaluation of the model, the ACC, MCC and AUC are 98.99%, 0.9798 and 0.9996, respectively. The results show that the proposed method is superior to other prediction methods and is better suited to the prediction of crotonylation sites.

2. The prediction model Stack-MalSite for protein malonylation sites is proposed. First, AAC, BLOSUM62, BPB, EBGW, KNN, MMI, PseAAC, PSPM and PWAA are used to extract protein features from amino acid sequence information, physicochemical information and evolutionary information. Then Group Lasso is used to select the optimal feature subset. Finally, DNN, CNN, RF and LightGBM are used as the base classifiers of a stacking ensemble, with SVM as the meta-classifier, to classify malonylation sites. The ACC, MCC and AUC on the training dataset are 98.96%, 0.7117 and 0.9921, respectively. The ACC and MCC on the independent dataset are 95.15% and 0.7029, respectively. The results show that the proposed method is superior to other prediction methods and is better suited to the prediction of malonylation sites.
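
The ACC and MCC values reported in the abstract are standard binary-classification metrics computed from confusion-matrix counts. The following is a minimal sketch of that computation, assuming labels of 1 for a modification site and 0 for a non-site. The function name `acc_and_mcc` and the toy labels are illustrative assumptions and are not taken from the thesis.

```python
import math


def acc_and_mcc(y_true, y_pred):
    """Return (accuracy, Matthews correlation coefficient) for binary labels.

    Hypothetical helper for illustration; 1 = site, 0 = non-site.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    acc = (tp + tn) / len(pairs)

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Toy, made-up labels (not data from the thesis): 3 sites, 5 non-sites.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc, mcc = acc_and_mcc(y_true, y_pred)
print(f"ACC={acc:.4f}  MCC={mcc:.4f}")
```

Unlike ACC, MCC uses all four confusion-matrix counts, so on imbalanced data the two metrics can differ considerably.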
Keywords/Search Tags:machine learning, protein crotonylation sites, LightGBM, protein malonylation sites, stacking ensemble learning