Research on the Prediction of Protein Post-translational Modification Sites Under an Imbalanced Classification Model

Posted on: 2019-12-26  Degree: Master  Type: Thesis
Country: China  Candidate: L X Zhang
GTID: 2370330563991961  Subject: Statistics
Abstract/Summary:
Classification of imbalanced datasets is a research hotspot in machine learning. Traditional classification algorithms are generally suited to balanced data, so more effective methods need to be developed. Protein post-translational modification is an important process by which proteins carry out their normal functions in living organisms. After biosynthesis, a protein needs proper post-translational modification to show normal biological activity. A protein functional site is a residue that enables the protein molecule to perform its function, and identifying functional sites helps in understanding their biological significance. In this thesis, protein post-translational modification sites are predicted under an imbalanced classification model, and two prediction models, pSumo-CD and iCar-PseCp, are constructed. Features are extracted from the protein sequence itself or from its structural characteristics, and machine learning algorithms are used for prediction to make up for the shortcomings of traditional methods.

Sumoylation is a type of protein post-translational modification (PTM) that plays an important role in subcellular transport, transcription, DNA repair and signal transduction. Studies have shown that sumoylation can enhance the overall performance of proteins. In this study, the conditional probability of sequence coupling is used to extract features from the datasets, a covariance discriminant method is used to handle the imbalanced dataset, and an online predictor of sumoylation sites named pSumo-CD is developed. The results of the jackknife test show that MCC, Acc, Sn and Sp are 0.846, 97.88%, 82.01% and 99.21%, respectively, and pSumo-CD compares favorably with other predictors.

Carbonylation is another post-translational modification (PTM), and the identification of carbonylation sites is a hot topic in biology. In this thesis, a new predictor, iCar-PseCp, is developed. Feature information is extracted by combining sequence coupling information with general pseudo-amino acid composition, and Monte Carlo sampling is used to expand the positive dataset in order to balance the skewed training dataset. A random forest algorithm is then used for classification. The 10-fold cross-validation results show that the new predictor clearly outperforms existing predictors.

This thesis also addresses the problem of imbalanced datasets, whose classification is very important for experimental research. Building on existing research results, it applies dataset-balancing methods, namely the covariance discriminant algorithm and Monte Carlo sampling, to the prediction of protein post-translational modification sites in order to achieve higher accuracy.
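
The four measures reported for the jackknife test (Sn, Sp, Acc and MCC) can all be computed from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the thesis. The second call illustrates why accuracy alone can mislead on an imbalanced dataset: a classifier that labels every site as negative still scores a high Acc, while Sn and MCC drop to zero.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return Sn, Sp, Acc and MCC for a binary confusion matrix.

    tp/fn count positive (modified) sites predicted correctly/incorrectly;
    tn/fp count negative sites predicted correctly/incorrectly.
    Ratios with an empty denominator are reported as 0.0 (an assumption
    made here so the function never divides by zero).
    """
    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if (tp + fn) else 0.0    # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0    # specificity (recall on negatives)
    acc = (tp + tn) / total if total else 0.0    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
    # A degenerate "always negative" classifier on a 5%-positive dataset.
    print(binary_metrics(tp=0, tn=950, fp=0, fn=50))
```

Because MCC uses all four cells of the confusion matrix, it is less sensitive to class imbalance than Acc, which is presumably why it is reported alongside Sn and Sp here.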
Keywords/Search Tags: post-translational modification, imbalanced model, amino acid, feature extraction, bioinformatics