
Prediction of Several Protein Post-translational Modification Sites Based on Up-down Sampling

Posted on: 2019-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zuo
GTID: 2310330542471985
Subject: Mathematics
Abstract/Summary:
This thesis addresses the class imbalance that arises in datasets used to predict protein post-translational modification sites. The main results are as follows:

(1) To address the low prediction accuracy for protein S-sulfenylation sites, a prediction model, S-SulfPred, based on OSSU-SMOTEO resampling is proposed. First, PSAAP and 67 AAPPI are used for feature extraction. Then the OSSU-SMOTEO resampling method is used to balance the training dataset. Finally, the prediction model S-SulfPred is established using 10-fold cross-validation. The experimental results show that S-SulfPred is effective at recognizing S-sulfenylation sites in proteins.

(2) For the first time, the one-sided selection undersampling method is used to predict carbonylation sites in human proteins. Four encoding schemes, PSAAP, CKSAAP, AAC and CHHAA, are used for feature extraction. Compared on the same dataset with the prediction models PTMPred, CarSpred, predCar-site and CarSPred.Y, the model CarSite established in this thesis performs clearly better than the other four.

(3) O-glycosylation, a major form of protein post-translational modification, plays an important role in complex life activities. Identifying O-glycosylation sites by experimental methods is time-consuming and costly. In this thesis, an ensemble model, O-GlcNAcPRED-Ⅱ, is constructed for the prediction of O-glycosylation sites. A new benchmark dataset was built by searching the literature and the latest database. To handle the extreme imbalance of the data, a sampling method (KPCA-FUS) combining K-means principal component analysis and fuzzy undersampling is proposed. The rotation forest ensemble learning algorithm is used to construct the predictive model O-GlcNAcPRED-Ⅱ. The effectiveness of O-GlcNAcPRED-Ⅱ was verified using both 10-fold cross-validation and independent test sets.
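The oversampling side of the resampling idea in result (1) can be illustrated with a minimal sketch. It assumes that the "SMOTE" in OSSU-SMOTEO refers to the generic synthetic minority oversampling technique, in which new minority samples are interpolated between an existing minority sample and one of its nearest minority neighbours. The function names `smote` and `balance`, the parameters `k` and `seed`, and the toy data are illustrative assumptions; this is not the thesis's OSSU-SMOTEO procedure, whose details the abstract does not give.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (generic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def balance(majority, minority, k=3, seed=0):
    """Oversample the minority class until it matches the majority size."""
    n_new = max(0, len(majority) - len(minority))
    if n_new == 0:
        return list(minority)
    return list(minority) + smote(minority, n_new, k=k, seed=seed)


# Toy feature vectors (hypothetical): 3 positive sites, 10 negative sites.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(5.0, 5.0)] * 10
balanced_minority = balance(majority, minority)
print(len(balanced_minority))
```

In a cross-validation setting, resampling of this kind would normally be applied to the training folds only, so that synthetic samples never leak into the held-out fold.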
Keywords/Search Tags: resampling, S-sulfenylation, carbonylation, O-glycosylation