
Accurate In Silico Identification Of Species-specific Protein S-glutathionylation Sites With Multiple Features And Analysis

Posted on: 2017-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: M L Ai
Full Text: PDF
GTID: 2310330485456932
Subject: Computer application technology
Abstract/Summary:
S-glutathionylation is a reversible protein post-translational modification that plays an important role in protein stability and redox regulation. To fully understand the mechanisms of S-glutathionylation, it is crucial to identify its substrates and the specific S-glutathionylated sites. Computational prediction is more efficient than experimental identification of these sites, so a reliable prediction method is particularly important. Although one predictor has already been developed for S-glutathionylated sites, some problems remain. First, the training dataset of Sun's predictor was too small, and no independent test dataset was used to verify it. As more and more S-glutathionylated sites are experimentally verified, the prediction model needs to be updated. Second, the biological hallmarks around S-glutathionylated sites have not been systematically investigated. To address these issues, we present a new computational tool, PGluS, which predicts S-glutathionylated sites using the latest data from the dbGSH database. To extract the most informative amino acid residue features and to show how much each feature contributes to the prediction, multiple feature descriptors were used, such as the composition of k-spaced amino acid pairs (CKSAAP) and the encoding based on grouped weight (EBGW). PGluS achieved an accuracy of 71.41%, a sensitivity of 75.53%, a specificity of 67.32% and an MCC of 0.431, which indicates that PGluS is a promising predictor of S-glutathionylated sites.

However, some disadvantages remain. First, the overall performance of the predictors mentioned above is still not fully satisfactory, and there is room to improve it. Second, the existing predictors disregard the differences between species: they treat all species-specific S-glutathionylated sites as general sites and build a single general model. To overcome these drawbacks, we developed a new computational tool based on a support vector machine, termed SSGlu, which is specifically designed to identify species-specific S-glutathionylated sites from multiple protein sequence-derived features, including binary encoding of amino acid sequence profiles (BE), amino acid composition (AAC), physicochemical properties of amino acids (PCP), autocorrelation functions (ACF) and the position-specific scoring matrix (PSSM). A two-step feature selection was used to select the optimal feature subset. In 5-fold cross-validation, SSGlu achieved AUCs of 0.8075 and 0.8078 for H. sapiens and M. musculus, respectively. In addition, SSGlu was compared with the existing methods, and its higher MCC and AUC values indicate that SSGlu is a promising tool for identifying S-glutathionylated sites. We also provide an online SSGlu server so that users can conveniently run predictions online.
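
Three of the sequence-derived descriptors named in the abstract (BE, AAC and CKSAAP) and the MCC metric can be illustrated with a short Python sketch. It is not the thesis implementation. The function names, the `k_max` parameter, the treatment of non-standard characters (ignored or encoded as zeros) and the CKSAAP normalisation by the number of pair positions are all assumptions made for illustration, and the window passed in is assumed to be a peptide fragment centred on a candidate cysteine.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def binary_encoding(window):
    """BE: one-hot vector per residue; non-standard characters become all zeros (assumption)."""
    vec = []
    for res in window:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec

def aac(window):
    """AAC: frequency of each standard amino acid among the standard residues in the window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero on an empty window
    return [residues.count(aa) / n for aa in AMINO_ACIDS]

def cksaap(window, k_max=2):
    """CKSAAP: for each gap k = 0..k_max, frequency of the 400 residue pairs separated by k residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    vec = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = len(window) - k - 1  # number of pair positions for this gap
        for i in range(max(total, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard characters
                counts[pair] += 1
        vec.extend(counts[p] / total if total > 0 else 0.0 for p in pairs)
    return vec

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts (0.0 if undefined)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Concatenating the vectors returned for a window gives one feature vector per candidate site, which could then be passed to a feature-selection step and a classifier such as a support vector machine.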
Keywords/Search Tags: S-glutathionylated Sites Prediction, Multiple Feature, Machine learning, Database