
The Prediction Of Protein Ubiquitination With Multi-features

Posted on: 2016-06-28 | Degree: Master | Type: Thesis
Country: China | Candidate: C X He | Full Text: PDF
GTID: 2180330464959174 | Subject: Computer application technology
Abstract/Summary:
Post-translational modification (PTM) refers to the processing of a protein after translation, in which a modifying group is attached to one or more amino acid residues or the protein's properties are changed by proteolytic cleavage. PTM is an important way of regulating protein function and plays a crucial role in both the structure and the function of proteins. More than 400 different types of PTM have been confirmed to date.
Since its discovery around 1977, protein ubiquitination has been an active research topic among PTMs. Ubiquitination plays an extremely important role in a series of basic processes, including apoptosis, signal transduction, transcriptional regulation, DNA repair, endocytosis, disease and health status, and cell survival and death. Abnormal ubiquitination is known to lead to many major human diseases, such as cancer, Parkinson’s disease, and Alzheimer’s disease. Elucidating protein ubiquitination and its mechanism is therefore important for understanding how cells interpret genetic information, how expression is regulated, and how a variety of diseases arise.
In this work, we selected two datasets of protein ubiquitination sites, one from yeast and one from human. We used three feature encodings to represent the ubiquitination site peptides: amino acid occurrence frequency, amino acid factors, and the composition of k-spaced amino acid pairs (CKSAAP). We then combined the features in three ways: (1) amino acid factors and CKSAAP; (2) amino acid occurrence frequency, amino acid factors, and CKSAAP; (3) amino acid occurrence frequency and CKSAAP. A support vector machine (SVM) served as the predictor for evaluating the models, and AdaBoost ensemble learning was used to deal with the imbalance between positive and negative samples.
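The two sequence-derived encodings named above, amino acid occurrence frequency and the composition of k-spaced amino acid pairs, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the restriction to the 20 standard residues, the normalisation by the number of valid pairs, and the concatenation of all gaps from 0 up to `k_max` are assumptions, since the abstract does not specify these details.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are encoded.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_frequency(peptide):
    """Fraction of each standard amino acid among the standard residues."""
    valid = [r for r in peptide if r in AMINO_ACIDS]
    total = len(valid) or 1  # avoid division by zero on empty input
    return [valid.count(a) / total for a in AMINO_ACIDS]


def cksaap(peptide, k):
    """Composition of residue pairs separated by exactly k other residues."""
    counts = dict.fromkeys(PAIRS, 0)
    n_pairs = 0
    for i in range(len(peptide) - k - 1):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            n_pairs += 1
    return [counts[p] / n_pairs if n_pairs else 0.0 for p in PAIRS]


def encode(peptide, k_max=4):
    """Hypothetical combined vector: frequency plus CKSAAP for k = 0..k_max."""
    features = aa_frequency(peptide)
    for k in range(k_max + 1):
        features.extend(cksaap(peptide, k))
    return features
```

With `k_max=4` this sketch yields 20 + 5 × 400 = 2020 features per peptide window; a vector of this kind would then be passed to the classifier.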
The experiments comprised three parts: (1) evaluating the performance of the three combined feature sets; (2) comparing results for different values of k; (3) building a model with an ensemble learning method. The results show that the combination of amino acid occurrence frequency and the composition of k-spaced amino acid pairs performed best on both the yeast and the human ubiquitination sites. With the SVM predictor, accuracy was 72.82%, sensitivity 64.03%, specificity 82.17%, and MCC 0.4745 for yeast; accuracy was 70%, sensitivity 73.66%, specificity 66.34%, and MCC 0.4012 for human. The value of k was 4 for both datasets, which suggests that with this parameter the model retains more of the important information and less redundant information. We combined undersampling with AdaBoost ensemble learning to deal with the class imbalance problem, and this gave the best result for the yeast ubiquitination sites: accuracy 78.34%, sensitivity 77.36%, specificity 78.85%, and MCC 0.5422. Compared with other methods, our predictors were effective for predicting protein ubiquitination sites.
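The four measures reported here (accuracy, sensitivity, specificity, and MCC) follow from the confusion-matrix counts using their standard definitions. The sketch below shows the calculation; the function name and the convention of returning 0.0 for an undefined MCC are assumptions, and the abstract does not give the underlying counts.

```python
import math


def metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when the MCC denominator is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

Unlike accuracy, MCC uses all four counts, which is why it is usually reported alongside sensitivity and specificity when positive and negative samples are unbalanced.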
Keywords/Search Tags: PTM, ubiquitination site, SVM, the composition of k-spaced amino acid pairs, AdaBoost, ensemble learning