Several New Mathematical Models In Predicting Protein Functional Sites

Posted on: 2012-03-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Dou
GTID: 1220330368485832
Subject: Basic mathematics
Abstract/Summary:
In recent years, bioinformatics has become increasingly important in many fields of biology, and it has changed many classical biological research methods. It supports biological research in two main ways: (1) using computational and mathematical methods to process large amounts of biological data and to find the biological laws underlying them; (2) developing mathematical models for difficult biological problems and guiding experimental analysis. The prediction of protein functional sites, an important open problem in bioinformatics, is an example of using mathematical models to guide experimental analysis. Prediction methods score the sites of a protein chain by their importance, and the results are helpful for "site mutation" experiments. In this dissertation, we focus on the prediction of catalytic sites. The main results can be summarized as follows:

1. In Chapter 2, we propose two new amino acid background distributions that perform better than the observed one in measuring protein sequence conservation. To incorporate amino acid similarity into conservation measures, we use Taylor's ten overlapping amino acid property classes rather than the disjoint classifications used by previous methods. These properties are also incorporated into the relative entropy model to account for their background distribution.

2. In Chapter 3, we propose a new sequence-based catalytic site prediction method, L1pred, which uses several new sequence-based features and the L1-logreg classifier. The results suggest that the new features are more effective than previous ones and that the L1-logreg classifier is faster than the commonly used support vector machine classifier. The new method is therefore suitable for tasks such as genome-wide analyses, where speed is an issue. We also tested the importance of sequence conservation features in machine learning-based methods and found that conservation features carry most of the information used by these methods. In addition, incorporating the conservation information of sequentially or structurally adjacent sites does not always improve conservation features.

3. In Chapter 4, we present a new strategy, the chain-specific strategy, for using ROC analysis to evaluate functional site prediction methods. Our controlled experiment suggests that the new strategy overcomes two weaknesses of the previous strategy. It is therefore more accurate in evaluating functional site prediction methods and should be used in future work.
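
As an illustration of the relative entropy conservation measure mentioned in item 1, the following minimal sketch scores one column of a multiple sequence alignment against a background distribution. It is not the dissertation's implementation: the function names, the pseudocount value, and the uniform background are assumptions made for this example, and the two new background distributions proposed in Chapter 2 are not specified in the abstract.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1e-6):
    """Estimate amino acid frequencies in one alignment column.

    Gaps and non-standard symbols are ignored. The small pseudocount
    (an assumption of this sketch) keeps every probability positive.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def relative_entropy(column, background):
    """Relative entropy (in bits) of the column distribution p with respect
    to the background q: sum over amino acids of p * log2(p / q)."""
    p = column_distribution(column)
    return sum(p[aa] * math.log2(p[aa] / background[aa]) for aa in AMINO_ACIDS)


# Placeholder background: uniform over the 20 amino acids (assumption).
uniform_background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

conserved_score = relative_entropy("HHHHHHHH", uniform_background)
variable_score = relative_entropy("HAKLDEGS", uniform_background)
```

A fully conserved column scores higher than a variable one, so sites can be ranked by this score. Replacing `uniform_background` with a different distribution changes which deviations count as evidence of conservation, which is the role the background distribution plays in the relative entropy model.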
Keywords/Search Tags:Bioinformatics, Protein functional site, Protein sequence conservation, Machine learning method, ROC analysis