
Analysis And Prediction Of Interactions Between Residues In Proteins

Posted on: 2008-02-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Chen
Full Text: PDF
GTID: 1100360242464745
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In recent years, the rapid development of bioinformatics has produced ever more biological data that must be collected, managed, interpreted, and fully exploited. Machine learning methods are well suited to such large and noisy data, and many machine learning algorithms have already been applied successfully to large biological data sets to mine and discover unknown biological knowledge. This thesis uses machine learning tools, chiefly the support vector machine (SVM), to analyze protein structure, and adopts SVM and the genetic algorithm (GA) to predict the temperature factor (B-factor) of residues as well as long-range contacts between residues. The main contributions of this thesis are as follows:

1. A prediction method based on a multi-class support vector machine (SVM) is proposed to analyze and predict the B-factors of protein residues. The temperature factor, or B-factor, of a residue is linearly related to the mean square displacement of its C_α atom and indicates atomic flexibility in the crystalline state. Previous work has shown that hydrophobic residues, which are usually buried, tend to be more rigid, whereas charged residues tend to be more flexible. Consequently, predicting the B-factor may help in understanding and predicting the three-dimensional structure of a protein. The method takes selected properties of amino acid residues, such as the sequence profile of the protein chain, the evolutionary rate of the residue, and the hydrophobicity value of the residue, as the input to the multi-class SVM.

2. An approach is proposed to predict inter-residue contact cluster centers from the predicted residue B-factor, the hydrophobicity value of the residue, and a support vector machine. It is well known that inter-residue contacts gather into clusters in the contact maps of proteins. We observe that almost all inter-residue contact clusters correspond to residue pairs located at local B-factor minima or within regions of higher hydrophobicity. Selecting the predictor's input vectors according to these characteristics reduces the imbalance between positive and negative samples, so higher prediction performance can be obtained. An SVM is then used to predict the inter-residue contact cluster centers, from which the inter-residue interacting sites can be obtained.

3. A genetic algorithm based on the sequence profile (SP) centers of residue pairs is constructed to predict the SP centers of residue pairs as well as the long-range interacting sites between residues. First, we constructed a genetic algorithm-based multiple classifier (GaMC) and found that most long-range contacts cluster around their SP centers. Second, the GaMC predictor can separate residue pairs that are in contact from those that are not. Finally, whether two residues are in long-range contact is decided on the basis of the GaMC predictor and the SP centers.
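The candidate-selection idea in item 2 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the 8 Å C_α distance cutoff, the sequence-separation threshold, and the toy data are all assumptions chosen for illustration, and the abstract does not state which values were actually used. The sketch builds a contact set from C_α coordinates, keeps the long-range contacts, and restricts candidate residue pairs to positions at local B-factor minima, which is one way of shrinking the pool of negative samples before a classifier such as an SVM is trained.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Return the set of pairs (i, j), i < j, whose points lie closer than `cutoff`.

    `coords` holds one (x, y, z) tuple per residue (assumed C-alpha positions).
    The 8.0 angstrom cutoff is an assumed convention, not taken from the thesis.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def long_range(contacts, min_sep=24):
    """Keep contacts whose sequence separation is at least `min_sep` (assumed value)."""
    return {(i, j) for i, j in contacts if j - i >= min_sep}


def local_minima(values):
    """Indices of interior positions strictly lower than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    ]


def candidate_pairs(b_factors, min_sep=24):
    """Pairs of local B-factor minima that are at least `min_sep` apart in sequence."""
    minima = local_minima(b_factors)
    return [
        (i, j)
        for a, i in enumerate(minima)
        for j in minima[a + 1:]
        if j - i >= min_sep
    ]


if __name__ == "__main__":
    # Toy data (hypothetical): five points and six B-factor values.
    toy_coords = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0), (0, 5, 0)]
    toy_b = [30.0, 12.0, 25.0, 40.0, 18.0, 35.0]

    contacts = contact_map(toy_coords)
    print(sorted(long_range(contacts, min_sep=3)))
    print(candidate_pairs(toy_b, min_sep=3))
```

In a real pipeline the B-factors passed to `candidate_pairs` would be predicted values rather than measured ones, and the surviving pairs would be turned into feature vectors for the classifier. The small `min_sep=3` in the usage example exists only so that the toy data produces output.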
Keywords/Search Tags: Bioinformatics, Support vector machine, Genetic algorithm, B-factor, Long-range interaction, Contact map prediction, Sequence profile, Sequence profile center, Contact cluster center, Residue pair