
Prediction of Two Special Types of Protein Functional Residues and Biological Sequence Alignment

Posted on: 2010-06-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Zhang
GTID: 1110360302957473
Subject: Bioinformatics
Abstract/Summary:
Bioinformatics concentrates on solving the problems that constrain the development of biology. Progress so far has been encouraging: it has produced a number of results of great interest to researchers in biology, informatics and other subjects, and has facilitated the rapid development of bioinformatics itself. At present, the basic framework of bioinformatics has been established, and the main problems to be solved are better defined. However, because the internal logic of bioinformatics must be rigorously maintained as the subject evolves, and because the goal is actual solutions to complex problems, much still remains to be done. New methods for solving current practical problems are needed, and existing methods must be further improved. This thesis focuses on problems related to biological sequence alignment and protein functional residue prediction.

This thesis contains three main results:

First, the SPA algorithm is extended to be applicable to a general penalty/scoring matrix, and a mathematical proof is given. The extended SPA algorithm has a wider range of uses: by adjusting the penalty/scoring matrix according to practical demands, the algorithm can provide an appropriate alignment. It also reduces the number of cases in which the optimal alignment is not unique, as can happen when the Hamming matrix is used, and therefore may be used in the design of SPA-based multiple sequence alignment algorithms.

Second, sequence-based methods for the large-scale fast prediction of protein functional residues are designed. This work enhances the fundamental understanding of how proteins perform their functions, and can be used to screen possible functional residues for experimental determination. Since experiments can be costly and time-consuming, screening will save time, labor and material resources.
Feature selection is performed, which not only reduces the dimensionality of the input and decreases the computational time, but also reveals the biological meaning of the features.

(1) A sequence-based catalytic residue predictor called "CRpred" is proposed, whose prediction quality is comparable to that of modern structure-based methods and exceeds that of state-of-the-art sequence-based methods. Analysis of the selected features indicates the following four characteristics:
a) Amino acids are characterized by varied propensities to become catalytic residues, from high (His, Cys, Asp, Arg, Glu and Tyr) to low (Val, Ala, Ile, Pro, Leu and Met), with glycine (Gly) providing flexibility for catalytic sites;
b) The most important factor contributing towards accurate predictions is residue conservation. Catalytic residues, irrespective of type, tend to be more conserved than the general population of residues. Highly conserved amino acids with high catalytic propensity are likely to form catalytic sites;
c) Certain sequence motifs, such as CysXXCys and AspXLysXXAsn, which are associated with catalytic reactions, are found to contribute to the prediction; and
d) Although catalytic residues prefer a relatively more hydrophobic neighborhood, they are likely to be surrounded locally (with respect to the sequence) by hydrophilic residues.

(2) RNA-binding residues are identified according to a distance-based cutoff definition, and a new predictor, "RBRpred", is designed. It predicts RNA-binding residues from the protein sequence and improves prediction quality relative to current sequence-based methods.
Feature selection for the prediction of RNA-binding residues yields the following four findings:
a) The positively charged amino acids Arg and Lys show a higher propensity to form RNA-binding sites, due to their ability to interact with the negatively charged phosphate backbone of RNA; the small size of Gly provides flexibility for protein-RNA interactions; and Asp (with its negatively charged side chain), together with several hydrophobic residues (such as Leu, Val, Ala and Phe), is not preferred in RNA-binding sites;
b) Sequence conservation plays a fundamental role in predicting RNA-binding residues;
c) Coil residues, especially those in long coil segments, are more flexible and can easily interact with RNA; helices, however, are more rigid, and consequently residues in helices have less chance to bind RNA; and
d) Residues with higher relative solvent accessibility are more likely to be in RNA-binding sites.

Third, a generalized error-correcting code is applied to DNA computing, with a focus on mutation errors, and a DNA operating system with error correction is designed in order to solve the Hamiltonian circuit problem.
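As a rough illustration of the role a penalty/scoring matrix plays in the first result, the sketch below computes a global alignment score with a pluggable scoring function. It uses the standard Needleman-Wunsch dynamic program, not the SPA algorithm described in the thesis. The function names, the linear gap penalty and the example scoring values are all assumptions made for illustration only.

```python
from typing import Callable

# All names and numeric values here are hypothetical; this is a standard
# Needleman-Wunsch sketch, NOT the thesis's (extended) SPA algorithm.

def hamming_score(a: str, b: str) -> int:
    """Hamming-style scoring: 0 for a match, -1 for a mismatch."""
    return 0 if a == b else -1

def weighted_score(a: str, b: str) -> int:
    """An example 'general' scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1

def global_alignment_score(
    s: str,
    t: str,
    score: Callable[[str, str], int],
    gap: int = -1,
) -> int:
    """Return the optimal global alignment score of s and t.

    `score` plays the role of the penalty/scoring matrix and `gap` is a
    linear per-symbol gap penalty (an assumption of this sketch).
    """
    rows, cols = len(s) + 1, len(t) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # match/mismatch
                dp[i - 1][j] + gap,                            # gap in t
                dp[i][j - 1] + gap,                            # gap in s
            )
    return dp[-1][-1]

if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT", hamming_score))
    print(global_alignment_score("ACGT", "AGT", weighted_score, gap=-2))
```

Swapping `hamming_score` for another function changes which alignments count as optimal without touching the dynamic program itself, which is the kind of adjustment "in terms of practical demands" that the abstract refers to.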
Keywords/Search Tags:fast alignment algorithm based on general score/penalty matrix, protein functional residues prediction, catalytic residues, RNA-binding residues, mutation error correction in DNA computing