Evaluating local structure alphabets for protein structure prediction

Posted on: 2004-06-07
Degree: Ph.D
Type: Dissertation
University: University of California, Santa Cruz
Candidate: Karchin, Rachel
Full Text: PDF
GTID: 1460390011476976
Subject: Computer Science
Abstract/Summary:
Local structure alphabets are discrete encodings of one or more properties of local protein structure that cluster residues with similar properties into the same state. They allow us to represent a protein's structure, in simplified form, as a one-dimensional string. I explore whether there are preferred ways to encode local protein structure to best recognize relationships between distantly related proteins.

To identify the most informative alphabets of local protein structure, I have developed an evaluation protocol and applied it to 48 candidate alphabets. The evaluation includes many new alphabets, as well as some taken from the literature, covering descriptions of backbone geometry, residue burial, and side-chain orientations. The main criteria sought in a local structure alphabet are predictability, conservation within a collection of structurally similar proteins, and improvement in fold recognition and alignment quality.

An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Knowledge-based methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Distant homologs, which may have significant structural similarity, are often not detectable by sequence similarity alone. These weak relationships may be recognized by combining sequence and evolutionary information (about the target and template), structural information (about the template), and predicted information about the target's local structure. All these kinds of information can be incorporated into threading algorithms [39, 175], hidden Markov models (HMMs) [66, 109], or profiles [113].

Compared with the baseline fold-recognition and alignment performance of an HMM that uses only amino-acid information, HMMs enhanced with a secondary track of local-structure-alphabet emissions show a substantial improvement when judiciously selected alphabets are used. A simple three-state classification of secondary structure and a seven-state description of residue burial, based on a count of neighboring Cβ atoms within a 14 Å radius spherical cutoff, are most useful for fold recognition. A six-state secondary-structure alphabet and a fourteen-state secondary-structure alphabet that includes classifications of beta-strand orientation are most useful for improving alignments. The best fold-recognition alphabet contributes a 40% improvement to HMM performance, and the best alignment alphabet contributes a 62% improvement.
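
As an illustration of how a burial alphabet of the kind described above turns a structure into a one-dimensional string, the following is a minimal sketch and not the dissertation's implementation. Only the 14 Å spherical cutoff around each C-beta atom and the use of seven states come from the abstract. The function names, the letters A to G, and the bin boundaries are hypothetical; the abstract does not say how the counts are divided into states.

```python
import bisect
import math


def burial_counts(cb_coords, radius=14.0):
    """Count, for each residue, the other residues whose C-beta atom
    lies within `radius` angstroms of its own C-beta atom."""
    counts = []
    for i, a in enumerate(cb_coords):
        n = 0
        for j, b in enumerate(cb_coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts


def encode_burial(counts, boundaries, letters="ABCDEFG"):
    """Map each neighbor count to one letter of a discrete alphabet.

    `boundaries` are ascending cut points (hypothetical values chosen by
    the caller); a count c falls in state k when exactly k boundaries
    are less than or equal to c.
    """
    if len(boundaries) != len(letters) - 1:
        raise ValueError("need one fewer boundary than letters")
    return "".join(letters[bisect.bisect_right(boundaries, c)] for c in counts)


# Toy input: four C-beta positions spaced 10 angstroms apart on a line.
toy_coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_counts = burial_counts(toy_coords)

# Hypothetical boundaries for the toy example only; real proteins have far
# larger counts, and cut points would be fitted to a training set.
toy_string = encode_burial(toy_counts, boundaries=[1, 2, 3, 4, 5, 6])
```

In practice the boundaries would be fitted to the distribution of counts in a set of known structures, for example so that each state is roughly equally populated; that choice is an assumption here, not something stated in the abstract.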
Keywords/Search Tags: Structure, Alphabet, Fold recognition, HMM, Improvement