Algorithms for the identification of functional sites in proteins

Posted on: 2010-01-05
Degree: Ph.D.
Type: Dissertation
University: Princeton University
Candidate: Capra, John Anthony
GTID: 1440390002476923
Subject: Biology
Abstract/Summary:
Proteins play an essential role in nearly every process carried out by the cell. In accomplishing this incredibly diverse array of functions, proteins interact with one another and with other molecules in their environments. These interactions are mediated by specific amino acids. For a given protein, identifying the residues that participate in its interactions can be a crucial step in understanding its function. Knowledge of these so-called functional sites can guide further experimental analysis of the protein and aid drug design and development. The large number of protein sequences and structural models that have become available over the past 10 years presents an exceptional opportunity to use the methods of computer science and statistics to identify protein functional sites, and thereby further biological understanding.
This dissertation investigates the computational prediction of functional sites from protein sequence and structure data. First, we consider the estimation of evolutionary sequence conservation from a multiple sequence alignment of homologous proteins, a common first step in the identification of functionally important sites. We introduce a fast, information-theoretic algorithm for scoring conservation and demonstrate that it provides state-of-the-art performance in predicting catalytic sites, ligand binding sites, and protein-protein interface residues. Second, we examine the identification of a class of functional residues that cannot be found by considering sequence conservation alone: those that determine functional substrate specificity within homologous protein families. We combine sequence information with structural models to build the first large dataset of these specificity determining positions (SDPs). This dataset enabled the first large-scale analysis of sequence-based SDP prediction methods. We demonstrate that GroupSim, a new method we developed, outperforms existing approaches.
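To make the idea of an information-theoretic conservation score concrete, here is a minimal sketch that scores each alignment column by the Jensen-Shannon divergence between the column's amino-acid distribution and a background distribution. The abstract does not spell out the dissertation's scoring function, so the uniform background, the pseudocount, the gap-fraction penalty, and every function name below are assumptions for illustration; the actual algorithm may differ in its background, sequence weighting, windowing, and gap handling.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_distribution(column, pseudocount=1e-6):
    """Amino-acid frequencies for one alignment column; gaps are skipped."""
    residues = [c.upper() for c in column if c.upper() in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    return [(counts[aa] + pseudocount) / total for aa in AMINO_ACIDS]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def conservation_scores(alignment, background=None):
    """Score each column of an alignment given as equal-length strings.

    Higher scores mean the column's residue distribution departs more from
    the background. Scores are multiplied by the fraction of non-gap
    characters (an assumed heuristic) so gappy columns are not over-rewarded.
    """
    if background is None:
        # Assumed uniform background; a real tool would likely use
        # amino-acid frequencies estimated from a large protein database.
        background = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    scores = []
    for column in zip(*alignment):
        non_gap = sum(1 for c in column if c.upper() in AMINO_ACIDS)
        p = column_distribution(column)
        scores.append(js_divergence(p, background) * non_gap / len(column))
    return scores

if __name__ == "__main__":
    # Hypothetical toy alignment of four homologous sequences.
    aln = ["ACDK", "ACEK", "ACFR", "GCHK"]
    print(conservation_scores(aln))
```

On this toy alignment, the invariant second column (all C) scores higher than the highly variable third column (D, E, F, H), which is the ranking a conservation score should produce.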
Finally, we focus on the prediction of ligand binding sites when both evolutionary sequence information and structural models are available. We introduce ConCavity, a new algorithm which directly integrates sequence conservation information into structure-based surface pocket identification. This algorithm provides significant improvement over earlier methods and establishes the complementarity of sequence and structural evidence in ligand binding site prediction. Overall, our work significantly improves our ability to identify functional sites from protein sequences and structures.
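The abstract states that ConCavity integrates sequence conservation directly into structure-based pocket identification, but does not give its formula. The sketch below shows one simple way such an integration could work: each point on a 3D grid carries a structural pocket score, which is weighted by the mean conservation of nearby residues, and each residue then takes the best weighted score among grid points within reach. The neighborhood radius, the product weighting, and all function names are hypothetical choices for illustration, not ConCavity's actual method, which may also include pocket extraction and other steps not shown here.

```python
import math

def conservation_weighted_grid(grid, residues, radius=6.0):
    """Weight each grid point's pocket score by nearby residue conservation.

    grid:     list of ((x, y, z), pocket_score) from some structure-based
              pocket finder (hypothetical input format).
    residues: list of ((x, y, z), conservation) with conservation in [0, 1].
    radius:   neighborhood radius in angstroms (assumed value).
    """
    weighted = []
    for point, pocket_score in grid:
        nearby = [cons for coord, cons in residues
                  if math.dist(point, coord) <= radius]
        mean_cons = sum(nearby) / len(nearby) if nearby else 0.0
        # Simple product: a pocket lined by unconserved residues is down-weighted.
        weighted.append((point, pocket_score * mean_cons))
    return weighted

def residue_binding_scores(weighted_grid, residues, radius=6.0):
    """Give each residue the highest weighted grid score within the radius."""
    scores = []
    for coord, _ in residues:
        nearby = [score for point, score in weighted_grid
                  if math.dist(point, coord) <= radius]
        scores.append(max(nearby, default=0.0))
    return scores

if __name__ == "__main__":
    # Two hypothetical pockets with equal geometric scores: one lined by a
    # conserved residue, the other by a poorly conserved one.
    residues = [((0.0, 0.0, 0.0), 0.9), ((20.0, 0.0, 0.0), 0.1)]
    grid = [((1.0, 0.0, 0.0), 1.0), ((21.0, 0.0, 0.0), 1.0)]
    weighted = conservation_weighted_grid(grid, residues)
    print(residue_binding_scores(weighted, residues))
```

Here the two pockets are geometrically indistinguishable, so structure alone cannot rank them; adding conservation breaks the tie in favor of the pocket lined by the conserved residue, illustrating the complementarity of sequence and structural evidence that the abstract describes.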
Keywords/Search Tags: Protein, Functional sites, Sequence, Identification, Ligand binding, Algorithm, Prediction