Algorithms for the identification of functional sites in proteins

Posted on:2010-01-05

Degree:Ph.D

Type:Dissertation

University:Princeton University

Candidate:Capra, John Anthony


GTID:1440390002476923

Subject:Biology

Abstract/Summary:

Proteins play an essential role in nearly every process carried out by the cell. In accomplishing this incredibly diverse array of functions, proteins interact with one another and with other molecules in their environments. These interactions are mediated by specific amino acids. For a given protein, identifying the residues that participate in its interactions can be a crucial step in understanding its function. Knowledge of these so-called functional sites can guide further experimental analysis of the protein and aid drug design and development. The large number of protein sequences and structural models that have become available over the past 10 years presents an exceptional opportunity to use the methods of computer science and statistics to identify protein functional sites, and thereby further biological understanding.

This dissertation investigates the computational prediction of functional sites from protein sequence and structure data. First, we consider the estimation of evolutionary sequence conservation from a multiple sequence alignment of homologous proteins, a common first step in the identification of functionally important sites. We introduce a fast, information-theoretic algorithm for scoring conservation and demonstrate that it provides state-of-the-art performance in predicting catalytic sites, ligand binding sites, and protein-protein interface residues. Second, we examine the identification of a class of functional residues that cannot be identified by considering sequence conservation alone: those that determine functional substrate specificity within homologous protein families. We combine sequence information with structural models to build the first large dataset of these specificity determining positions (SDPs). This dataset enabled the first large-scale analysis of sequence-based SDP prediction methods. We demonstrate that GroupSim, a new method we developed, outperforms existing approaches.
Finally, we focus on the prediction of ligand binding sites when both evolutionary sequence information and structural models are available. We introduce ConCavity, a new algorithm that directly integrates sequence conservation information into structure-based surface pocket identification. This algorithm provides a significant improvement over earlier methods and establishes the complementarity of sequence and structural evidence in ligand binding site prediction. Overall, our work significantly improves our ability to identify functional sites from protein sequences and structures.
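The abstract describes the conservation scorer only as a fast, information-theoretic algorithm applied to a multiple sequence alignment. One common information-theoretic score of this kind is the Jensen-Shannon divergence between each column's amino acid distribution and a background distribution. The minimal sketch below illustrates that idea. The uniform background, the small pseudocount, the gap penalty, and all function names are assumptions made for illustration; this is not the dissertation's implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP_CHARS = "-."


def column_distribution(column, pseudocount=1e-6):
    """Amino acid frequencies for one alignment column; gaps are ignored.

    The tiny pseudocount (an assumption) keeps every probability positive.
    """
    counts = Counter(c.upper() for c in column if c.upper() in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits; it lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def conservation_scores(alignment, background=None):
    """Score each column of an MSA given as a list of equal-length strings."""
    if background is None:
        # Assumption: uniform background; real tools typically use
        # empirical amino acid frequencies instead.
        background = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    scores = []
    for chars in zip(*alignment):
        column = "".join(chars)
        gap_fraction = sum(c in GAP_CHARS for c in column) / len(column)
        p = column_distribution(column)
        # Hypothetical gap penalty: down-weight columns that are mostly gaps.
        scores.append((1 - gap_fraction) * jensen_shannon(p, background))
    return scores


if __name__ == "__main__":
    msa = ["ACD", "ACE", "ACF", "A-G"]
    print([round(s, 3) for s in conservation_scores(msa)])
```

In this toy alignment the invariant first column scores highest, the second column is penalized for its gap, and the variable third column scores lowest. A column that contains each amino acid equally often matches the uniform background and scores near zero.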

Keywords/Search Tags:

Protein, Functional sites, Sequence, Identification, Ligand binding, Algorithm, Prediction