Computational prediction of subcellular localization of proteins

Posted on: 2006-11-05
Degree: Ph.D
Type: Thesis
University: Columbia University
Candidate: Nair, Rajesh
Full Text: PDF
GTID: 2450390008961617
Subject: Biology
Abstract/Summary:
The genetic information for life is stored in nucleic acids (i.e., DNA), while proteins are the workhorses responsible for transforming this information into physical reality. Proteins are the macromolecules that perform most of the important tasks in organisms, such as the catalysis of biochemical reactions, the transport of nutrients, and the recognition and transmission of signals. Experimental determination of the function of a protein is a complex and laborious task, requiring several months and the dedicated efforts of an entire lab. Due to large-scale sequencing projects, we currently know the genome (DNA) sequences of over 100 organisms. This translates to nearly a million protein sequences. However, some kind of experimental annotation of function is available for only around 25,000 proteins. In this scenario, the development of computational approaches for function prediction is of vital importance.

The subcellular localization of a protein is one important aspect of its function. Biological cells are subdivided into membrane-bound subcellular compartments. After synthesis in the cytosol, proteins are sorted into different subcellular compartments by a complex trafficking mechanism. Proteins must be localized in the same subcellular compartment to cooperate towards a common physiological function. In contrast to the other functional roles of a protein, the protein trafficking mechanism is relatively well understood, and computer-readable subcellular localization data are available for a large number of proteins.

We have developed the most accurate set of tools currently available for predicting the subcellular localization of a protein. Since we do not yet fully understand the trafficking mechanisms employed within the cell, we use a combination of four different approaches for predicting localization:

(a) PredictNLS: identification of nuclear localization signals (NLSs). NLSs are short sequence motifs that are responsible for targeting proteins to the nucleus. Experimentally determined NLSs account for fewer than 10% of known nuclear proteins. By using a technique of 'in-silico mutagenesis' to discover new NLSs, we have been able to extend the coverage to over 40% of known nuclear proteins. Predictions made using PredictNLS have already been confirmed in a number of experiments, and the tool is widely used for discovering NLSs in unknown proteins.

(b) LOCkey: inferring localization by automatic mining of experimental data. Using a novel data-mining algorithm, LOCkey infers localization by minimizing an entropy-based objective function. LOCkey was the first fully automated tool for predicting localization by mining keyword annotations in databases.

(c) LOChom: inferring localization from sequence homology. Evolutionary relationships between proteins manifest in the similarity of their amino acid sequences. By studying the relationship between the sequence similarity of a pair of proteins and the conservation of their subcellular localization, we were able to establish a number of interesting results, including the existence of sharp sequence similarity thresholds for the conservation of subcellular localization. LOChom is the only publicly available tool for inferring localization based on sequence homology.

(d) LOCToGo: ab initio prediction of localization. By employing advanced artificial intelligence techniques such as neural networks and support vector machines, we have developed the most accurate tools to date for ab initio prediction of subcellular localization based solely on the amino acid sequence and features predicted from the sequence.

We believe the tools we have developed will have a significant impact on our understanding of protein function.
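To make the motif-based idea behind approach (a) concrete, the following is a minimal sketch of scanning protein sequences for a short sequence motif and measuring what fraction of a set of proteins carries at least one match. It is not the PredictNLS implementation. The pattern used is the textbook classical monopartite consensus K-(K/R)-X-(K/R), chosen here only for illustration, and the function names `find_motifs` and `motif_coverage` are hypothetical.

```python
import re

# Illustrative assumption: the classical monopartite NLS consensus
# K-(K/R)-X-(K/R). This is NOT the motif set used by PredictNLS.
CLASSICAL_NLS = r"K[KR].[KR]"


def find_motifs(sequence, pattern=CLASSICAL_NLS):
    """Return (start, matched_text) for every occurrence of the motif.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    scanner = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in scanner.finditer(sequence.upper())]


def motif_coverage(sequences, pattern=CLASSICAL_NLS):
    """Fraction of sequences that contain at least one motif occurrence."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if find_motifs(seq, pattern))
    return hits / len(sequences)


if __name__ == "__main__":
    # Toy sequences (hypothetical). The first embeds PKKKRKV, the
    # well-known NLS of the SV40 large T antigen.
    toy_proteins = ["MAPKKKRKVGG", "MAGGSLLV"]
    for seq in toy_proteins:
        print(seq, find_motifs(seq))
    print("coverage:", motif_coverage(toy_proteins))
```

Coverage in this sketch is simply the share of input sequences with any match. A single permissive pattern like this one would match many non-nuclear proteins, so a practical motif set would also need to be checked against proteins known not to be nuclear.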
Keywords/Search Tags: Protein, Subcellular localization, Function, Sequence, Prediction