Statistical learning algorithms for protein inference and quantification in proteomics

Posted on: 2012-02-27
Degree: Ph.D.
Type: Dissertation
University: Indiana University
Candidate: Li, Yong
Full Text: PDF
GTID: 1450390008997764
Subject: Chemistry
Abstract/Summary:
Proteomics is an emerging area of biology that aims to identify and quantify the complete set of proteins expressed in an organism under a specific condition. Relying on advanced analytical technologies, particularly mass spectrometry (MS), proteomics is achieving increasingly high sensitivity and throughput, and it has been applied to a variety of important problems ranging from disease diagnosis to genome annotation.

The focus of my dissertation is protein identification and quantification in bottom-up proteomics. Bottom-up and top-down are two alternative strategies for proteomics analysis. In a bottom-up proteomics experiment, proteins are cleaved into short pieces (peptides), which are then analyzed by mass spectrometers, resulting in thousands of mass spectra. These spectra are computationally decoded into a set of peptide sequences and peptide signal intensities, which are in turn used to infer the presence and quantities of proteins. Compared with the top-down approach, bottom-up proteomics is relatively well developed and is currently the dominant strategy in proteomics research.

Despite its conceptual simplicity, bottom-up proteomics has four inherent issues: the loss of protein-level information, the under-sampling of peptides, biases in peptide detection, and high noise levels (both inter- and intra-experiment). To address these challenges, this dissertation develops statistical learning and pattern recognition approaches to capture the biases in shotgun proteomics experiments, and proposes rigorous probabilistic models and inference algorithms to estimate the desired quantities. Four related aims are achieved. First, I have proposed a Bayesian network model that incorporates standard peptide detectability (detection bias) to infer the presence of proteins in biological samples. This is the first rigorous probabilistic approach to the protein inference problem. Second, I have proposed a modular artificial neural network model for learning non-standard peptide detectability from an arbitrary biological sample and MS platform. Third, I have developed a machine-learning model for peptide response rate (signal response bias) and, for the first time, used it for accurate label-free absolute protein quantification. Finally, I have combined proteomics and transcriptomics data to improve the genome annotation of the model organism Drosophila melanogaster.
Keywords/Search Tags: Proteomics, Protein, Quantification, Inference, Model