Data Processing and Retrieval Quality Control in Biological Mass Spectrometry-Based Proteomics

Posted on: 2008-12-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Yuan
GTID: 1110360272959779
Subject: Chemical Biology

Abstract/Summary:
The main contributions of this dissertation are as follows:
1. Construction and application of proteomics-standard templates for data management and exchange.
2. Development of an optimized search strategy, the 'Iterative Non-m/z-sharing Rule', for confident and sensitive protein identification in 2-DE-based proteomics.
3. Spectral quality assessment and its application to the interpretation of gel-based MALDI-TOF/TOF MS data.
4. Establishment of four reference datasets for human liver proteome analysis.
5. Comprehensive proteome analysis of mouse liver by ampholyte-free liquid-phase isoelectric focusing.

Proteins are the executors of complex biological processes and so lie closer to the core of biological systems than genes do. In 1994, Dr. Williams presented the original concept of proteomics, and Dr. Wilkins defined the term "proteome". Over the past two decades, technical advances have made high-throughput proteome analysis a reality. The development of proteomics has relied mainly on improvements in ESI-MS and MALDI-MS technology, which in turn gave rise to proteomics-oriented bioinformatics. Proteomics has now emerged as an important field spanning life science, chemistry and informatics, and MS-based proteomics has replaced classic Edman sequencing as the main approach to large-scale protein identification. However, proteomics is still hindered by the limitations of separation and detection platforms. Although much effort has been devoted to the problem, false positive and false negative results remain inevitable. The solution depends on progress or breakthroughs in both analytical science (wet) and informatics (dry). The evaluation and development of the corresponding algorithms are advancing steadily, but many issues remain unaddressed.
Herein, based on the proteomics platforms of Fudan University, a series of efforts was made to resolve several key problems in comprehensive proteome data analysis, described briefly as follows.

Firstly, the current status of proteome data processing methods was reviewed, including an introduction to search engines, the progress and challenges of proteome data analysis, and important research objectives.

Secondly, the construction and application of templates for proteomics standards was described in detail. Following the principles of the HUPO PSI (Proteomics Standards Initiative), the templates were constructed through parameter extraction, minimization of the required parameters, testing of draft templates, and application. The templates cover the most important proteomics platforms and have been successfully applied to data management and exchange in the Human Liver Proteome Project (HLPP).

Thirdly, a systematic search strategy named Iterative Non-m/z-Sharing (INMZS) analysis was proposed to address the problem of pseudo-matches. Many pseudo-matches in 2-DE-based proteomic data are caused by m/z values that are shared among too many candidate matches. The strategy therefore focused primarily on validating the matched m/z values. It used a decimal rule and a frequency threshold to filter noise signals from the PMF and corresponding PFF peak lists. Search results were then screened according to the sharing status of their matched m/z values: only proteins matched by exclusive m/z information were retained as final results. A further iterative search was applied to improve the discovery of minor components in a spot. Finally, all identifications were confirmed by reverse database evaluation. Simulation and application tests of INMZS were carried out on large datasets of the human liver proteome and standard protein cocktails.
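
The exclusive-m/z screening step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `exclusive_mz_filter`, the `min_exclusive` parameter, and the example protein IDs and peak values are all hypothetical, and the sketch assumes the peak list has already been noise-filtered and binned so that identical peaks compare equal.

```python
from collections import Counter


def exclusive_mz_filter(matches, min_exclusive=1):
    """Keep proteins supported by at least `min_exclusive` unshared m/z values.

    matches: dict mapping a protein ID to the set of matched peak m/z values.
    Assumption: peaks are already binned/rounded, so equal peaks compare equal.
    Returns a dict mapping each retained protein to its exclusive m/z values.
    """
    # Count how many candidate proteins claim each m/z value.
    usage = Counter(mz for peaks in matches.values() for mz in set(peaks))

    kept = {}
    for protein, peaks in matches.items():
        # An m/z value is exclusive when only this protein matched it.
        exclusive = {mz for mz in peaks if usage[mz] == 1}
        if len(exclusive) >= min_exclusive:
            kept[protein] = exclusive
    return kept


# Hypothetical candidates for one gel spot.
candidates = {
    "P1": {842.51, 1045.56, 1479.80},
    "P2": {842.51, 1045.56},
    "P3": {1045.56, 2211.10},
}
result = exclusive_mz_filter(candidates)
```

With these made-up inputs, `P2` is dropped because every one of its peaks is also claimed by another candidate, while `P1` and `P3` are each retained on the strength of one exclusive peak. The iterative re-search and the reverse database confirmation mentioned in the text are not shown.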
The tests showed that INMZS effectively ensured the confidence and sensitivity of 2-DE-based protein identification.

Fourthly, a multivariate regression approach was used to assess spectral quality for both PMF and PFF spectra obtained by MALDI-TOF/TOF mass spectrometry. The resulting quality index was then applied to the investigation of MASCOT search results. After analysis of the different search modes of MASCOT, a validation method based on the score difference between normal and reference (reverse or random) database searches was proposed to define positive matches. Systematic examination of two large-scale datasets from human liver tissue showed that spectral quality was a key factor in successful matching. Further analysis showed that spectral quality assessment was also effective for representing the quality of 2-DE gel spots and for promoting the discovery of potential post-translational modifications.

Fifthly, to construct comprehensive and reliable reference datasets, manual searching and analysis were carried out using the NCBI PubMed search engine. The liver disease-related dataset was collected from OMIM and GeneCards under strict quality control. Four reference datasets were constructed: Integrated Liver tissue Proteome (ILP), Human Heart Proteome (HHP), Human Plasma Proteome (HPP) and Liver Disease Genes and Proteins (LDGP). The overlaps between the constructed datasets and the Human Liver Proteome (HLP) are all considerable, indicating remarkable similarity and/or protein exchange between liver, plasma and other tissues. After annotation with HLP semi-quantitative information, many HLP proteins were found to be expressed at low, extra-low or trace abundance in liver. This abundance distribution suggests that HLP presents a comprehensive protein profile of liver tissue.

Sixthly, ampholyte-free liquid-phase isoelectric focusing (LIEF) was combined with narrow-pH-range 2-DE and SDS-PAGE HPLC for comprehensive analysis of the mouse liver proteome.
Because LIEF prefractionation greatly reduces sample complexity and increases the loading capacity of IEF strips, the number of visible protein spots on the subsequent 2-DE gels increased significantly, facilitating the discovery of low-abundance proteins. In total, 6271 protein spots were detected after LIEF prefractionation and integration of five narrow-pH-range 2-DE gels covering pH 3~11. Furthermore, the LIEF fraction of pH 3~5 and the unfractionated sample were each separated by pH 3~6 2-DE and identified by MALDI-TOF/TOF. In parallel, the LIEF fraction of pH 3~5 was also analyzed with an SDS-PAGE RP-HPLC MS/MS strategy. More proteins of low abundance and/or with extreme physicochemical characteristics were identified than with the conventional 2-DE method. The combination of LIEF-coupled 2-DE and LC strategies is also effective for promoting the identification of new proteins and the investigation of post-translational modifications of mouse liver proteins.
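
As a rough illustration of the score-difference validation described in the fourth part of this abstract, the sketch below accepts a match only when its score against the normal database exceeds its score against the reference (reverse or random) database by a margin. The function name `positive_matches`, the `min_delta` threshold of 10.0, the spot IDs and the scores are all assumptions for illustration; the abstract does not state the threshold or the scoring details that were actually used.

```python
def positive_matches(normal_scores, reference_scores, min_delta=10.0):
    """Return matches whose normal-vs-reference score difference is large enough.

    normal_scores: dict of spectrum ID -> best score against the normal database.
    reference_scores: dict of spectrum ID -> best score against the reversed or
        randomized database (a missing entry is treated as a score of 0.0).
    min_delta: hypothetical acceptance margin, not a value from the source.
    """
    accepted = {}
    for spectrum_id, normal in normal_scores.items():
        reference = reference_scores.get(spectrum_id, 0.0)
        delta = normal - reference
        # A true match should score clearly better than its decoy counterpart.
        if delta >= min_delta:
            accepted[spectrum_id] = delta
    return accepted


# Made-up scores for three spots.
normal = {"spot_01": 95.0, "spot_02": 41.0, "spot_03": 60.0}
reverse = {"spot_01": 30.0, "spot_02": 38.0}
accepted = positive_matches(normal, reverse)
```

Here `spot_02` is rejected because its normal and reverse scores differ by only 3.0, whereas `spot_01` and `spot_03` pass. In practice the margin would have to be calibrated on real search results.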
Keywords/Search Tags: Proteomics, Bioinformatics, Database matching, Quality control, Spectral quality assessment