Quality Control Studies of Tandem Mass Spectrometry-Based Peptide Sequence Identification, Post-Translational Modification Identification and Protein Assembly

Posted on: 2012-02-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N Li
GTID: 1110330371962908
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
As a burgeoning scientific field in the post-genomic era, proteomics aims to study proteins on a large scale, particularly their structures and functions. Mass spectrometry (MS) combined with protein sequence database searching allows rapid, high-throughput identification of the proteome in different cells and tissues. One of the most widely used strategies is shotgun proteomics, which, however, suffers from a low spectrum identification rate (only 5%~30% of spectra yield high-confidence peptide identifications). In addition to incomplete protein sequence databases, noise (chemical contamination and electronic noise), and the limitations of peptide identification algorithms, the existence of post-translational modifications (PTMs) is one of the important reasons.

Commonly used search engines such as SEQUEST and MASCOT have limited capability for PTM identification, because only a limited number of modifications can be considered in a search. Discovery of the wide range of known and unknown modifications is still in a relatively primitive state. Unrestricted PTM search engines make it possible to identify an unlimited number of PTM types, including unknown ones. However, quality control remains a major challenge in current data analysis.

To obtain high-confidence proteome identifications, we implemented quality control in three aspects: peptide sequence identification, PTM identification, and protein identification.

First, we presented a Percolator-based quality control tool named PepDistiller to facilitate the validation of MASCOT search results. With the inclusion of the number of tryptic termini (NTT) and the integration of a refined false discovery rate (FDR) calculation method, we demonstrated that the sensitivity of peptide identification from semi-tryptic search results is improved. In an analysis of a complex dataset, ~7% more peptide identifications were obtained using PepDistiller than using MASCOT Percolator.
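As background for the FDR comparison, the following is a minimal Python sketch of the generic target-decoy FDR estimate. It is not PepDistiller's refined method, and it is not Percolator's implementation. It assumes separate target and decoy searches, so that the number of decoy matches above a score threshold approximates the number of incorrect target matches. The function name `estimate_fdr`, the `pit` parameter (an assumed proportion of incorrect target matches), and the toy scores are hypothetical and serve only as illustration.

```python
def estimate_fdr(target_scores, decoy_scores, threshold, pit=1.0):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    `pit` is an assumed proportion of incorrect target matches; pit=1.0
    gives the plain (conservative) decoy/target ratio.
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        # No accepted target matches, so there is nothing to be wrong about.
        return 0.0
    # Cap at 1.0 because an FDR is a proportion.
    return min(1.0, pit * n_decoy / n_target)


# Toy scores, made up for illustration only.
targets = [52.0, 48.5, 40.1, 33.0, 29.4, 21.7, 18.2, 15.0]
decoys = [30.2, 20.5, 17.9, 14.1]

fdr_conservative = estimate_fdr(targets, decoys, threshold=20.0)
fdr_scaled = estimate_fdr(targets, decoys, threshold=20.0, pit=0.8)
```

With a fixed `pit` below 1, the estimate is scaled down by the same factor at every threshold. This is one simple way to see why the choice of PIT affects the reported FDR.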
Moreover, the refined method generated lower FDR estimates than the PIT-fixed method applied in Percolator. Using a standard dataset, we further demonstrated that the refined FDR estimates are more accurate than the PIT-fixed FDR estimates. PepDistiller is fast and convenient to use, and is freely available for academic use at http://www.bprc.ac.cn/pepdistiller.

Second, we designed a three-dimensional quality control strategy for unrestricted PTM identification. At the peptide sequence level, we developed pScore, which uses only three Minimum-Redundancy-Maximum-Relevance features to calculate the probability of modified peptide identifications. pScore was demonstrated to perform similarly to the scoring method of PTMFinder, a published quality control tool for unrestricted PTM identification. At the PTM type level, we calculated the probability that a ΔM (PTM mass shift) is correct by using a Gaussian mixture model and Bayesian theory. The method was highly sensitive to abundant PTM types. At the PTM site level, we applied Ascore, one of the most popular algorithms for phosphorylation site assignment, to calibrate and locate PTM sites.

Third, we applied the half decimal place rule and the ΔM of spectral pairs to find abundant PTM types. With the half decimal place rule, potential PTM spectra could be separated out first, so that ΔM could be extended to the negative mass range. Abundant PTM types could then be identified by calculating the probability that each ΔM is correct. This strategy was demonstrated to be effective on a well-studied dataset: in addition to three uniquely identified negative ΔM values, five of the seven positive ΔM values with the highest probability were also reported by other studies.

Finally, we developed a protein assembly tool named ProRazor to accurately infer proteins from high-confidence peptide identifications.
ProRazor was demonstrated to be more in line with the parsimony principle and to improve the accuracy of protein identification. Using our simulated datasets, we also evaluated the effects of dataset size, database size, and number of unique peptides on the FDR and false negative rate (FNR) of protein identification. The results indicated that a larger dataset produces higher FDR and FNR, whereas database size has little influence on protein FDR and FNR. Filtering proteins by unique peptides does not help improve the quality of protein identification if the peptides are all correctly identified.

The methods described above could improve the quality of protein expression profiles and PTM profiles, which are essential for subsequent biological analysis.
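The parsimony principle mentioned for protein assembly can be illustrated with a greedy set-cover sketch in Python: keep choosing the protein that explains the most still-unexplained peptides until every peptide is explained. This is a generic heuristic shown under stated assumptions. It is not ProRazor's algorithm, which the abstract does not describe. The function name `parsimonious_proteins` and the protein and peptide names are hypothetical.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedily pick a small set of proteins that explains every peptide.

    `protein_to_peptides` maps a protein name to the set of identified
    peptides that match it.
    """
    uncovered = set()
    for peptides in protein_to_peptides.values():
        uncovered |= peptides

    chosen = []
    while uncovered:
        # Pick the protein explaining the most still-unexplained peptides.
        # Iterating over sorted names makes ties resolve to the first name,
        # because max() returns the first maximal item it encounters.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Toy peptide-to-protein assignments, made up for illustration only.
example = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepB", "pepC"},          # fully explained by P1
    "P3": {"pepD"},
    "P4": {"pepC", "pepD"},
}
inferred = parsimonious_proteins(example)
```

In this toy input, `P2` is never selected because `P1` already explains all of its peptides. The tie between `P3` and `P4` for `pepD` is broken by name, which is an arbitrary choice made here for determinism.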
Keywords/Search Tags: Proteomics, Bioinformatics, PTM identification, Quality control