
Study On Glycoprotein Proteomics Based On Biological Mass Spectrometry

Posted on: 2013-02-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 1100330434971302
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
This thesis makes two main contributions: (1) a high-throughput mass spectrometry platform for glycopeptide identification, and (2) a method that combines transcriptome and proteome data to select biomarkers related to hepatocellular carcinoma metastasis. Part 1, comprising Chapters 2, 3, 4 and 5, is the focus of the thesis: it analyzes glycopeptide spectra and presents GRIP, a software tool designed to characterize glycopeptides in protein mixtures. Part 2, comprising Chapter 6, analyzes combined transcriptome and proteome data to uncover potential biomarkers related to hepatocellular carcinoma metastasis.

Post-translational modification of proteins has become a hotspot of proteomics research, and glycosylation is one of the most important modifications. Glycoproteins are widely distributed in the plasma membranes of various tissues and in body fluids. More than 50% of mammalian proteins are glycosylated. Variations in glycans and glycan chain structures are involved in many serious diseases, such as carcinomas, metabolic diseases, vascular diseases and congenital genetic diseases. Because of the importance of glycoproteins, many related studies and technologies have been developed, mainly to study, in order of increasing analytical difficulty, the abundance, variety and glycosylation sites of glycoproteins, as well as the structure, sequence, topology and 3-D structure of the glycan chains. To date, no mature proteomics software can be applied directly to glycoproteomics analysis, while the software developed specifically for glycoproteomics is still at an early stage, with low throughput and high false-positive rates.
Thus, a method that identifies glycopeptides from mass spectra with high throughput and high confidence is needed.

Different types of mass spectrometry produce different fragmentation patterns for glycopeptides and therefore generate very different spectra. A thorough understanding of these fragmentation patterns helps to distinguish glycopeptide spectra correctly. For glycopeptides, both the glycan chains and the peptide backbone are fragmented in higher-energy collisional dissociation (HCD) mode, while only the glycan chains are fragmented in low-energy collision-induced dissociation (CID) mode. This behavior was observed on both LTQ-Orbitrap and QIT mass spectrometers. In low-energy CID mode, a series of neutral-loss peaks is generated in glycopeptide spectra, and capturing these peaks is the critical step in identifying glycopeptides. Based on spectra acquired in low-energy CID mode, we extracted neutral-loss peaks using graph theory and developed related methods for identifying N-glycopeptides in real complex samples. We tested the method on both standard glycoproteins and real complex samples and confirmed that it is highly accurate. We then applied the method to human serum samples and identified 745 N-glycopeptides.

The importance of liver cancer research goes without saying, and many studies have already been carried out. We previously built a mouse model of hepatocellular carcinoma metastasis and obtained its expression data at the transcriptome and proteome levels by microarray and mass spectrometry, respectively. Many studies select differentially expressed genes by taking the intersection of the biomarkers found at the transcriptome and proteome levels.
However, such methods not only ignore the relative positions of the differentially expressed genes at a global level, but also fail to make full use of the combined transcriptome and proteome data for biomarker selection. In comparison, the method we propose, which projects the transcriptome and proteome data onto a two-dimensional plane and uses a confidence ellipse or confidence interval to pick out genes with the same trend of differential expression, is more rigorous.

Chapter 2 discusses protein glycosylation sites in human liver tissue. For the first time, we combined two non-lectin methods, hydrazide chemistry and hydrophilic affinity, to enrich glycopeptides from human liver tissue. Spectra were acquired on a high-accuracy LTQ-Orbitrap mass spectrometer in both CID and ETD modes and searched with MaxQuant at a strict threshold of FDR <= 1%, and 1700 N-glycosylation sites were finally identified. Comparison with other datasets showed that our dataset is a large-scale complement to earlier ones. Most N-glycosylation sequences matched the N-X-[S|T] motif, and some new motifs also appeared. In addition, N-glycosylation sites occur much more frequently on β-sheets than on α-helices.

Chapter 3 focuses on the design, implementation, core algorithm and testing of GRIP, which is used for glycopeptide analysis. We tested spectra of ASF and HRP peptides generated on QIT and LTQ-Orbitrap instruments; almost all glycopeptides were identified correctly, indicating that GRIP can reliably recognize glycopeptides in simple protein mixtures. We also used other standard proteins to verify that GRIP outperforms existing software in analyzing glycopeptide spectra.

Chapter 4 is mainly about the application of GRIP to biological samples.
The experimental workflow we designed first constructs a database of deglycosylated peptides from the real sample in a preliminary experiment, then combines information from the literature and GlycoWorkbench to construct a database of N-glycan compositions (365 types), and from these forms a theoretical N-glycopeptide database. In addition, the real spectra from the mass spectrometer were used to build a random spectrum, which serves as the threshold criterion for GRIP. We verified the feasibility of the program by testing a standard glycoprotein, ASF, and demonstrated its high accuracy in a large-scale test in which HCD spectra of the same precursor ions were used to check the GRIP results. All tests showed that GRIP is capable of identifying glycopeptides in real, complex samples. In the analysis of human serum samples we identified 745 glycopeptides, most of which carry sialic acid and fucose; the most abundant among them are from immunoglobulins, consistent with previous research.

Chapter 5 extends the application of GRIP to analyzing the topology of glycopeptides in biological samples. The earlier GRIP method yields the composition of glycopeptides without revealing their topological structure, so we developed a new system to analyze the topological structure of N-glycosylation. In building a database of glycopeptide fragments, the main problem is how to generate a database of glycan fragments. Because GlycoWorkbench could not generate all N-glycan fragments, we used an in-house method to construct them. We applied matrix methods to predict substructures from the known common structures of serum N-glycans, finally obtaining 10004 N-glycan structures. A fragment database for each glycan structure was then built by applying the matrix methods iteratively. Finally, we designed 5 scoring formulas for glycopeptide-spectrum matching and tested them on artificial and real spectra.
The results showed that, by using databases of glycopeptide fragments, we can not only obtain the topological structures of glycopeptides but also distinguish isomers.

Chapter 6 is about selecting potential biomarkers related to hepatocellular carcinoma metastasis. In this chapter, transcriptome and proteome data were combined and projected onto a 2D plane, and confidence ellipse and confidence interval methods were then used to screen for genes differentially expressed in the same direction at both levels. This approach appears to be more rigorous than intersection-based selection. When tested on the mouse model of hepatocellular carcinoma metastasis, the confidence interval method proved stricter than the confidence ellipse method.

Through this study, we established a solution for high-throughput identification of N-glycopeptides in biological samples, which lays a foundation for further research on screening and selecting glycopeptides as disease biomarkers. In addition, the work introduced in Chapter 5 begins the development of methods for further study of the topological structure of N-glycopeptides. We will extend the work in Chapter 6 by exploring biomarkers of hepatocellular carcinoma metastasis and the related selection methods.
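
The confidence ellipse screening summarized for Chapter 6 can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes each gene is described by a pair of log ratios (transcript level, protein level), fits a sample mean and covariance to the whole point cloud, and flags genes that fall outside the ellipse while changing in the same direction at both levels. The function name `ellipse_screen`, the gene labels, and the 95% level are all hypothetical choices made for illustration.

```python
import math


def ellipse_screen(pairs, level=0.95):
    """Flag genes outside a confidence ellipse that change in the same direction.

    pairs: dict mapping gene -> (transcript log ratio, protein log ratio).
    Returns a dict mapping each selected gene to "up" or "down".
    Assumption: the ellipse is defined by the sample mean and covariance of
    all genes, with a chi-square (2 degrees of freedom) cutoff.
    """
    n = len(pairs)
    xs = [v[0] for v in pairs.values()]
    ys = [v[1] for v in pairs.values()]
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance of the 2D point cloud.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Chi-square quantile with 2 degrees of freedom has a closed form.
    cutoff = -2.0 * math.log(1.0 - level)
    selected = {}
    for gene, (x, y) in pairs.items():
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance via the explicit 2x2 inverse covariance.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        # Keep only outliers whose two log ratios share the same sign.
        if d2 > cutoff and x * y > 0:
            selected[gene] = "up" if x > 0 else "down"
    return selected
```

A gene that lies far outside the ellipse but is up-regulated at one level and down-regulated at the other is deliberately discarded by the sign check, which mirrors the "same direction" requirement stated above.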
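
The idea of extracting neutral-loss peaks from low-energy CID spectra with graph theory can likewise be sketched in Python. The abstract does not describe GRIP's actual algorithm, so the following is only one plausible reading: peaks are nodes, an edge links two peaks whose m/z difference matches one monosaccharide residue mass divided by the charge, and the longest path in the resulting directed acyclic graph is taken as the neutral-loss ladder. The function name, the tolerance, and the four-residue table are assumptions; the masses are standard monoisotopic residue masses.

```python
# Monoisotopic residue masses (Da) of common N-glycan monosaccharides.
RESIDUES = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}


def longest_neutral_loss_chain(mzs, charge=1, tol=0.02):
    """Return the longest ladder of monosaccharide losses among the peaks.

    mzs: iterable of peak m/z values; charge: assumed fragment charge state;
    tol: absolute m/z tolerance. The result lists residue names in order of
    loss, starting from the highest-m/z peak of the ladder.
    """
    peaks = sorted(mzs, reverse=True)
    if not peaks:
        return []
    # best[k] = (number of peaks in the longest chain ending at k,
    #            index of the previous peak, residue lost on that step)
    best = [(1, None, None)] * len(peaks)
    for j in range(len(peaks)):
        for i in range(j):  # peaks[i] >= peaks[j], so edges run downward in m/z
            diff = peaks[i] - peaks[j]
            for name, mass in RESIDUES.items():
                if abs(diff - mass / charge) <= tol and best[i][0] + 1 > best[j][0]:
                    best[j] = (best[i][0] + 1, i, name)
    # Backtrack from the end of the longest chain.
    end = max(range(len(peaks)), key=lambda k: best[k][0])
    losses = []
    while best[end][1] is not None:
        losses.append(best[end][2])
        end = best[end][1]
    return losses[::-1]
```

Because edges always point from higher to lower m/z, the graph is acyclic and the longest path can be found by simple dynamic programming over the sorted peak list. A real implementation would also need to handle intensity filtering, multiple charge states and isotope peaks, none of which are covered here.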
Keywords/Search Tags:Standard N-Glycoprotein, Serum N-Glycoprotein, N-Glycopeptide, N-Glycosylation Site, N-Glycan Structure, GRIP, Algorithm, Random Spectrum, Biomarker, Hepatocellular Carcinoma Metastasis