Establishment Of The Platform For Organelle Protein Profiling With Data Mining And Application In Human Liver Nuclear Proteome Research

Posted on: 2009-09-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y W Hao
Full Text: PDF
GTID: 1100360245958689
Subject: Cell biology
Abstract/Summary:
Cell organelles play important roles in cellular processes. Although these compartments have been studied for a long time, their protein compositions remain only partly characterized. Proteomics provides a powerful tool for surveying the proteins in a cell or tissue, and the protein composition of an organelle reveals much about the organelle itself. Organelle proteomics is an important part of proteome research and a crucial part of the Human Liver Proteome Project. To develop a suitable strategy for human liver proteome research, the C57 mouse was chosen as a model for sample preparation and data analysis.

Subcellular proteomics is an important part of proteomics research, but investigations of organelles usually focus on a single compartment, which can lose information about the cell as an integrated whole. In addition, isolated organelles are always cross-contaminated by other compartments, because organelles are physically connected to one another in the cell, and this leads to wrong subcellular classification of the identified proteins. To address this issue, we designed an experiment to study protein composition and localization with proteomics and bioinformatics tools. We carried out the experiment in two steps to establish an accurate quantitation strategy and to guarantee comparability among organelles. First, we employed a subcellular separation method that isolates plasma membranes, mitochondria, nuclei and cytoplasm from the same homogenate. Western blotting and electron microscopy showed satisfactory purity and integrity of the organelles. Second, we evaluated the accuracy of the quantitation method in protein separation and identification using a mixed protein sample. Based on the quantitation results, we established a confident protein identification strategy and data processing method. In total, we identified 3,189 proteins, with the false positive rate controlled at 1%.

The subcellular classification of proteins depended on the quantitation results.
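The abstract does not spell out how spectral counts were calibrated across the four fractions. As a minimal sketch only, and assuming a simple scheme (divide each count by the total spectra of its fraction, then rescale each protein's row to sum to 1), the step from raw counts to a per-protein distribution profile could look like the following. The function name, the toy counts and the `min_count` filter are hypothetical and are not taken from the dissertation.

```python
from typing import Dict, List

# Fraction order follows the four compartments named in the abstract.
FRACTIONS = ["plasma_membrane", "mitochondria", "nuclei", "cytoplasm"]


def fraction_profiles(
    counts: Dict[str, List[int]], min_count: int = 2
) -> Dict[str, List[float]]:
    """Turn raw spectral counts into per-protein distribution profiles.

    Assumed normalization (not the author's calibration): each count is
    divided by the total spectra in its fraction, then each protein's row
    is rescaled to sum to 1.
    """
    # Total spectra per fraction, to correct for unequal sampling depth.
    totals = [sum(row[i] for row in counts.values()) for i in range(len(FRACTIONS))]
    profiles: Dict[str, List[float]] = {}
    for protein, row in counts.items():
        # Hypothetical filter: skip proteins whose best fraction is too sparse.
        if max(row) < min_count or sum(row) == 0:
            continue
        scaled = [c / t if t else 0.0 for c, t in zip(row, totals)]
        weight = sum(scaled)
        profiles[protein] = [v / weight for v in scaled]
    return profiles


# Toy counts, invented purely for illustration.
toy_counts = {
    "protein_a": [0, 18, 2, 0],
    "protein_b": [1, 0, 0, 0],
    "protein_c": [10, 0, 0, 10],
}
profiles = fraction_profiles(toy_counts)
```

In this toy input `protein_a` has nine times more spectra in the mitochondrial fraction than in the nuclear one, yet its normalized profile weights the two equally because the nuclear fraction was sampled far more shallowly. Whether such a correction is appropriate depends on the actual calibration used, which this sketch does not reproduce.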
To assess our strategy comprehensively, we introduced a step-by-step method to evaluate protein localization. First, we used a clustering algorithm to classify the data by calibrated spectral count. Second, we found that the quantitation result was more accurate when the spectral count was no less than 2. Finally, a k-nearest-neighbor algorithm was employed to evaluate protein localization from the quantitation results and a gold standard. Using these three criteria, we localized 2,740 proteins, including 120 with newly assigned localizations, which is the largest subcellular protein localization dataset for mouse liver.

The aim of proteome research is not only to provide a reference map but also to supply data for mining that yields new knowledge. Here we used Gene Ontology (GO) and protein interaction data to annotate the subcellular protein data. The distribution of GO terms in our data shows that primary metabolism and enzyme activity are the largest categories, which reflects the characteristics of liver physiology. Because protein interactions strongly suggest functional relationships among proteins, we exploited this information to mine the potential functions of these proteins. The mouse protein interaction network was constructed from model organisms by ortholog comparison, yielding 10,274 interactions among 3,757 proteins. The subcellular proteins were then placed in the network as seeds to extract sub-networks. The MCODE algorithm was used to analyze the organelle-related sub-networks, and 25 protein complexes were found. These complexes are involved in protein degradation, mRNA splicing, ribosome assembly, signal transduction and other processes. With literature annotation, we found new members of the 26S proteasome, the mRNA spliceosome, the mitochondrial 18S ribosome and the actin-related protein 2/3 complex, and obtained a clue to the potential function of the mitochondrial 18S ribosome in metabolism.
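The seeding step can be illustrated with a small sketch. It assumes that "seeding" means keeping only interactions whose two partners are both localized proteins, which is one plausible reading and not necessarily the author's. It also uses plain connected components as a crude stand-in for complex detection; this is not MCODE, which scores local density and is considerably more selective. All names and the toy edges are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def seed_subnetwork(edges: Iterable[Edge], seeds: Set[str]) -> List[Edge]:
    """Keep interactions whose two partners are both seed proteins (assumed rule)."""
    return [(a, b) for a, b in edges if a in seeds and b in seeds]


def connected_components(edges: Iterable[Edge]) -> List[Set[str]]:
    """Group proteins into connected components (a simple stand-in, not MCODE)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited protein.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy network with placeholder protein names, invented for illustration.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G")]
toy_seeds = {"A", "B", "C", "E", "F"}
sub_edges = seed_subnetwork(toy_edges, toy_seeds)
candidate_groups = connected_components(sub_edges)
```

Proteins `D` and `G` are dropped here because they are not seeds. A looser rule that keeps any edge touching a seed would retain them as candidate new members, which may be closer to how new complex members were found.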
Moreover, we found Cirhin in the ribosome assembly complex. It is the product of a disease gene whose function is not yet clear, and our result suggests a potential pathogenic mechanism. Finally, we found some unknown complexes in the data, which need further experimental confirmation.

Based on the experience from the C57 mouse study, the technology was applied to human liver samples to establish the human liver nuclear proteome. In total, 2,025 proteins were localized to nuclei with an improved algorithm that combined KNN with other information through a Bayes model. In the functional analysis, the biological characteristics of the nuclear proteins were described systematically; many of these proteins are involved in signal transduction, translation initiation and protein degradation. Many new proteins were localized by our method, and some other proteins were assigned a new cellular localization, which expands knowledge of liver proteins. In addition, we analyzed protein evolution at the proteome scale and found a positive correlation between protein abundance and evolutionary conservation.

In conclusion, we shed light on organelle proteomics through subcellular isolation and label-free quantitation. More importantly, a strict evaluation of protein localization was established with machine learning algorithms, which expands knowledge of organellar proteins. In the protein function analysis, we tried a protein-complex strategy to study unknown proteins and gained useful information. This platform is a pilot for the human liver organelle proteome and is also of value to other "omics" research.
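The abstract names k-nearest-neighbor classification against a gold standard and, for the human data, a Bayes model that combines KNN with other information, but gives no details of either. The sketch below is therefore only one possible reading. It assumes Euclidean distance on fraction profiles, Laplace-smoothed neighbor votes, a uniform prior, and a single extra evidence source whose per-organelle likelihoods are made up. The function names, the gold-standard profiles and all numbers are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Profile = List[float]


def knn_scores(query: Profile, gold: List[Tuple[Profile, str]], k: int = 3) -> Dict[str, float]:
    """Smoothed vote fraction per organelle among the k nearest gold-standard profiles."""
    ranked = sorted(gold, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    labels = sorted({label for _, label in gold})
    # Laplace smoothing so an organelle with zero votes is not ruled out entirely.
    return {label: (votes[label] + 1) / (k + len(labels)) for label in labels}


def combine(knn: Dict[str, float], other: Dict[str, float]) -> Dict[str, float]:
    """Naive-Bayes style combination under a uniform prior: multiply, then normalize."""
    product = {label: knn[label] * other[label] for label in knn}
    total = sum(product.values())
    return {label: value / total for label, value in product.items()}


# Toy gold standard (profile order: plasma membrane, mitochondria, nuclei, cytoplasm).
toy_gold = [
    ([0.05, 0.05, 0.85, 0.05], "nucleus"),
    ([0.10, 0.05, 0.80, 0.05], "nucleus"),
    ([0.05, 0.85, 0.05, 0.05], "mitochondrion"),
    ([0.05, 0.80, 0.10, 0.05], "mitochondrion"),
    ([0.05, 0.05, 0.10, 0.80], "cytoplasm"),
]
query_profile = [0.05, 0.40, 0.50, 0.05]

# Hypothetical likelihoods from some other evidence source (values invented).
other_evidence = {"nucleus": 0.7, "mitochondrion": 0.1, "cytoplasm": 0.2}

scores = knn_scores(query_profile, toy_gold, k=3)
posterior = combine(scores, other_evidence)
best_label = max(posterior, key=posterior.get)
```

With these toy inputs the query profile is ambiguous between nuclei and mitochondria on spectral counts alone, and the extra evidence tips the combined score toward the nucleus. Multiplying the two scores treats the evidence sources as conditionally independent, which is a simplifying assumption of this sketch.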
Keywords/Search Tags: organelles, label-free quantitation, cluster, machine learning, data mining