
A Systems Biology-Based Predictive Model for Hepatocellular Carcinoma and LiverAtlas, an Integrated Knowledge Base for Liver Research

Posted on: 2013-10-08 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y Q Zhang
GTID: 1264330431472814 | Subject: Medical Genetics
Abstract/Summary:
The liver is the largest gland and a vital solid organ in the body. Its wide range of functions affects every body system: (Ⅰ) secretion and excretion, particularly the synthesis and secretion of bile; (Ⅱ) metabolism, controlling the synthesis and utilization of carbohydrates, lipids and proteins; (Ⅲ) detoxification; (Ⅳ) haematopoiesis and coagulation; (Ⅴ) production of biochemicals necessary for digestion. With the development of "omics" research and the implementation of the Human Liver Proteome Project (HLPP), a large amount of liver-related physiological and pathological data has been generated. In publicly available biological and bibliographic databases, however, these omics data are far from comprehensive and are not integrated, as most of them remain in raw format. Collecting, integrating and mining such data poses great challenges to both researchers and clinicians interested in the liver.

Hepatocellular carcinoma (HCC) is one of the most common malignant tumors, and its incidence is increasing worldwide. Its resistance to existing treatments and the lack of biomarkers for early detection make it one of the hardest cancers to treat. Because early diagnosis is a precondition for curative treatment, which is the only hope of extending the life expectancy of patients with HCC, effective systems that can predict the occurrence of this neoplasm are urgently needed.

To address these deficits, we applied computational systems biology methods to liver-related research. First, we constructed a novel and effective systems biology-based HCC classifier; then, by collecting and integrating existing liver-related biological data, we built a unique, curated, integrated, web-based database of biomedical knowledge on the liver and hepatic disease.
The overall objective of this study was to establish the data and technology basis for systems biology research on the liver.

Part One: A systems biology-based classifier for hepatocellular carcinoma diagnosis

[AIM] To develop a systems biology-based classifier that combines differential gene expression with topological features of the human protein interaction network to improve HCC diagnosis.

[METHODS]
(1) Three publicly available gene expression datasets comparing non-tumor liver tissue with HCC tissue were collected from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/).
(2) Genes differentially expressed across all three microarray datasets, identified with the cancer microarray platform Oncomine (https://www.oncomine.org), were selected as candidate genes for network analysis.
(3) A network of these candidate genes was generated with the GeneGo MetaCore software.
(4) Only direct connections between the identified genes were considered. Major hubs were defined as genes with more than 30 connections and fewer than 50% of their edges hidden within the network. The hub genes were selected as the components of the HCC classifier, which was built by Partial Least Squares (PLS) modeling of the hub genes' microarray expression data.
(5) The overall performance of the HCC classifier was evaluated by two distinct approaches: five-fold cross-validation and independent dataset testing.
(6) A prostate cancer diagnostic classifier was constructed with the same protocol.
(7) The clinical significance of the MAPK1 and NCOA2 proteins in HCC was investigated by immunohistochemistry on 30 matched HCC and paracarcinomatous liver tissue specimens.

[RESULTS]
(1) Mining the three microarray datasets on the Oncomine platform for genes differentially expressed in HCC versus non-tumor liver tissue identified 116 upregulated and 111 downregulated genes, which were selected as candidate genes for network analysis.
(2) The network was created in GeneGo MetaCore by plotting the genes (nodes) and published, literature-based connections (edges). Seventeen hub genes (10 upregulated and 7 downregulated in HCC) with more than 30 connections and fewer than 50% of edges hidden within the network were identified as the components of the HCC classifier.
(3) On the different independent test datasets, the overall predictive accuracy of the HCC classifier exceeded 85.00% and the area under the receiver operating characteristic (ROC) curve (AUC) exceeded 0.90.
(4) In five-fold cross-validation, the AUC of the HCC classifier approached 1.0 in all five tests, suggesting that it reliably and effectively identifies true HCC tissues across different test datasets.
(5) Integrating network topological features into the classifier significantly improved its predictive performance (p < 0.05).
(6) The prostate cancer classifier built with the same protocol had an overall predictive accuracy of 84.79 ± 6.53% and an AUC of 0.82 ± 0.10.
(7) MAPK1 and NCOA2 protein expression levels were both significantly higher in HCC tissues than in paracarcinomatous liver tissues (both p < 0.05). In addition, MAPK1 expression correlated significantly with the differentiation degree of HCC tissues (p = 0.03), and NCOA2 expression with the Edmondson-Steiner grade (p = 0.04).

[CONCLUSION] The analysis suggests that a systems biology-based classifier combining differential gene expression with topological features of the human protein interaction network may improve the diagnostic performance of HCC classification.
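The Part One pipeline (hub selection, a PLS classifier on hub-gene expression, and five-fold cross-validation scored by accuracy and ROC AUC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis code: the network is a plain edge list rather than a MetaCore export, a "hidden" edge is assumed to be one whose partner lies outside the candidate gene set, PLS is a single-response NIPALS fit (PLS1) on 0/1 class labels with a 0.5 cut-off, folds are stratified so each contains both classes, and the expression data are synthetic. All function names (`select_hubs`, `fit_pls1`, `predict_pls1`, `auc`, `cross_validate`, `toy_data`) are hypothetical.

```python
import random
from collections import defaultdict

def select_hubs(edges, candidates, min_degree=30, max_hidden_frac=0.5):
    """Keep candidates with > min_degree direct connections and < max_hidden_frac
    of their edges hidden. ASSUMPTION: an edge counts as 'hidden' when its
    partner is not a candidate gene (MetaCore's exact rule is not given)."""
    cand = set(candidates)
    total, hidden = defaultdict(int), defaultdict(int)
    for a, b in edges:
        for u, v in ((a, b), (b, a)):
            if u in cand:
                total[u] += 1
                if v not in cand:
                    hidden[u] += 1
    return sorted(g for g in cand
                  if total[g] > min_degree and hidden[g] / total[g] < max_hidden_frac)

def fit_pls1(X, y, n_components=2):
    """Single-response PLS (NIPALS) on 0/1 labels; returns centring terms and
    per-component (weights, loadings, y-loading)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                 # y residuals
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def predict_pls1(model, x):
    """Score one sample; values >= 0.5 are called 'HCC' in this sketch."""
    x_mean, y_mean, comps = model
    e = [a - m for a, m in zip(x, x_mean)]
    score = y_mean
    for w, load, q in comps:
        t = sum(a * b for a, b in zip(e, w))
        e = [a - t * b for a, b in zip(e, load)]
        score += q * t
    return score

def auc(scores, labels):
    """ROC AUC as the probability that a positive outscores a negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(X, y, k=5, n_components=2, seed=0):
    """Stratified k-fold CV; returns per-fold accuracy and AUC lists."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):  # deal each class round-robin so every fold has both
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    accs, aucs = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_pls1([X[i] for i in train], [y[i] for i in train], n_components)
        scores = [predict_pls1(model, X[i]) for i in fold]
        labels = [y[i] for i in fold]
        hits = sum((s >= 0.5) == (lab == 1) for s, lab in zip(scores, labels))
        accs.append(hits / len(fold))
        aucs.append(auc(scores, labels))
    return accs, aucs

def toy_data(n_per_class=20, n_genes=4, shift=4.0, seed=1):
    """Synthetic data (NOT from the thesis): 'HCC' samples (label 1) are
    shifted upward on the first two hypothetical hub genes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label if j < 2 else 0.0, 1.0)
                      for j in range(n_genes)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    accs, aucs = cross_validate(*toy_data())
    print("per-fold accuracy:", accs, "per-fold AUC:", aucs)
```

PLS projects the hub-gene expression values onto a few latent components that covary most with the class label, which keeps the model stable when samples are few relative to genes. The number of components, the 0.5 cut-off and the fold assignment are free choices in this sketch, since the abstract does not report them.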
Part Two: LiverAtlas, a unique integrated knowledge base for systems-level research on the liver and hepatic disease

[AIM] To construct LiverAtlas (http://liveratlas.hupo.org.cn), a unique, curated, integrated, web-based database of biomedical knowledge on the liver and hepatic disease.

[METHODS]
(1) Design of the LiverAtlas database schema.
(2) Fifty-three publicly available databases were mined or cross-linked and integrated to create LiverAtlas.
(3) Integration of data from multiple sources: for genes, LiverAtlas uses official gene symbols, IDs and names from the NCBI Entrez Gene database; for proteins, it uses protein names, IDs and accessions from UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. Data are parsed and extracted from the source databases with Perl scripts.
(4) To help users select the data that interest them, liver-expressed genes and proteins, protein-protein interactions (PPIs) and post-translational modifications (PTMs) in LiverAtlas are assigned quality scores (QS) by a semi-quantitative assessment that considers both the reliability of the data sources and the percentage of sources at each reliability level.
(5) Database construction.
(6) Further mining of the annotations in LiverAtlas to investigate liver physiology and pathology.
(7) Application example: identification of candidate HCC biomarkers using data in LiverAtlas.

[RESULTS]
(1) LiverAtlas integrates three major curated sources: (Ⅰ) experimental results on healthy human liver proteomics from the collaborative HLPP effort; (Ⅱ) liver-related information from external biomedical databases, derived from highly focused biochemical studies, high-throughput experiments or predictions; (Ⅲ) liver (and liver-specific) expression data and HCC-related proteins from the scientific literature. Users can browse, search and further mine the data stored in LiverAtlas, and can retrieve detailed information from the various source databases through external links.
(2) LiverAtlas contains: (Ⅰ) 19,801 genes involved in liver molecular and genetic events; (Ⅱ) 50,265 proteins involved in liver development and disease, drawn from existing proteomic databases and HLPP experimental validation results; (Ⅲ) 14 transcriptomic datasets of gene expression profiles in different hepatic diseases; (Ⅳ) 639 signaling or metabolic pathways involving liver-related genes and proteins; (Ⅴ) 59 types of hepatic disease, with basic information, bibliographic sources, and molecular and genetic events.
(3) Nearly 98% of the data in LiverAtlas have medium to high reliability.
(4) Using LiverAtlas, nine proteins were identified as candidate biomarkers for HCC diagnosis by analyzing their interactions with known cancer-related genes and the network topological features of proteins differentially expressed in this tumor.

[CONCLUSION] LiverAtlas is the most comprehensive resource on the liver and hepatic disease. It enables scientists and clinicians without a computational background to analyze their data at the systems level and will contribute greatly to biomarker discovery, drug target selection and improved diagnostic performance for liver diseases.
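Method (4) describes the quality score only in outline. The sketch below shows one way such a semi-quantitative score could be computed. It assumes three reliability levels with hypothetical weights (high = 3, medium = 2, low = 1) and defines QS as the sum of each level's weight times the share of sources at that level; the level names, the weights, the helpers `quality_score` and `share_medium_or_high`, and the QS ≥ 2 cut-off for "medium to high" are all illustrative assumptions, not the scheme actually used in LiverAtlas.

```python
from collections import Counter

# HYPOTHETICAL reliability scale; the thesis does not publish its weights.
RELIABILITY_WEIGHT = {"high": 3.0, "medium": 2.0, "low": 1.0}


def quality_score(source_levels):
    """QS for one entry (gene, protein, PPI or PTM).

    `source_levels` holds the reliability level of every source reporting the
    entry. Each level's weight is multiplied by the share of sources at that
    level and the products are summed, giving a value between 1 and 3."""
    if not source_levels:
        raise ValueError("entry has no supporting source")
    counts = Counter(source_levels)
    n = len(source_levels)
    return sum(RELIABILITY_WEIGHT[level] * c / n for level, c in counts.items())


def share_medium_or_high(entries, cutoff=2.0):
    """Fraction of entries whose QS reaches the (assumed) 'medium' cut-off."""
    scores = [quality_score(levels) for levels in entries.values()]
    return sum(s >= cutoff for s in scores) / len(scores)


if __name__ == "__main__":
    toy = {  # invented entries, for illustration only
        "P1": ["high"],
        "P2": ["low", "low"],
        "P3": ["medium", "high"],
        "P4": ["low", "medium"],
    }
    for name, levels in toy.items():
        print(name, round(quality_score(levels), 2))
    print("share at medium or above:", share_medium_or_high(toy))
```

A weighted mean of this kind ignores how many sources support an entry; a scheme that also rewarded independent confirmation would need an extra term, and the abstract does not say which approach LiverAtlas took.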
Keywords/Search Tags: hepatocellular carcinoma, gene expression, interaction network, biomarker, predictive model, omics, database, liver physiology, liver pathology, biomarker discovery, drug target