Mining Tissue Specific Genes For Predicting Novel Drug Usage

Posted on: 2012-02-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Yang
Full Text: PDF
GTID: 1114330368975495
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
With the evolution of high-throughput technologies in the post-genomics era, studies have shifted from the characterization of single genes or proteins to the investigation of entire biological systems. An increasing number of studies focus on the genome, proteome, interactome, transcriptome and diseasome, which provides an opportunity to understand the nature of a system from the huge amount of available biological data.

Human tissues exhibit distinct characteristics, despite differentiating from a common origin, in order to fulfil the different needs of the body. This diversity is contributed largely by the coordinated expression of different tissue-specific genes (TSGs), in addition to other genes. The tissue-specific expression pattern of a gene implies not only its physiological function(s), but also where it plays roles in transcriptional regulation, development, stress response and even disease etiology. Evidence gathered through mining tissue specificity, gene connectivity and disease association suggests that many disease-associated genes are likely to show specific expression in the tissues from which the diseases originate. Furthermore, several studies have used tissue specificity as an important factor when characterizing therapeutic or drug targets. Other uses of tissue specificity include, but are not limited to, the study of pathogenic mechanisms, diagnosis and therapeutic applications.

A number of resources have been developed to facilitate studies of TSGs. For example, the BioGPS, TiGER, COXPRESdb and TiSGeD databases can be used to analyze human gene expression in various tissues. However, most of these resources focus only on the specific expression patterns of TSGs, whereas other important biological aspects receive little emphasis. Consequently, none of them alone is sufficient for studying protein-disease, protein-function, protein-localization and drug-target associations together.
This limitation hinders the practical use of TSGs in medical research and the development of human systems biology. Therefore, a discovery method dedicated to linking and providing all of the above information is highly desirable.

This research is the result of a systematic effort to integrate tissue-specific genes surveyed across a large panel of normal human tissues with other important information, including subcellular localization, functional annotation, disease and drug relations, biological pathway involvement and so forth. It can be used to generate testable hypotheses for basic and clinical research.

Materials and Methods

Although several TSG sets have been identified in other independent studies, their coverage of samples and tissues is often limited. This makes it harder to conclude whether such TSGs are truly expressed in a tissue-selective or tissue-specific pattern. In 2004 and 2006, Su et al. and Liang et al. independently generated tissue-specific/selective mRNA expression matrices of thousands of genes across a large panel of biological samples (~4000 samples combined) and tissue types (~130 tissue types combined) from normal human subjects through microarray expression profiling. Only these two datasets were therefore selected for integration, because of their extended coverage.

Analytically, searching for tissue-specific genes amounts to comparing gene expression over many tissue types. To determine the tissue distribution of a given gene i across K tissue types, P = K(K-1)/2 pair-wise comparisons are required. In our previous analyses, a modified Tukey-Kramer honest significant difference (HSD) test with an Enrichment Score (ES) was proposed to control the type I error arising from multiple tests. Here, Zij scores were calculated to represent the relative level of a given TSGi expressed in one particular tissue j (j = 1 to K) with regard to the mean expression of TSGi across all K tissues.
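
The comparison count and the Z-score step can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not give the exact Z-score formula, so scaling the deviation from the mean by the sample standard deviation is an assumption, and the function names and expression values are hypothetical.

```python
from statistics import mean, stdev


def pairwise_comparisons(k: int) -> int:
    """Number of pair-wise tissue comparisons, P = K(K-1)/2."""
    return k * (k - 1) // 2


def z_scores(expression: dict[str, float]) -> dict[str, float]:
    """Z-score of one gene in each tissue, relative to its mean over all tissues.

    Assumption: the deviation from the mean is divided by the sample
    standard deviation; the abstract does not state the scaling used.
    """
    values = list(expression.values())
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A gene expressed identically everywhere has no tissue preference.
        return {tissue: 0.0 for tissue in expression}
    return {tissue: (level - mu) / sd for tissue, level in expression.items()}


# Hypothetical expression levels of one gene in four tissues.
gene_expression = {"liver": 12.0, "heart": 2.0, "lung": 3.0, "brain": 3.0}
scores = z_scores(gene_expression)
top_tissue = max(scores, key=scores.get)  # tissue with the highest Z-score
```

With K = 127 tissue and cell types, `pairwise_comparisons(127)` gives 8,001 comparisons per gene, which is why control of the type I error matters here.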
Finally, the product of Zij and ESj, denoted as τij, was computed to account for both the tissue specificity/selectivity and the relative expression level of a TSGi in a given tissue j. A large τij indicates that TSGi is highly specific and significant to tissue j. With this quantitative index, not only the genes specific to a tissue but also the tissues in which a gene is selectively expressed can be ranked. Probe IDs were mapped to Entrez Gene IDs. Tissue names were unified according to standard anatomical terms, and redundant tissue affiliations were merged using the mean value of τ.

To elucidate the functional aspects of these TSGs, detailed annotations were collected. Features of each tissue-specific gene are available at six levels: disease association, targeting drugs, Gene Ontology annotation, subcellular localization, biological pathways and mammalian phenotype linkage.

Data sources are as follows. Gene-disease relationships were gathered from Gene2MeSH, OMIM and Swiss-Prot. Non-standard disease names were associated with MeSH IDs and mapped to the MeSH tree categories. Gene-targeting drug relationships were obtained from DrugBank. Subcellular localization information for these TSGs was retrieved from LOCATE, supplemented with cellular component annotation from the Gene Ontology (GO) database. Molecular function and biological process annotations were also obtained from GO. Pathway and reaction information came from KEGG and Reactome, respectively. Mammalian phenotype information was derived from MGI (Mouse Genome Informatics).

Results

In total, 3,960 tissue-specific genes were identified through expression profiling of a panel of 127 human tissue and cell types. These TSGs are selectively expressed in ~2 tissues on average.

By integrating these data, 5,684 gene-disease relationships and 2,148 gene-drug relationships were collected.
In addition, 40,058 gene-GO annotations, 3,687 gene-subcellular location records, 32,397 mammalian phenotype notes and 6,359 gene-pathway associations were curated. Together with 10,102 tissue distribution records, these make the tissue-specific genes well annotated.

All of these data were stored in MySQL and made accessible through an intuitive interface implemented in PHP/SQL. PHP and JavaScript were used to perform the Fisher exact test and the Benjamini-Hochberg correction. The data contents were organized into two basic queries, Tissue Specific Gene search and Tissue distribution of TSG, which allow users to conveniently retrieve information relevant to a single gene and to a tissue or subcellular localization of interest, respectively. Of particular note, a Batch query, which evaluates the enrichment of tissue specificity, subcellular localization, pathway, Gene Ontology, phenotype, disease or drug for many genes in a single query, is also provided to analyze or rank genes of interest. It is useful for finding hidden links and generating hypotheses. An Integrated query was also developed to run richer combinatorial searches that satisfy several biological characteristics simultaneously.

The TSGs closely related to a specific disease may have hidden links to other biomarkers or therapeutic targets and agents. Our database allows these unexpected links to be identified in order to generate new hypotheses. In one example, 8 TSGs for periodontitis (and 1 for aggressive periodontitis) were found. Batch query shows that 5 of these genes are also related to rheumatoid arthritis and are specific to immunologic tissues. These 5 TSGs are enriched in biological processes such as immune response and inflammatory response. In addition, they share some common biological pathways for the two diseases, such as cytokine-cytokine receptor interaction and the toll-like receptor signalling pathway.
Indeed, these findings are consistent with emerging evidence that periodontitis and rheumatoid arthritis share many pathological features and biological links. The Batch query result also suggests that certain TNF inhibitors (e.g. Etanercept and Adalimumab) suitable for one medical condition might be useful for the other. For instance, recent studies showed that periodontal therapy using these inhibitors reduced the severity of active rheumatoid arthritis in patients.

It is well known that drug development is time-consuming and very expensive. Finding new indications for existing drugs may help to extend their use to other medical conditions. Another example presented here concerns Simvastatin, a hypolipidemic drug used to control hypercholesterolemia and to prevent cardiovascular disease. A multiple-condition query indicates that Simvastatin targets 10 TSGs, three of which are significantly linked to endometriosis (p < 0.050). A curative effect on this autoimmune disease could therefore be inferred. This prediction was preliminarily verified by a test in a nude mouse model. Simvastatin's curative effect on short QT syndrome 1 was also predicted. These examples demonstrate the power of our work to reveal hidden links that some of the earlier databases failed to capture. Many questions, such as "How many pathways are enriched in tissue A and what are they? Are they disease-specific? What are the mitochondrial proteins involved in apoptosis in tissue X? Is leukemia linked to any neural disorder? What are the drugs targeting pathway Y?" and so forth, can be addressed similarly.

Conclusions

Our research constructs a dataset of tissue-specific genes and opens a new route to the discovery of drug effects. We have integrated rich information associated with human TSGs from multiple sources in a standalone form to reveal many hidden links beyond tissue specificity.
This makes it a potentially useful resource for many applications: for instance, screening for therapeutic targets or biomarkers by tissue, subcellular localization or gene-drug relationship, or looking up the functional enrichment of similarly localized genes or of genes that participate in a common pathway or disease, or vice versa. Most importantly, hypotheses for research on pathogenic mechanisms, diagnosis and therapy can be inferred from the biological links of TSGs. Much of our future effort will be geared toward understanding how TSGs play their roles in development, differentiation, stress response and pathology. A study of tissue-specific transcriptional regulation is under way. We also expect to generate many testable hypotheses to maximize the practical value of this research.
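
The enrichment evaluation mentioned in the Results (a Fisher exact test followed by Benjamini-Hochberg correction) can be illustrated with a minimal Python sketch. The original work performed these steps in PHP and JavaScript, so this is only an illustration under stated assumptions: the test is taken to be one-sided (over-representation), and the function names, term names and counts are hypothetical rather than figures from the dissertation.

```python
from math import comb


def fisher_exact_enrichment(hits: int, query_size: int,
                            annotated: int, background: int) -> float:
    """One-sided Fisher exact p-value, P(X >= hits).

    X is the number of annotated genes in a query of `query_size` genes
    drawn from `background` genes, of which `annotated` carry the term.
    """
    total = comb(background, query_size)
    tail = sum(
        comb(annotated, x) * comb(background - annotated, query_size - x)
        for x in range(hits, min(query_size, annotated) + 1)
    )
    return tail / total


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for illustration only:
# (hits, query_size, annotated, background) for each candidate term.
terms = {
    "term_A": (3, 10, 40, 4000),
    "term_B": (1, 10, 500, 4000),
}
raw = {name: fisher_exact_enrichment(*counts) for name, counts in terms.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for large gene backgrounds, at the cost of speed. A production system would more likely use a statistics library.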
Keywords/Search Tags: tissue specific gene, microarray analysis, systems biology, bioinformatics