Deciphering Gene Functions And Disease Related Pathways Based On GO

Posted on: 2013-07-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Wang
GTID: 1224330395974804
Subject: Biomedical engineering
Abstract/Summary:
Gene Ontology (GO) is a database that systematically describes genes and their protein products. It has been widely applied to decipher the functional similarity of genes (and their products) and disease-related biological pathways from high-throughput biological data. However, the GO-based methods used in these applications have serious problems. The main contributions of this dissertation are as follows:

1. Revealing and avoiding bias in semantic similarity scores for protein pairs. Semantic similarity scores for protein pairs are widely applied in functional genomics research. However, because some proteins, such as disease-related ones, tend to be studied more intensively, their annotations are likely to be biased, which may affect applications based on semantic similarity measures. We first evaluated 14 commonly used semantic similarity scores for protein pairs and showed that they were significantly positively correlated with the number of annotation terms of the proteins. These results suggest that current applications of semantic similarity scores between proteins may be biased. To reduce this annotation bias, we proposed normalizing the semantic similarity scores between proteins using a power transformation of the scores, and we provide evidence that this improves performance in some applications.

2. Deriving biologically relevant functions from the statistically significant functions for a disease. In high-throughput studies of diseases, GO terms enriched with disease-related genes are routinely found. However, most current algorithms for finding significant GO terms cannot handle the redundancy that results from dependencies between GO terms. The algorithms developed to reduce this redundancy rely on purely numerical considerations and may therefore produce results that miss biologically interesting cases.
In this work, we present several rules used to design a tool called GO-function, which extracts biologically relevant terms from the statistically significant GO terms for a disease. Using a gene expression profile of colorectal cancer, we compared GO-function with four algorithms designed to handle redundancy. We then validated the results that GO-function obtained on this data set using another, independent colorectal cancer data set. Our analysis showed that GO-function identifies disease-related terms that are more statistically and biologically meaningful than those found by the other four algorithms.

3. Finding pairs of GO terms annotated with significantly more between-term co-mutated gene pairs. Recent high-throughput tumor sequencing studies have further revealed the extremely diverse and complex mutational landscape of cancer. Thus, to study cancer genes and decipher the molecular mechanisms of cancer, one important task is to analyze high-throughput somatic mutation data of cancer genomes at the level of biological pathways. GO defines functions hierarchically, at various levels of specificity, so it is reasonable to study the biological functions co-disrupted in carcinogenesis based on GO. We developed an algorithm to find non-redundant pairs of GO terms that are significantly overrepresented with between-term co-mutated gene pairs. Based on two somatic mutation screening data sets, we found 78 pairs of GO terms, respectively. These functional pairs include both general and specific functions, which can define the range of co-disrupted biological functions and provide new insight into the mechanisms of cancer.

In conclusion, the proposed methods can efficiently solve these problems in GO-based applications and are of great significance for correctly deciphering the functional similarity of genes and disease-related biological pathways.
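Contribution 1 does not state the exact form of its power transformation, so the following is only a minimal sketch under an explicit assumption: the bias is modeled as score ≈ c · (n1 · n2)^α, where n1 and n2 are the numbers of annotation terms of the two proteins; α is fitted by least squares on the log-log scale, and each score is divided by (n1 · n2)^α to remove the fitted trend. The functions `pearson`, `fit_alpha`, and `normalize_scores` are hypothetical names, and the data are synthetic.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_alpha(scores: Sequence[float], n1: Sequence[int], n2: Sequence[int]) -> float:
    """Hypothetical: least-squares slope of log(score) regressed on log(n1 * n2)."""
    xs = [math.log(a * b) for a, b in zip(n1, n2)]
    ys = [math.log(s) for s in scores]  # assumes every score is strictly positive
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def normalize_scores(scores: Sequence[float], n1: Sequence[int],
                     n2: Sequence[int], alpha: float) -> List[float]:
    """Hypothetical power-based normalization: score / (n1 * n2) ** alpha."""
    return [s / (a * b) ** alpha for s, a, b in zip(scores, n1, n2)]


# Synthetic protein pairs whose scores grow with annotation counts (the bias described above).
n1 = [2, 5, 10, 20, 40]
n2 = [3, 4, 8, 15, 30]
scores = [0.1 * (a * b) ** 0.3 for a, b in zip(n1, n2)]
counts = [a * b for a, b in zip(n1, n2)]

alpha = fit_alpha(scores, n1, n2)
normalized = normalize_scores(scores, n1, n2, alpha)
print("correlation before:", round(pearson(counts, scores), 3))
print("fitted alpha:", round(alpha, 3))
print("normalized:", [round(v, 6) for v in normalized])
```

On real data the fit would be noisy, and the published method may choose the exponent differently; the sketch only shows how a power term can cancel a score's dependence on annotation counts.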
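The abstract does not list GO-function's rules for contribution 2, so this sketch only illustrates the general setting. It uses a one-sided hypergeometric enrichment test (a common choice, not necessarily the one used here) and a hypothetical stand-in redundancy rule: a significant term is dropped when its significant descendants already cover all of its disease-related genes. Annotations are assumed to be already propagated to ancestor terms (the GO true-path rule), and all names are illustrative.

```python
import math
from typing import Dict, List, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)


def enrichment_p(term_genes: Set[str], disease_genes: Set[str], universe: Set[str]) -> float:
    """One-sided p-value that a GO term is enriched with disease-related genes."""
    term, disease = term_genes & universe, disease_genes & universe
    return hypergeom_sf(len(term & disease), len(universe), len(term), len(disease))


def descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable below `term` in the GO DAG via child links."""
    seen: Set[str] = set()
    stack = list(children.get(term, []))
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(children.get(child, []))
    return seen


def prune_redundant(significant: Set[str], children: Dict[str, List[str]],
                    annotations: Dict[str, Set[str]], disease_genes: Set[str]) -> Set[str]:
    """Hypothetical rule: keep a significant term only if some of its disease genes
    are not already covered by its significant descendants."""
    kept = set()
    for term in significant:
        below = descendants(term, children) & significant
        covered = set().union(*(annotations[d] & disease_genes for d in below))
        if not (annotations[term] & disease_genes) <= covered:
            kept.add(term)
    return kept


# Toy DAG: A is the parent of B and C; annotations are already propagated to A.
children = {"A": ["B", "C"]}
annotations = {"A": {"g1", "g2", "g3", "g4"}, "B": {"g1", "g2"}, "C": {"g3", "g4"}}
disease = {"g1", "g2", "g3"}
print(sorted(prune_redundant({"A", "B", "C"}, children, annotations, disease)))  # ['B', 'C']
print(sorted(prune_redundant({"A", "B"}, children, annotations, disease)))       # ['A', 'B']
```

In the second call the parent A survives because gene g3 is not explained by any significant child, which is the kind of case a purely numerical filter might handle differently.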
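For contribution 3, the sketch below shows one assumed way to test whether a pair of GO terms is overrepresented with between-term co-mutated gene pairs. Two genes count as co-mutated if they are mutated together in at least `min_samples` samples; the between-term pairs (one gene from each term) are then compared against the background of all unordered gene pairs with a hypergeometric test. The thresholds, background, and test are assumptions rather than the dissertation's algorithm, and the redundancy filtering across term pairs is not shown. The gene names and samples are made up.

```python
import math
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def co_mutated_pairs(samples: List[Iterable[str]], min_samples: int = 1) -> Set[Pair]:
    """Unordered gene pairs mutated together in at least `min_samples` samples."""
    counts = {}
    for genes in samples:
        for pair in combinations(sorted(set(genes)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, c in counts.items() if c >= min_samples}


def between_term_pairs(genes_a: Set[str], genes_b: Set[str]) -> Set[Pair]:
    """Unordered pairs of distinct genes with one gene from each term."""
    return {tuple(sorted((a, b))) for a in genes_a for b in genes_b if a != b}


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)


def term_pair_p(genes_a: Set[str], genes_b: Set[str],
                comut: Set[Pair], universe: Set[str]) -> float:
    """Hypothetical test: are between-term pairs enriched with co-mutated pairs?"""
    N = math.comb(len(universe), 2)  # background: all unordered gene pairs
    K = len(comut)                   # assumes every co-mutated pair lies in `universe`
    pairs = between_term_pairs(genes_a & universe, genes_b & universe)
    return hypergeom_sf(len(pairs & comut), N, K, len(pairs))


samples = [["TP53", "KRAS", "APC"], ["TP53", "KRAS"], ["APC", "PIK3CA"]]
universe = {"TP53", "KRAS", "APC", "PIK3CA"} | {f"G{i}" for i in range(5, 11)}
comut = co_mutated_pairs(samples, min_samples=2)  # {('KRAS', 'TP53')}
p = term_pair_p({"TP53"}, {"KRAS", "APC"}, comut, universe)
print(comut, round(p, 4))
```

With a real screen the background would cover thousands of genes and the p-values would need multiple-testing correction across all term pairs.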
Keywords/Search Tags: Gene Ontology, gene functions, bias, biological pathways, redundancy