Research On Gene Set Enrichment Analysis Method And Correlation Between Gene And Disease

Posted on: 2017-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: S Wu
Full Text: PDF
GTID: 2284330509957044
Subject: Computer Science and Technology
Abstract/Summary:
With the development and spread of high-throughput genome sequencing technology, gene data is being produced at an accelerating rate, and scientists have begun to concentrate on gene function and gene diversity. The main challenge is how to interpret and analyze this enormous volume of gene data efficiently, in order to find its underlying regularities and reveal what is still unknown about the human organism. Cancer, also called malignancy, is now one of the most severe threats to human health. Biotechnology, and gene technology in particular, has great potential to become a breakthrough in cancer treatment. Analyzing a group of genes that share a biological function (a gene set), an approach known as gene set enrichment analysis, is increasingly becoming the mainstream method.
This dissertation proposes a topology-based model for gene set enrichment analysis. The model treats each gene as a reaction field whose intensity is quantified by its topological potential. The intensity depends not only on the gene's expression value but also on the strength of its correlations with other genes: the higher the expression value and the stronger the correlations, the higher the intensity of the reaction field. The dissertation classifies conventional methods into two categories: gene expression based methods and gene co-expression based methods. Because genes are correlated with one another, all human genes can be organized as a transcriptional network. On such a network, gene expression based methods can be called “node methods” and gene co-expression based methods can be called “edge methods”. The topology-based model can be viewed as an integration of “node methods” and “edge methods”.
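
The reaction-field idea can be illustrated with a minimal sketch. The abstract does not state the formula used in the dissertation, so the Gaussian form of topological potential below, the distance `1 - |correlation|`, the parameter `sigma`, and the function name `topological_potential` are all assumptions made for illustration. The sketch only shows how a node quantity (expression) and an edge quantity (correlation) can be combined into one score per gene.

```python
import math


def topological_potential(expression, correlation, sigma=1.0):
    """Score each gene by an assumed Gaussian topological potential.

    expression:  dict mapping gene -> non-negative expression value (node weight)
    correlation: dict mapping (gene_a, gene_b) -> correlation in [-1, 1] (edge weight)
    sigma:       assumed influence range of each gene's field

    Assumed form: phi(g) = sum over h of expression[h] * exp(-(d(g, h) / sigma) ** 2),
    with d(g, g) = 0 and d(g, h) = 1 - |correlation(g, h)|.
    Gene pairs with no recorded correlation are treated as not interacting.
    """
    potential = {}
    for g in expression:
        total = 0.0
        for h, mass in expression.items():
            if h == g:
                distance = 0.0  # a gene always feels its own field in full
            else:
                r = correlation.get((g, h), correlation.get((h, g)))
                if r is None:
                    continue  # no edge between g and h
                distance = 1.0 - abs(r)
            total += mass * math.exp(-((distance / sigma) ** 2))
        potential[g] = total
    return potential


# Hypothetical toy data: three genes, two recorded correlations.
toy_expression = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 1.0}
toy_correlation = {("GENE_A", "GENE_B"): 0.9, ("GENE_A", "GENE_C"): 0.1}
toy_potential = topological_potential(toy_expression, toy_correlation)
```

Under these assumptions, raising a gene's expression value or strengthening any of its correlations raises its score, which matches the qualitative behaviour described in the abstract.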
In the experiments, the topology-based model is applied to three colorectal cancer datasets and compared with four other state-of-the-art methods; the results show that the topology-based model performs better.
This dissertation also introduces the human transcriptional regulatory network (HTRN) and designs a corresponding calculation algorithm. The HTRN is part of the Encyclopedia of DNA Elements (ENCODE) project and can simplify the topology-based model, because the model then only needs to consider the correlations that exist in the HTRN. To analyze the topological model combined with the gene transcriptional regulatory network, the dissertation applies a new analysis method that requires a large amount of gene data. This method uses the p-value of the target gene set, and the rank of that p-value, to appraise the performance of different algorithms. A comparison of four methods shows that the topology-based model with the gene transcriptional regulatory network performs better than the other methods; in other words, it is more competitive for gene set enrichment analysis.
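
The evaluation by p-value and rank of p-value can be sketched as follows. The abstract does not say how the two quantities are combined or how ties are handled, so this is one assumed reading: each method assigns a p-value to every gene set, the target gene set is ranked among them by ascending p-value, and methods are ordered by that rank with the p-value as tie-breaker. The function names and the toy numbers are hypothetical.

```python
def target_rank(p_values, target):
    """Return the 1-based rank of the target gene set by ascending p-value.

    p_values: dict mapping gene set name -> p-value reported by one method.
    Tied gene sets share the best rank (assumed convention).
    """
    target_p = p_values[target]
    return 1 + sum(1 for p in p_values.values() if p < target_p)


def compare_methods(results, target):
    """Order methods by how well they rank the target gene set.

    results: dict mapping method name -> {gene set name -> p-value}.
    Returns a list of (method, (p_value, rank)), best method first.
    """
    summary = {
        method: (p_values[target], target_rank(p_values, target))
        for method, p_values in results.items()
    }
    # Lower rank is better; a smaller p-value breaks ties between equal ranks.
    return sorted(summary.items(), key=lambda item: (item[1][1], item[1][0]))


# Hypothetical p-values for two methods over three gene sets.
toy_results = {
    "method_a": {"target_set": 0.001, "set_1": 0.2, "set_2": 0.03},
    "method_b": {"target_set": 0.04, "set_1": 0.01, "set_2": 0.5},
}
toy_ordering = compare_methods(toy_results, "target_set")
```

A method that gives the known disease-related gene set a small p-value and a high position in its ranking would come first in this ordering.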
Keywords/Search Tags: high-throughput genome sequencing, gene set enrichment analysis, topological potential, gene transcriptional regulatory network