
Identification Of Driver Genes And Pathways In Cancer With Omics Data Based On High-throughput Sequencing

Posted on: 2017-02-14 | Degree: Master | Type: Thesis
Country: China | Candidate: W Yang
GTID: 2284330485964067 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Cancer is currently one of the diseases with the highest mortality in humans. Developing clinical diagnostics and targeted therapies for cancer requires first gaining a comprehensive understanding of its molecular mechanisms. New-generation high-throughput technologies, including next-generation sequencing and mass spectrometry, have been widely applied to biological problems, particularly in the study of human disease. Several national and international cancer genome projects, such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), are under way with the aim of sequencing cancer genomes. These large-scale projects are producing large volumes of data on genomic, epigenomic and gene expression aberrations across multiple cancer types. However, identifying from these data the mutated driver genes and driver pathways that promote cancer proliferation, while filtering out non-functional passenger mutations, remains a significant challenge. Some genes important in cancer progression have been reported to be mutated across large numbers of samples at rates significantly higher than the background mutation rate. Researchers have also recognized the need to move beyond genomic data alone toward the integration of multiple omics data. At the transcriptomic level, for instance, small RNA sequencing can detect known miRNAs and predict novel ones, which may serve as biomarkers for disease diagnosis and may also point to potential therapies. At the proteomic level, targeted proteomics can detect candidate disease-related proteins, which could be used for clinical staging and subtyping.

In this thesis, Chapter 1 introduces the background and the current state of research on driver gene identification. Chapter 2 outlines the data sources and methods commonly used in this research.
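As a minimal sketch of the comparison against the background mutation rate mentioned above, the Python code below tests whether a gene is mutated in more samples than expected by chance. It assumes a simple binomial model in which each sample independently carries at least one mutation in the gene with probability 1 - (1 - mu)^L, where mu is a uniform per-base background rate and L is the gene length. The function names, the rate, and the gene lengths are hypothetical, and this is not the method used in the thesis. Because the background probability grows with L, the same model also shows why long genes can look frequently mutated purely by chance.

```python
from math import comb


def background_prob(gene_length_bp: int, per_base_rate: float) -> float:
    """Chance that one sample carries >= 1 mutation in the gene, assuming
    independent mutations at a uniform per-base rate (a simplification)."""
    return 1.0 - (1.0 - per_base_rate) ** gene_length_bp


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def mutation_pvalue(mutated: int, n_samples: int,
                    gene_length_bp: int, per_base_rate: float) -> float:
    """One-sided p-value: is the gene mutated in more samples than the
    hypothetical background model predicts?"""
    p = background_prob(gene_length_bp, per_base_rate)
    return binomial_upper_tail(mutated, n_samples, p)


if __name__ == "__main__":
    rate = 1e-6  # hypothetical per-base background rate
    # Same count (40 of 500 samples) for a short and a long hypothetical gene.
    print(mutation_pvalue(40, 500, 1_200, rate))    # very small: unlikely by chance
    print(mutation_pvalue(40, 500, 100_000, rate))  # large: consistent with chance
```

With these hypothetical numbers, a 1,200 bp gene mutated in 40 of 500 samples is highly significant, whereas a 100,000 bp gene with the same count is not. Practical driver-detection tools model the background rate far more carefully (for example, letting it vary across genes and samples), so this should be read only as an illustration.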
Chapter 3 proposes a multi-objective optimization model based on a genetic algorithm (MOGA) to solve the so-called maximum weight submatrix problem, which can be used to identify driver pathways that promote cancer proliferation. The maximum weight submatrix problem seeks mutated driver pathways with two specific properties: high coverage (many samples carry a mutation in the pathway) and high exclusivity (mutations in different genes of the pathway rarely occur in the same sample). An integrative model combining gene expression data with mutation data is also proposed to improve the performance of MOGA in a biological context. Chapter 4 proposes DriverFinder, a computational framework built on molecular networks that identifies driver mutations by estimating their effect on mRNA expression networks. Long genes that are mutated by chance are filtered out to remove redundant information.
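The two properties can be made concrete with a small sketch. The code below assumes the weight commonly used for the maximum weight submatrix problem in the Dendrix formulation, W(M) = 2|Γ(M)| - Σ_{g∈M} |Γ(g)|, where Γ(g) is the set of samples in which gene g is mutated and Γ(M) is their union. It also computes coverage and an exclusivity ratio as two separate objectives and checks Pareto dominance between candidate gene sets, as a multi-objective search would. The abstract does not state the exact objectives or encoding used by MOGA, so the function names, the exclusivity ratio, and the toy data are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sparse mutation matrix: gene -> set of sample IDs mutated in that gene.
MutationMatrix = Dict[str, Set[str]]


def covered_samples(m: MutationMatrix, genes: List[str]) -> Set[str]:
    """Gamma(M): samples with at least one mutation in any gene of the set."""
    covered: Set[str] = set()
    for g in genes:
        covered |= m.get(g, set())
    return covered


def mutation_total(m: MutationMatrix, genes: List[str]) -> int:
    """Sum over genes of |Gamma(g)|; exceeds |Gamma(M)| when mutations co-occur."""
    return sum(len(m.get(g, set())) for g in genes)


def weight(m: MutationMatrix, genes: List[str]) -> int:
    """Dendrix-style weight W(M) = 2|Gamma(M)| - sum_g |Gamma(g)|,
    i.e. coverage minus coverage overlap."""
    return 2 * len(covered_samples(m, genes)) - mutation_total(m, genes)


def objectives(m: MutationMatrix, genes: List[str], n_samples: int) -> Tuple[float, float]:
    """Two objectives to maximize (an illustrative choice, not the thesis's):
    coverage    = |Gamma(M)| / n_samples
    exclusivity = |Gamma(M)| / sum_g |Gamma(g)|  (1.0 means no co-occurrence)."""
    cov = len(covered_samples(m, genes))
    total = mutation_total(m, genes)
    return cov / n_samples, (cov / total if total else 0.0)


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance for maximization: a is no worse everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


if __name__ == "__main__":
    toy = {
        "A": {"s1", "s2", "s3"},
        "B": {"s4", "s5"},
        "C": {"s1", "s2", "s4"},
    }
    print(weight(toy, ["A", "B"]))  # 5: covers 5 samples with no overlap
    print(weight(toy, ["A", "C"]))  # 2: covers 4 samples with an overlap of 2
    print(dominates(objectives(toy, ["A", "B"], 6),
                    objectives(toy, ["A", "C"], 6)))  # True
```

In this toy example the mutually exclusive pair {A, B} beats {A, C} on both coverage and exclusivity. A genetic algorithm would evolve many such candidate gene sets and keep the non-dominated ones, rather than collapsing the two properties into the single weight W.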
Keywords/Search Tags: Next-generation sequencing, Omics data, Cancer, Mutated driver genes, Mutated driver pathways