
Research on Driver Gene Discovery Algorithms Based on Cancer Omics Data and Network Analysis

Posted on: 2021-02-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P J Wei
GTID: 1364330614961464
Subject: Computer Science and Technology
Abstract/Summary:
Cancer is driven mainly by genetic abnormalities. It is widely accepted that only a few mutated genes, known as driver genes, confer a selective growth advantage and thereby promote cancer progression. Most mutated genes have no effect on cancer progression and are called passenger genes. Distinguishing driver genes from the many passenger genes is an active research topic, and a variety of methods have been proposed for it. In particular, because genes interact with one another and networks allow cancer characteristics to be studied more systematically, many network-based methods for mining cancer driver genes have been developed in recent years. However, several important factors affect driver gene discovery, such as the effect of gene length on mutation, the influence of prior knowledge, and the topological features of the network. In addition, beyond discovering driver genes for a single cancer type, some studies have demonstrated that different cancers may share common characteristics and pathogenesis. This dissertation therefore studies these problems systematically. Its main contributions are as follows:

(1) This dissertation proposes LNDriver, a cancer driver gene identification method that corrects mutation probability for gene length. For somatic mutations, a generalized additive model is used to assign each gene a mutation probability according to its length, so that false-positive genes that appear frequently mutated only because they are long may be filtered out. Then, based on the protein-protein interaction network, a bipartite graph between mutation data and expression data is constructed. Finally, a greedy algorithm is used to detect driver genes. Results on data from different cancer types show that LNDriver performs better than some traditional methods and reduces the number of false-positive long genes.

(2) Building on LNDriver, an improved method, Driver Finder, is proposed to address the incompleteness of the known protein-protein interaction network and the difference in expression distribution between tumor and normal samples. In addition to accounting for gene length, a gene co-expression network is first constructed and then combined with the known protein-protein interaction network. The resulting specific network avoids losing genes because of the incomplete network. Outlier genes are then determined according to the distribution of expression in tumor and normal samples. Results on different cancers show that Driver Finder can identify cancer driver genes efficiently.

(3) This dissertation proposes a random walk-based method with transition preference to identify cancer driver genes. In the traditional random walk, the walker selects its next node uniformly from the neighbors of the current node. In a real gene network, however, the walker is more likely to prefer neighbors with a greater degree than to choose uniformly. In addition, the topological features of the known driver genes of each cancer are used to set the random jumping probabilities. The analysis results show that the performance of this method, Driver_IRW, is significantly better.

(4) This dissertation proposes DriverMul JNMF, a joint non-negative matrix factorization method based on multiple networks, to identify genes common to different cancers. Because different cancers may share common characteristics and pathogenesis, DriverMul JNMF constructs multiple differential co-expression networks from several common gynecological cancers that have relatively high disease similarity. The known protein-protein interaction network is incorporated as a constraint. The method can detect common modules across the multiple networks simultaneously. Analysis of the genes in these modules indicates that they are significantly enriched in cancer-related Hallmark terms and in some important pathways. Some of these genes also show good prognostic ability in survival analysis.
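The transition preference described in contribution (3) can be illustrated with a small sketch. This is not the dissertation's implementation: it assumes the preference is simply proportional to neighbor degree, and it uses a uniform restart over a set of seed genes in place of the topology-derived jumping probabilities, which the abstract does not specify. The function names, the `alpha` restart parameter, and the toy network are all hypothetical.

```python
def degree_biased_transition(adj):
    """Build transition probabilities that favor high-degree neighbors.

    adj maps each node to the set of its neighbors (undirected network).
    Assumption: the probability of stepping from u to v is proportional
    to the degree of v. A traditional walk would use 1 / len(adj[u]).
    """
    trans = {}
    for u, nbrs in adj.items():
        total = sum(len(adj[v]) for v in nbrs)
        trans[u] = {v: len(adj[v]) / total for v in nbrs} if total else {}
    return trans


def random_walk_with_restart(adj, restart, alpha=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- alpha * restart + (1 - alpha) * P^T p until convergence.

    restart maps seed nodes to jumping probabilities that sum to 1.
    Assumes every node has at least one neighbor, so no probability
    mass is lost at isolated nodes.
    """
    trans = degree_biased_transition(adj)
    p = {u: restart.get(u, 0.0) for u in adj}
    for _ in range(max_iter):
        nxt = {u: alpha * restart.get(u, 0.0) for u in adj}
        for u, pu in p.items():
            for v, w in trans[u].items():
                nxt[v] += (1 - alpha) * pu * w
        diff = sum(abs(nxt[u] - p[u]) for u in adj)
        p = nxt
        if diff < tol:
            break
    return p


# Toy network with hypothetical gene names; "hub" has the highest degree.
toy_network = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "d"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"a"},
}
# Hypothetical seed: a single known driver gene receives all restart mass.
seeds = {"d": 1.0}
scores = random_walk_with_restart(toy_network, seeds)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy network, a walker at "a" moves to "hub" with probability 0.75 rather than the uniform 0.5, because "hub" has degree 3 and "d" has degree 1. Genes would then be ranked by their stationary scores.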
Keywords/Search Tags:Bioinformatics, Cancer, Driver Gene, Passenger Gene, Interaction Network