| The accelerated accumulation of somatic mutations leads to uncontrolled cell proliferation, which in turn leads to cancer. A key step in cancer research is to identify the driver mutations and driver genes that cause the transition from a normal state to a malignant state. In addition, cancer driver genes tend to act in only a few biological pathways, so researchers have shifted their focus from identifying driver genes to identifying driver pathways. Identifying driver pathways helps to reveal the underlying principles of disease initiation and development, and has important implications for the precise treatment of cancer patients, new drug targets and diagnostic tests. Studies have shown that multiple pathways act synergistically in the same biological process and in carcinogenesis. However, computational methods for identifying cooperative driver pathways have not been well studied. Existing methods suffer from incomplete data, the lack of an index for evaluating pathway cooperation, computational difficulty and poor interpretability of results. To address these problems, this paper integrates genomic data, pathway data and gene-pathway association data to study cooperative driver pathway identification. The main work is as follows: (1) To address incomplete pathway data and the lack of indicators for evaluating pathway cooperation, we propose CDPLP, a method that integrates community detection and link prediction to discover cooperative driver pathways. In this method, a weight function is designed to quantify the influence of each gene on the target cancer, and the candidate driver genes with greater influence on the target cancer are selected for gene module identification, which reduces the amount of data involved in subsequent computation. CDPLP then combines somatic mutation data with gene expression data to construct a gene association network (based on expression similarity and mutational exclusivity) on which community 
detection algorithms are used to identify gene modules associated with the target cancer. Next, CDPLP constructs a heterogeneous information network containing three types of nodes (gene, miRNA and pathway) and predicts interactions between pathways using the structural information of the network. Finally, CDPLP introduces a new scoring function to quantify the cooperation between two pathways, and identifies the 10 pathway pairs with the highest synergy scores as the cooperative driver pathways of the target cancer. Experimental results on four datasets show that CDPLP is effective in identifying cooperative driver pathways associated with the target cancers. (2) To address the heavy computation and lack of interpretability of current cancer cooperative driver pathway identification methods, we propose MultiCoDP, a novel cancer cooperative driver pathway identification method based on meta-paths and multi-view clustering. This method constructs a heterogeneous information network consisting of gene, pathway and patient nodes and their relationships, and selects meta-paths that both start and end at a pathway node to describe the relationships between two pathways. MultiCoDP chooses four meta-paths to describe four relations between two pathways: pathway-gene-pathway for gene overlap, pathway-gene-gene-pathway for gene interactions, pathway-gene-patient-gene-pathway for gene mutation co-occurrence, and pathway-gene-patient-gene-pathway for gene co-expression. Four matrices describing pathway similarity are then obtained by computing meta-path-based similarity between pathway nodes. Then, MultiCoDP uses a multi-view clustering approach to cluster the cooperative driver pathways closely related to the target cancer. The experimental results show that the cooperative driver pathways identified by this method not only promote the target cancer but also exhibit strong synergistic effects among the pathways. |
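
To make the meta-path similarity step more concrete, the following is a minimal sketch and not the authors' implementation. It covers only the pathway-gene-pathway meta-path (gene overlap) and assumes PathSim as the similarity measure, since the abstract does not name one. The toy membership matrix and the function names `meta_path_counts` and `pathsim` are hypothetical.

```python
def meta_path_counts(membership):
    """Count pathway-gene-pathway path instances.

    membership[i][g] is 1 if pathway i contains gene g, else 0.
    The result is the commuting matrix M = A * A^T, so M[i][j] is the
    number of genes shared by pathways i and j.
    """
    n = len(membership)
    return [
        [sum(a * b for a, b in zip(membership[i], membership[j])) for j in range(n)]
        for i in range(n)
    ]


def pathsim(counts):
    """PathSim (assumed measure): 2 * M[i][j] / (M[i][i] + M[j][j])."""
    n = len(counts)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = counts[i][i] + counts[j][j]
            # Pathways with no genes get similarity 0 to avoid division by zero.
            sim[i][j] = 2.0 * counts[i][j] / denom if denom else 0.0
    return sim


# Hypothetical toy data: 3 pathways (rows) over 4 genes (columns).
membership = [
    [1, 1, 1, 0],  # pathway P0 contains genes g0, g1, g2
    [0, 1, 1, 1],  # pathway P1 contains genes g1, g2, g3
    [0, 0, 0, 1],  # pathway P2 contains gene g3
]

counts = meta_path_counts(membership)
similarity = pathsim(counts)

for row in similarity:
    print([round(value, 3) for value in row])
```

Under the same assumption, the other three meta-paths would each yield one more similarity matrix by chaining the corresponding adjacency matrices (for example pathway-gene, gene-gene, gene-pathway), and the four matrices would serve as the views for multi-view clustering.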