Cancer is a heterogeneous disease driven by the accumulation of genetic and non-genetic changes, and distinguishing driver genes, which are positively associated with cancer, from passenger mutations, which arise randomly and play no driving role in carcinogenesis, remains a major challenge. Identifying cancer driver genes is therefore crucial for precision oncology and cancer therapy. Because driver factors are diverse and driving mechanisms are complex, judging the driver potential of a candidate gene by its background mutation rate alone is usually insufficient. Studies have shown that the factors affecting a gene's driver properties are spread across multi-omics data and biological networks. These include gene expression levels, single nucleotide variants (SNVs), chromosomal copy number variants (CNVs), and the interaction partners of cancer driver genes in protein-protein interaction (PPI) networks; even the level of DNA methylation can influence a gene's cancer-driving properties. Integrating multi-omics data with biological networks and extracting informative features from them is therefore expected to predict cancer driver genes more effectively. With the growing volume of high-throughput sequencing data from tumor samples and the maturation of graph neural network algorithms, it has become possible to identify cancer driver genes by combining multi-omics features with biological networks. In this paper, we first generated comprehensive multi-omics features for each gene at the genomic, epigenomic and transcriptomic levels and combined them with features derived from the PPI network. We then proposed GGraph SAGE, a new semi-supervised deep learning framework that combines two graph neural networks: Graph Attention Networks (GAT) and Graph Sample and Aggregate (GraphSAGE). GAT assigns weights to neighboring nodes in the biological network to represent differences in interaction strength, and GraphSAGE improves the computational efficiency and robustness of the model. In experiments on seven cancer types, GGraph SAGE outperformed several state-of-the-art computational methods for driver gene identification. Through the semi-supervised mechanism of the graph neural network, we identified candidate cancer driver genes that are not included in the Gold Standard cancer driver gene database but closely resemble known cancer driver genes at the multi-omics level. Verification against authoritative literature and DNA replication function tests (DNA replication being a manifestation of the unlimited proliferative capacity acquired by cancer cells) showed that these candidates are strongly associated with the occurrence and development of cancer and that their mutations significantly affect cellular DNA replication. In addition, this work broadens our current understanding of cancer drivers at the multi-omics level by identifying drivers specific to individual tumor types rather than pan-cancer genes. We expect GGraph SAGE to break new ground in precision medicine and potentially to be extended to predicting the drivers of other complex diseases.
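The abstract does not specify how GAT and GraphSAGE are wired together inside GGraph SAGE. The NumPy code below is a minimal sketch that makes the two mechanisms concrete. It assumes a single-head GAT layer stacked before one GraphSAGE mean-aggregator layer, followed by a masked loss for the semi-supervised setting. The attention step weights each gene's PPI neighbors, and the sampling-and-aggregation step draws a fixed number of neighbors. The layer order, the toy graph, the random features and labels, and the function names `gat_layer`, `sage_layer` and `masked_bce` are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def gat_layer(h, adj, W, a, negative_slope=0.2):
    """Single-head graph attention layer (illustrative sketch).

    h: (N, F) node features; adj: (N, N) 0/1 adjacency with self-loops;
    W: (F, F') shared projection; a: (2F',) attention vector.
    Returns the attention-weighted features and the attention matrix alpha.
    """
    z = h @ W
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), computed as a_1^T z_i + a_2^T z_j
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, negative_slope * e)
    # Only PPI neighbors (and the node itself) may receive attention
    e = np.where(adj > 0, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)  # numerically stable softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


def sage_layer(h, neighbors, W_self, W_neigh, num_samples, rng):
    """GraphSAGE-style layer with uniform neighbor sampling and a mean aggregator.

    Multiplying [h_v || mean(h_N(v))] by one weight matrix is equivalent to
    the two separate projections W_self and W_neigh used here.
    """
    out = np.zeros((h.shape[0], W_self.shape[1]))
    for v, nbrs in enumerate(neighbors):
        if len(nbrs) > num_samples:
            # Sample a fixed-size neighborhood to bound the cost per node
            nbrs = rng.choice(nbrs, size=num_samples, replace=False)
        agg = h[nbrs].mean(axis=0) if len(nbrs) > 0 else np.zeros(h.shape[1])
        out[v] = h[v] @ W_self + agg @ W_neigh
    out = np.maximum(out, 0.0)  # ReLU
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.where(norms > 0, norms, 1.0)  # L2-normalize each row


def masked_bce(probs, labels, mask):
    """Binary cross-entropy over labeled genes only (semi-supervised setting)."""
    p = np.clip(probs[mask], 1e-7, 1 - 1e-7)
    y = labels[mask]
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Hypothetical toy PPI graph: 5 genes with 4 random "multi-omics" features each
rng = np.random.default_rng(0)
N, F, H = 5, 4, 8
x = rng.normal(size=(N, F))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
adj = np.eye(N)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
neighbors = [[j for j in range(N) if j != v and adj[v, j] > 0] for v in range(N)]

h1, alpha = gat_layer(x, adj, rng.normal(size=(F, H)), rng.normal(size=2 * H))
h2 = sage_layer(h1, neighbors, rng.normal(size=(H, H)), rng.normal(size=(H, H)),
                num_samples=2, rng=rng)
probs = 1.0 / (1.0 + np.exp(-(h2 @ rng.normal(size=H))))  # driver probability

# Genes 2 and 3 are treated as unlabeled: they get predictions but no loss term
labels = np.array([1.0, 0.0, 0.0, 0.0, 1.0])
mask = np.array([True, True, False, False, True])
loss = masked_bce(probs, labels, mask)
print(np.round(probs, 3), round(loss, 3))
```

In this sketch, the rows of `alpha` play the role of per-neighbor interaction weights. The `num_samples` parameter caps how many neighbors each gene aggregates, which is what keeps GraphSAGE-style layers tractable on large PPI networks. Because the loss is computed only over masked (labeled) genes, the model can still score unlabeled genes, and this is how the semi-supervised setting lets new candidate drivers surface.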