| Among the massive number of gene variants in a tumor, the few that play an important role are called driver mutations. They give tumor cells a selective growth advantage, allowing them to withstand immune clearance and drug therapy. Genes carrying driver mutations are typically called driver genes; in practice they are crucial for the prevention, diagnosis and treatment of cancer, and in theory they help us understand its mechanisms. Compared with accurate but inefficient experimental methods, computational approaches can mine more driver genes from massive data. Recently, methods that integrate mutation data with gene interaction networks have become popular. However, it remains unclear whether integrating multiple types of mutation information and multiple networks is more effective for finding driver genes. We build a two-stage-vote ensemble framework based on somatic mutations and gene interactions for identifying cancer driver genes. Specifically, we first represent and combine two kinds of mutation information from 33 types of cancer, frequency and functional impact, and propagate them through networks with an improved iterative framework. A first vote is then conducted on the iteration results, and a second vote combines the results of the first vote into the final list. Genes with the highest ranks are identified as drivers. Compared with four similar methods, our model identifies more known driver genes. We also provide potential driver genes for multiple types of cancer and find, through analysis of the existing literature, that these unverified genes are related to cancer, which supports the reliability of our results. In addition, we conduct a comparative analysis of two kinds of mutation information, five gene interaction networks and four voting strategies, which offers a new perspective for promoting the discovery of more drivers. From a biological point of view, the proposed algorithm accurately identifies known driver genes and provides promising sets of potential driver genes for experimental methods to verify, which will help more real cancer genes to be found. From a computational point of view, the two-stage vote shows that combining different types of information is helpful for discovering results; moreover, propagating scores through an iterative framework is worth considering in other areas, such as semantic analysis and node discovery in social networks. |
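
The propagate-then-vote idea summarized above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the "improved iterative framework" or the four voting strategies, so the sketch substitutes a plain random walk with restart for propagation and a Borda count for both votes. It also assumes that the first vote merges the rankings obtained from each mutation score within one network and that the second vote merges the per-network rankings. All names (`propagate`, `borda`, `two_stage_vote`), the restart weight `alpha`, and the toy genes and scores are hypothetical.

```python
from collections import defaultdict


def propagate(adj, init, alpha=0.5, iters=50):
    """Random walk with restart (an assumed stand-in for the paper's framework).

    Each round, every gene splits its score evenly among its neighbours,
    and the result is blended with the initial mutation score.
    """
    score = dict(init)
    for _ in range(iters):
        score = {
            g: alpha * sum(score[n] / len(adj[n]) for n in adj[g])
            + (1 - alpha) * init[g]
            for g in adj
        }
    return score


def rank(score):
    """Order genes by descending score; ties are broken by gene name."""
    return sorted(score, key=lambda g: (-score[g], g))


def borda(rankings):
    """Borda count: position i in a list of n genes earns n - i points."""
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, gene in enumerate(ranking):
            points[gene] += n - i
    return sorted(points, key=lambda g: (-points[g], g))


def two_stage_vote(networks, features, alpha=0.5):
    """Stage 1 votes within each network, stage 2 votes across networks."""
    per_network = []
    for adj in networks:
        rankings = [rank(propagate(adj, init, alpha)) for init in features]
        per_network.append(borda(rankings))  # first vote
    return borda(per_network)  # second vote


# Toy data (made up): two undirected networks over the same four genes.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
star = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}

# Two per-gene mutation scores (made up): frequency and functional impact.
frequency = {"A": 0.1, "B": 0.9, "C": 0.1, "D": 0.0}
impact = {"A": 0.0, "B": 0.8, "C": 0.3, "D": 0.0}

ranked = two_stage_vote([path, star], [frequency, impact])
print(ranked[0])  # gene B ranks first on this toy input
```

Rank-based voting is used here because scores propagated over different networks are not directly comparable in scale, whereas rankings are; other aggregation rules could be swapped into `borda` without changing the rest of the sketch.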