| The accurate prediction of disease status is a central challenge in clinical cancer research. Microarray-based gene biomarkers have been identified to predict outcome and have outperformed traditional clinical parameters. However, the robustness of individual gene biomarkers is questioned because of their poor reproducibility between different patient cohorts. Substantial progress in treatment requires advances in methods to identify robust biomarkers. Several methods incorporating pathway information have been proposed to identify pathway markers and build classifiers at the level of functional categories rather than of individual genes. These pathway-based classifiers achieved more reliable performance and provided better functional interpretation of the expression profile that may be associated with therapeutic choices. However, current methods treat pathways as simple gene sets and ignore the topological information embedded in the pathway networks, which is essential for inferring a more robust pathway activity. In this study, we propose a directed random walk (DRW)-based method to mine the topological information and infer pathway activity. DRW is performed on a merged global pathway network and evaluates the topological importance of each gene by capturing most topological characteristics, such as the position of the gene in the pathway, how many genes interact with it, and the type of interactions. We incorporate the topological information to weight the genes at the pathway activity inference step. By weighting the genes, the DRW method can amplify the signals of key genes, whose expression variations may greatly impact the pathway, and weaken the differential signals of genes that appear only downstream or do not affect the given pathway as much. This strategy of weighting genes by their topological importance greatly improves the reproducibility of pathway activities. We applied the DRW method to the classification of six types of cancer and showed that it yielded more accurate and robust overall performance than several existing gene-based and pathway-based classification methods, both within single datasets and across independent datasets. The discriminative pathway activities that are frequently selected to build the classifier reveal new, robust risk-active pathways for cancers. The resulting risk-active pathways are more reliable in guiding therapeutic selection and the development of pathway-specific therapeutic strategies. |
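
To make the weighting idea concrete, the following is a minimal sketch of a generic random walk with restart on a small directed graph, followed by a topology-weighted pathway score. The abstract does not state the update rule, so everything here is an illustrative assumption rather than the authors' implementation: the restart probability, the edge orientation, the uniform starting vector, the toy four-gene network, and the `pathway_activity` formula (a weighted sum of per-gene z-scores normalized by the weight norm) are all hypothetical.

```python
import numpy as np


def random_walk_with_restart(adj, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting weights for each node (gene).

    adj[i, j] = 1 means a directed edge i -> j (orientation is an assumption).
    p0 is the starting distribution; restart is the probability of jumping
    back to p0 at each step (0.7 is an arbitrary illustrative value).
    """
    adj = np.asarray(adj, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    p0 = p0 / p0.sum()

    # Row-normalize so each node splits its probability over its out-edges.
    # Nodes without out-edges keep an all-zero row (their mass is dropped).
    out_deg = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * trans.T @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def pathway_activity(z, weights, members):
    """Hypothetical weighted activity of one pathway for one sample.

    z: per-gene expression z-scores; weights: topological weights per gene;
    members: indices of the genes that belong to the pathway.
    """
    w = weights[members]
    return float(np.dot(w, z[members]) / np.sqrt(np.sum(w ** 2)))


# Toy network: genes 1, 2, 3 all point to gene 0, and gene 0 points to gene 1.
toy_adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])
toy_weights = random_walk_with_restart(toy_adj, np.ones(4))
toy_z = np.array([2.0, 0.5, -1.0, 0.0])  # made-up z-scores for one sample
print(toy_weights)
print(pathway_activity(toy_z, toy_weights, [0, 1, 2]))
```

In this toy graph, gene 0 receives edges from three other genes and so accumulates the largest weight, which means its expression change dominates the pathway score. That is the qualitative behavior the abstract describes for topologically important genes.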