| The continuous generation of cancer sequencing data provides strong data support for cancer research. Among the many data mining methods, network pattern mining is an effective approach that constructs connections between biomolecules at a macroscopic level and yields a wealth of information. Gene expression data reflect the abundance of mRNA transcripts measured in cells. These data can be used to identify which genes have changed expression, how genes correlate with one another, and how gene activity is affected under different conditions. Constructing co-expression networks from gene expression data is a common method in the study of molecular biological networks. With the emergence of cancer multi-omics data, a single data type or a single-cancer network analysis model can no longer meet the processing needs of these data. Therefore, joint multi-cancer network analysis based on integrated data, and network analysis based on models that integrate multiple data types, have become new hotspots and trends. Based on multi-cancer integrated data from The Cancer Genome Atlas (TCGA), this study uses gene co-expression networks to mine interaction modules and pathways of genes that are abnormally expressed across multiple cancers. The study is divided into three main parts: (1) To address noise interference, a low-rank method is introduced for network denoising. Before the network is built from the multi-cancer integrated data, the low-rank method is applied to denoise the data, yielding more reliable, internally linked low-rank cancer data without destroying data integrity. The Pearson Correlation Coefficient is then used to measure the relationships between genes, and cancer information is finally extracted from the resulting network. (2) To address the measurement of nodes, a novel network model (PMN) is proposed that considers multiple kinds of relationships together with the local and global characteristics of nodes. This model introduces a network construction strategy that combines linear (Pearson Correlation Coefficient) and nonlinear (Mutual Information) measurements between genes, and draws on the degree and betweenness of nodes to exploit their local and global characteristics. The model builds the network from integrated gene expression data of three cancers from TCGA, which eliminates single, weak relationships between gene nodes and enriches the information contained in the network. (3) To address the fusion of integrated data, an integrated graph regularized non-negative matrix factorization model (iGMFNA) is proposed. For different types of data (gene expression, methylation, copy number variation) from the same cancer, matrix factorization is used for reconstruction so that each data type is fully utilized and the integrated network covers the specific information of each type. This supports a more systematic analysis of specific cancers and the mining of disease-causing modules. Experiments show that these models outperform similar methods and find more verified cancer-related genes and modules. |
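The first part of the study (low-rank denoising followed by a Pearson co-expression network) can be illustrated with a minimal sketch. The abstract does not specify the low-rank method, the correlation threshold, or the data layout, so everything here is an assumption: a truncated SVD stands in for the low-rank step, the helper names `low_rank_approx` and `coexpression_network` are hypothetical, the threshold of 0.8 is arbitrary, and the expression matrix is synthetic rather than TCGA data.

```python
import numpy as np


def low_rank_approx(X, rank):
    """Rank-`rank` approximation of X via truncated SVD.

    Assumption: a stand-in for the unspecified low-rank denoising method.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def coexpression_network(X, threshold=0.8):
    """Boolean adjacency matrix linking genes (rows of X) with |PCC| >= threshold."""
    corr = np.corrcoef(X)            # each row of X is treated as one gene
    adj = np.abs(corr) >= threshold  # keep strong positive or negative correlations
    np.fill_diagonal(adj, False)     # no self-loops
    return adj


# Synthetic toy data (not TCGA): 6 genes x 40 samples.
# Genes 0-2 follow one latent signal, genes 3-5 another, plus noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(2, 40))
X = np.repeat(signal, 3, axis=0) + 0.3 * rng.normal(size=(6, 40))

X_lr = low_rank_approx(X, rank=2)               # denoise before building the network
adj = coexpression_network(X_lr, threshold=0.8)
degree = adj.sum(axis=1)                        # node degree, a local characteristic
```

In this sketch, `degree` corresponds to the local node characteristic mentioned in part (2); a nonlinear measure such as mutual information, or a global measure such as betweenness, would need additional code that is not shown here.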