| Cancer is a complex disease caused by the combined action of many factors, and it seriously endangers human health. Because of this complexity, key modules identified from a single type of data usually cannot fully capture the pathogenic mechanisms of cancer. With the development of sequencing technology, large-scale multidimensional genomic data have become available, giving researchers new perspectives from which to explore cancer. Integrating multidimensional cancer genomic data and then identifying key modules that contain biomolecular information from different levels can help reveal the underlying mechanisms of cancer development and ultimately provide a theoretical basis for early screening and clinical treatment. Building on the joint non-negative matrix factorization (JNMF) algorithm, this paper proposes the sparse orthogonal-regularized joint non-negative matrix factorization (SOJNMF) algorithm and the network-regularized sparse orthogonal-regularized joint non-negative matrix factorization (NSOJNMF) algorithm, and applies both to multidimensional cancer genomic data to identify key cancer modules. The results show that SOJNMF and NSOJNMF effectively address problems that JNMF and some similar algorithms encounter in practical applications. The work of this paper is as follows: (1) To ensure the sparsity of the factor matrices and reduce redundancy among modules, this paper builds on the JNMF framework by applying an L1-norm penalty to the coefficient matrices and adding an orthogonal regularization constraint on them, which yields the SOJNMF algorithm. Applied to multidimensional genomic data of liver cancer, SOJNMF identified 238 key mRNA-miRNA-methylation modules. Performance comparisons show that SOJNMF significantly reduces the feature overlap rate among these 238 modules and achieves a higher AUC than both the JNMF and SJNMF algorithms. Functional enrichment analysis shows that most of the 238 modules are biologically meaningful, and the most enriched GO biological processes and KEGG pathways among them are closely related to the occurrence and development of liver cancer. Permutation tests indicate that, in 59.6% of the 238 modules, features from different data types are significantly correlated. These results show that SOJNMF can effectively identify key mRNA-miRNA-methylation modules related to liver cancer, which supports the study of regulatory mechanisms of liver-cancer-related biological pathways at the multidimensional genomic level and offers a useful reference for clinical research and early diagnosis of liver cancer. (2) To identify the relationships among miRNAs, mRNAs, and lncRNAs, this paper proposes the NSOJNMF algorithm for recognizing cancer-associated ceRNA co-modules, which helps reveal the expression patterns and underlying molecular mechanisms of RNAs in cancer. The method integrates the interactions among the RNA data types through network regularization, uses the sparsity and orthogonal regularization constraints to prevent multicollinearity, and produces a well-modularized sparse solution. Applied to a liver cancer dataset and a colon cancer dataset, NSOJNMF identified 200 key modules for liver cancer and 210 key modules for colon cancer. Enrichment analysis shows that more than 90% of these modules are closely related to the occurrence and development of cancer. In addition, the ceRNA networks constructed from these co-modules not only recover known associations among the three RNA types but also uncover potential new biological associations, which may help in exploring the competitive relationships among multiple RNAs and the molecular mechanisms of tumor development. These results indicate that NSOJNMF can effectively identify ceRNA modules and can provide a theoretical basis for future verification of how competition among multiple RNAs in genomic data affects tumor development. |
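The abstract states only that SOJNMF adds an L1-norm penalty and an orthogonal regularizer on the JNMF coefficient matrices. As a rough illustration, the Python/NumPy sketch below assumes one plausible formulation: each data type $X_I$ (samples × features) is factorized as $X_I \approx W H_I$ with a shared basis $W$, and the assumed objective is $\sum_I \lVert X_I - W H_I\rVert_F^2 + \lambda \sum_I \lVert H_I\rVert_1 + \gamma \sum_I \sum_{a \ne b} \langle h_a, h_b\rangle$, where $h_a$ are the rows (modules) of $H_I$. This objective, the multiplicative update rules, the parameters `lam` and `gamma`, and the z-score rule in the hypothetical `extract_modules` helper are all assumptions for illustration, not the paper's actual derivation; the network-regularized NSOJNMF variant is not covered.

```python
import numpy as np


def sojnmf_sketch(Xs, k, lam=0.1, gamma=0.1, n_iter=200, seed=0, eps=1e-10):
    """Heuristic sketch of a sparse, orthogonality-penalized joint NMF (assumed form).

    Xs: list of non-negative arrays X_I of shape (n_samples, p_I) sharing the same samples.
    Returns the shared basis W (n_samples x k), coefficient matrices H_I (k x p_I),
    and the objective value recorded before and after each iteration.
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[0]
    W = rng.random((n, k))
    Hs = [rng.random((k, X.shape[1])) for X in Xs]
    off_diag = np.ones((k, k)) - np.eye(k)  # couples different modules only

    def objective():
        total = 0.0
        for X, H in zip(Xs, Hs):
            total += np.linalg.norm(X - W @ H, "fro") ** 2
            total += lam * H.sum()  # L1 norm, since H >= 0
            total += gamma * (H @ H.T * off_diag).sum()  # overlap between modules
        return total

    history = [objective()]
    for _ in range(n_iter):
        # Shared basis W: Lee-Seung multiplicative update summed over data types.
        num = sum(X @ H.T for X, H in zip(Xs, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den
        # Coefficient matrices: the L1 and overlap gradients are non-negative,
        # so they go into the denominator and keep every H_I non-negative.
        for i, X in enumerate(Xs):
            H = Hs[i]
            den = W.T @ W @ H + lam / 2 + gamma * off_diag @ H + eps
            Hs[i] = H * (W.T @ X) / den
        history.append(objective())
    return W, Hs, history


def extract_modules(Hs, z_thresh=2.0):
    """Hypothetical rule: feature j joins module a if row a of H_I has z-score > z_thresh at j."""
    modules = []
    for a in range(Hs[0].shape[0]):
        module = []
        for H in Hs:
            row = H[a]
            z = (row - row.mean()) / (row.std() + 1e-12)
            module.append(np.flatnonzero(z > z_thresh))  # feature indices per data type
        modules.append(module)
    return modules


# Toy stand-ins for three data types (e.g. mRNA, miRNA, methylation); random, not real data.
rng = np.random.default_rng(1)
Xs = [rng.random((30, 50)), rng.random((30, 20)), rng.random((30, 40))]
W, Hs, history = sojnmf_sketch(Xs, k=4)
modules = extract_modules(Hs)
print(f"objective: {history[0]:.1f} -> {history[-1]:.1f}")
```

Putting the penalty gradients in the denominator is the usual way to keep multiplicative updates non-negative, but with the overlap term these updates are heuristic and are not guaranteed to decrease the objective at every step; the paper's own solver may differ.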