Research On Integrated Non-negative Matrix Factorization Method For Network Pattern Mining Of Omics Data

Posted on: 2022-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: X N Zhang
GTID: 2510306323484734
Subject: Computer application technology
Abstract/Summary:
With the development of high-throughput sequencing technology, the volume of bio-omics data has grown explosively. In-depth study of cancer omics data can uncover important information about how cancer develops and provide a theoretical basis for its diagnosis and treatment. Cancer omics data are usually high-dimensional with small sample sizes. Integrated non-negative matrix factorization (NMF) methods can analyze such data jointly and discover potential associations among multiple data types. However, existing integration methods still suffer from insufficient manifold learning ability, weak preservation of homogeneity across data types, and insufficient fusion of heterogeneous data. This thesis therefore improves the integrated NMF method and applies it successfully to multi-cancer genomics data. The specific research content is as follows.

(1) To address the insufficient manifold learning ability of existing models, an integrated hypergraph regularized non-negative matrix factorization (iHNMF) method is proposed. The hypergraph regularization term better preserves the spatial structure of multi-dimensional omics data, which helps retain the global characteristics of the data. In addition, extending a single model to an integrated model allows multi-dimensional omics data to be analyzed jointly and helps reveal potential associations between data types. To verify the effectiveness of iHNMF, we perform sample clustering and gene co-expression network analysis experiments on a multi-cancer genomics dataset. A new metric for evaluating gene importance is then proposed to screen the genes in the network. Finally, the selected genes are verified against the existing literature, and their biological relevance is explained.

(2) To further improve the homogeneity of the integrated model, an integrated robust structured non-negative matrix factorization (iRSNMF) method is proposed. The structured term makes the underlying clustering structure of the data more consistent, thereby preserving the homogeneity of multi-dimensional omics data. In addition, the method uses the L2,1-norm to reduce the adverse effects of noise and outliers in the original data. To verify the effectiveness of iRSNMF, we perform clustering and network analysis experiments, and the important genes and pathways discovered are analyzed and verified.

(3) To fully integrate the heterogeneity among omics data, an integrated weighted non-negative matrix factorization (iWNMF) method is proposed. This method considers both the homogeneous and the heterogeneous information among multi-dimensional omics data. To verify the effectiveness of iWNMF, we perform multi-cancer sample clustering experiments. In addition, iWNMF is used to construct gene co-expression networks and functional grouping networks, and the important genes in these networks are analyzed and verified. Finally, enrichment analysis is performed on the Gene Ontology biological process (BP) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with these genes.

The experimental results show that the proposed methods achieve better clustering performance than the compared methods. In addition, the gene co-expression networks constructed with these methods can reveal key genes and pathways involved in cancer development.
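The integrated NMF idea that the three methods above build on can be illustrated with a minimal NumPy sketch. It rests on an assumption the abstract does not state: the omics (or cancer-type) matrices X_i share their rows. Under that assumption each matrix is factorized as X_i ≈ W H_i, with one shared non-negative basis W and a data-specific coefficient matrix H_i, fitted with Lee and Seung style multiplicative updates. The names `integrated_nmf` and `objective` are hypothetical. The sketch deliberately omits the hypergraph, structured, robust and weighting terms that distinguish iHNMF, iRSNMF and iWNMF.

```python
import numpy as np


def integrated_nmf(xs, k, n_iter=500, eps=1e-10, seed=0):
    """Jointly factorize X_i ~= W @ H_i with a shared basis W (illustrative sketch).

    xs: list of non-negative arrays of shape (m, n_i) that share their m rows.
    Returns W of shape (m, k) and a list of H_i arrays of shape (k, n_i).
    """
    rng = np.random.default_rng(seed)
    m = xs[0].shape[0]
    w = rng.random((m, k))
    hs = [rng.random((k, x.shape[1])) for x in xs]
    for _ in range(n_iter):
        # Update each data-specific coefficient matrix H_i with W held fixed.
        for i, x in enumerate(xs):
            hs[i] *= (w.T @ x) / (w.T @ w @ hs[i] + eps)
        # Update the shared basis W using every data type at once.
        num = sum(x @ h.T for x, h in zip(xs, hs))
        den = w @ sum(h @ h.T for h in hs) + eps
        w *= num / den
    return w, hs


def objective(xs, w, hs):
    """Sum of squared Frobenius reconstruction errors over all data types."""
    return sum(np.linalg.norm(x - w @ h, "fro") ** 2 for x, h in zip(xs, hs))
```

Once the factors are fitted, one simple clustering rule assigns each sample (column) of X_i to the component with the largest entry in its column of H_i, for example `hs[0].argmax(axis=0)`. This is one common convention; the abstract does not say which rule the thesis uses.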
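For (1), the sketch below shows one common way to build a hypergraph regularizer. The construction is assumed here, since the abstract does not say how iHNMF builds its hypergraph. Each sample forms one hyperedge together with its nearest neighbours, and that hyperedge is weighted with a heat kernel. The regularizer then uses the unnormalized hypergraph Laplacian L = D_v - H_inc W_e D_e^{-1} H_inc^T. Adding lam * tr(H L H^T) to the squared reconstruction error gives the multiplicative update for H in `hypergraph_nmf`. Both function names, the neighbour count and `sigma` are hypothetical choices. The example covers a single matrix; in an integrated model the same penalty could be applied to each H_i.

```python
import numpy as np


def knn_hypergraph_laplacian(v, n_neighbors=3, sigma=1.0):
    """Return (L, Dv, S) for a kNN hypergraph over the columns (samples) of v."""
    n = v.shape[1]
    sq = np.sum(v ** 2, axis=0)
    # Pairwise squared Euclidean distances between samples (clipped at 0).
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * v.T @ v, 0.0)
    hinc = np.zeros((n, n))  # incidence matrix: rows = vertices, cols = hyperedges
    we = np.zeros(n)         # hyperedge weights
    for j in range(n):
        order = np.argsort(dist2[j])
        neighbours = order[order != j][:n_neighbors]
        members = np.concatenate(([j], neighbours))  # hyperedge e_j
        hinc[members, j] = 1.0
        # Heat-kernel weight: summed similarity of each member to sample j.
        we[j] = np.exp(-dist2[j, members] / sigma ** 2).sum()
    de = hinc.sum(axis=0)            # hyperedge degrees |e|
    dv = hinc @ we                   # vertex degrees d(v) = sum_e w(e) h(v, e)
    s = (hinc * (we / de)) @ hinc.T  # S = H_inc W_e D_e^{-1} H_inc^T
    return np.diag(dv) - s, np.diag(dv), s


def hypergraph_nmf(x, k, lam=1.0, n_neighbors=3, n_iter=300, eps=1e-10, seed=0):
    """Minimize ||X - W H||_F^2 + lam * tr(H L H^T) with W, H >= 0 (sketch)."""
    _, dv, s = knn_hypergraph_laplacian(x, n_neighbors)
    rng = np.random.default_rng(seed)
    w = rng.random((x.shape[0], k))
    h = rng.random((k, x.shape[1]))
    for _ in range(n_iter):
        w *= (x @ h.T) / (w @ h @ h.T + eps)
        # L = Dv - S is split into non-negative parts so the update stays >= 0.
        h *= (w.T @ x + lam * h @ s) / (w.T @ w @ h + lam * h @ dv + eps)
    return w, h
```

The update keeps H non-negative because L is split into its non-negative parts D_v and S. The penalty term equals a weighted sum of squared differences between the coefficient vectors of samples that share a hyperedge, so samples grouped together are pulled toward similar representations.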
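For (2), the L2,1-norm of a residual matrix sums the Euclidean norms of its columns. A sample with a large error, such as an outlier, therefore contributes linearly rather than quadratically. The sketch below illustrates only that robustness mechanism, using a standard iteratively reweighted multiplicative scheme for NMF with an L2,1 loss. It assumes samples are columns and omits both the structured term and the integration used by iRSNMF. The names `l21_norm` and `robust_nmf_l21` are hypothetical.

```python
import numpy as np


def l21_norm(m):
    """L2,1 norm: sum of the Euclidean norms of the columns of m."""
    return np.linalg.norm(m, axis=0).sum()


def robust_nmf_l21(x, k, n_iter=300, eps=1e-10, seed=0):
    """Minimize ||X - W H||_{2,1} with W, H >= 0 (sketch; samples are columns)."""
    rng = np.random.default_rng(seed)
    w = rng.random((x.shape[0], k))
    h = rng.random((k, x.shape[1]))
    for _ in range(n_iter):
        # Per-sample weights d_j = 1 / ||x_j - W h_j||: columns with large
        # residuals (likely outliers) receive small weight in the W update.
        d = 1.0 / (np.linalg.norm(x - w @ h, axis=0) + eps)
        w *= ((x * d) @ h.T) / ((w @ h * d) @ h.T + eps)
        # With diagonal column weights, d cancels from numerator and
        # denominator of the H update, leaving the standard rule.
        h *= (w.T @ x) / (w.T @ w @ h + eps)
    return w, h
```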
Keywords/Search Tags: Integrated non-negative matrix factorization method, Gene co-expression network, Hypergraph regularization, Structured term, Fusion weighting