Research On Manifold Regularization Non-negative Matrix Factorization And Its Application In Omics Data

Posted on: 2022-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhu
GTID: 2510306323484744
Subject: Computer application technology
Abstract/Summary:
With the development of high-throughput sequencing technology, a large amount of bio-omics data has been produced, and these data often contain information that is important for the development of biology. The rapid development of single-cell RNA sequencing (scRNA-seq) technology allows biologists to study gene expression data at the molecular level. The emergence of scRNA-seq data makes it possible to study the heterogeneity of omics data, which is of great significance for diagnosis, treatment, and prevention, and for exploring the process of cell differentiation. Non-negative matrix factorization (NMF) is an effective method for large-scale data processing and can reduce the dimensionality of high-dimensional, small-sample biological data. However, the traditional NMF algorithm considers neither the inherent manifold structure of the data nor the influence of data noise, which limits its performance. Therefore, working with the scRNA-seq data among bio-omics data, this thesis improves on existing NMF methods. The research consists of the following three parts:

(1) To address the fuzzy edges and inherent manifold structures of different cell populations in bio-omics data, an adaptive total variation hypergraph regularized non-negative matrix factorization method (ATV-HNMF) is proposed. First, the adaptive total variation method is introduced into the NMF model, so that edge data and smooth data are enhanced and denoised, respectively, which preserves sample characteristics and reduces noise interference. Then, a hypergraph regularization term encodes the high-order geometric relationships among multiple sample points, so that the manifold structure of the data is explored in depth. Finally, the method is applied to cell subtype discrimination and marker gene selection, to support cell research at the molecular level more effectively.

(2) To address the uncertainty in choosing the reduced dimensionality for a single clustering algorithm, an integrated total variation graph regularized non-negative matrix factorization algorithm (ANMF-CE) is proposed. The method uses the adaptive total variation graph regularized NMF model as the base learner of an ensemble learning framework. The total variation term adaptively learns the local features of the data matrix and selects a corresponding processing scheme for each type of edge data. The graph regularization term takes into account the pairwise geometric information between samples. The ensemble learning framework then captures the global structure of the data: it produces a prediction matrix from multiple clustering results, and a consensus function integrates this matrix into the final result. Finally, the framework is applied to scRNA-seq data to improve the discrimination of cell subtypes while selecting more meaningful marker genes.

(3) To address the problem that existing NMF methods do not make full use of the inherent geometric structure of the data, the relationships among similar samples and among dissimilar samples are considered simultaneously, and a similar and dissimilar regularized non-negative matrix factorization method (SDCNMF) is proposed. On the one hand, like the manifold regularization constraint, the similarity regularization term preserves the pairwise geometric structure between similar samples that are close in the data space, so that samples that are similar in the high-dimensional space have low-dimensional representations that are close together. On the other hand, a dissimilarity constraint is introduced into the objective function. Dissimilar samples that are far apart in the data space are brought into the iterative updates of the algorithm, which maximizes the use of the information in the original data matrix and keeps the low-dimensional representations of dissimilar samples far from each other. Finally, the method is applied to gene expression data for clustering and marker gene selection, which helps in understanding the heterogeneity between cells.

The experimental results show that the proposed methods have advantages over existing models: they obtain better clustering results and are biologically meaningful in network construction and marker gene identification.
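
To make the idea of manifold (graph) regularization concrete, the following is a minimal sketch of plain graph-regularized NMF with the standard multiplicative updates. It is not the thesis's ATV-HNMF, ANMF-CE, or SDCNMF: it contains no total variation term, no hypergraph, no ensemble step, and no dissimilarity constraint. The function names (`knn_graph`, `gnmf`, `objective`), the binary k-nearest-neighbour graph, and all parameter defaults are assumptions made for illustration. The data matrix `X` is taken to be non-negative with features (genes) in rows and samples (cells) in columns, and the objective assumed here is ||X - U V^T||_F^2 + lam * Tr(V^T L V), where L = D - W is the graph Laplacian.

```python
import numpy as np


def knn_graph(X, n_neighbors=5):
    """Symmetric binary k-NN affinity matrix over the columns (samples) of X.

    Assumes n_neighbors is smaller than the number of samples.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Squared Euclidean distances between all pairs of samples.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrise


def gnmf(X, rank, lam=1.0, n_neighbors=5, n_iter=200, eps=1e-10, seed=0):
    """Graph-regularized NMF: X (m x n) is approximated by U (m x rank) @ V.T.

    Each row of V is the low-dimensional representation of one sample.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = knn_graph(X, n_neighbors)
    D = np.diag(W.sum(axis=1))
    U = rng.random((m, rank))
    V = rng.random((n, rank))
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


def objective(X, U, V, W, lam):
    """Reconstruction error plus the graph smoothness penalty."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)
```

In a clustering workflow of the kind the abstract describes, the rows of `V` would typically be passed to a clustering step (for example k-means, or taking the index of the largest entry per row), and the columns of `U` could be inspected for high-weight genes. Both steps are left out of this sketch.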
Keywords/Search Tags: Non-negative matrix factorization, Dimensionality reduction, Manifold learning, Single-cell sequencing data