Research On Sparse Matrix Decomposition Method For Difference Feature Recognition

Posted on: 2017-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: D Wang
GTID: 2270330485483949
Subject: Communication and Information System
Abstract/Summary:
With the development of high-throughput sequencing technologies, the volume of biological data has grown explosively. Finding useful information in vast amounts of genomic data, and in the genetic variation those data contain, has become a major challenge. Traditional data analysis methods cannot meet the needs of current research, whereas sparse matrix factorization, as a new generation of data mining technology, can handle large-scale biological data. Furthermore, it can identify the characteristic genes that carry critical information in gene expression data, providing the life sciences with an effective means to better understand life.

In this paper, the author comprehensively reviews domestic and international work on sparse matrix factorization theory and characteristic gene recognition algorithms, and finds gaps in the existing research. Building on the results of previous studies, the author therefore focuses on characteristic gene selection through an in-depth study of sparse matrix factorization, extends the research on non-negative matrix factorization (NMF), and improves the algorithms to achieve better performance. This paper proposes three algorithms for identifying characteristic genes: "An NMF-L2,1-norm Constraint Method for Characteristic Gene Selection", "Characteristic Gene Selection based on Robust Graph Regularized Non-negative Matrix Factorization", and "Identification genes of colorectal cancer with integrated data via Block-sparse NMFL2,1".

The NMF algorithm based on the L2,1 norm is motivated by the noise and outliers present in the data. The L2,1-norm constraint is imposed on both the error function and the regularization function, which diminishes the impact of outliers and produces sparse results, respectively.
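To make the robustness claim concrete, the following is a minimal NumPy sketch of the L2,1 norm, not the thesis's implementation. It assumes the common row-wise convention (the sum of the Euclidean norms of the rows; some papers sum over columns instead), and the function names and toy residual matrices are hypothetical. It compares how a single outlying row inflates the L2,1 norm versus the squared Frobenius norm used in standard NMF.

```python
import numpy as np

def l21_norm(M):
    """L2,1 norm under the row-wise convention (an assumption):
    the sum over rows of each row's Euclidean norm."""
    return float(np.sum(np.sqrt(np.sum(M * M, axis=1))))

def squared_frobenius(M):
    """Squared Frobenius norm: the sum of all squared entries."""
    return float(np.sum(M * M))

# Hypothetical residual matrix with small errors everywhere.
E = np.full((4, 3), 0.1)

# Same residual, but the first row is a gross outlier.
E_outlier = E.copy()
E_outlier[0] = 10.0

# How much does one outlying row inflate each loss?
ratio_l21 = l21_norm(E_outlier) / l21_norm(E)
ratio_fro = squared_frobenius(E_outlier) / squared_frobenius(E)

print("L2,1 inflation:", ratio_l21)
print("squared Frobenius inflation:", ratio_fro)
```

Because each row enters the L2,1 norm through its unsquared length, an outlying row increases the loss roughly linearly, whereas the squared Frobenius loss grows quadratically. When the same norm is used as a regularizer, it tends to drive entire rows toward zero, which is the usual explanation for the row-sparse results.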
The robust graph regularized non-negative matrix factorization has two main aspects. First, it enforces L2,1-norm minimization on the error function, which is robust to outliers and noise in the data points. Second, it assumes that the samples lie on a low-dimensional manifold embedded in a high-dimensional ambient space, and it reveals the geometric structure embedded in the original data. The Block-sparse NMFL2,1 is motivated by the characteristics of the TCGA dataset: enforcing different sparse constraints on different data makes the results easier to understand and explain.

To verify the performance of the three proposed algorithms, experiments were carried out on gene expression datasets and integrated datasets, and the results were compared with those of existing methods. The experimental results demonstrate that the proposed algorithms are effective and feasible.

The first innovation of this paper is that, because the L2,1-norm constraint can generate sparse and robust results, an NMF-L2,1-norm constraint method is proposed and successfully applied to characteristic gene selection. The second innovation is a Robust Graph Regularized Non-negative Matrix Factorization algorithm, based on the L2,1 norm and manifold learning, which is applied to characteristic gene selection. The third innovation is a Block-sparse NMFL2,1 algorithm, which is applied to the integrated datasets.
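The manifold assumption mentioned in the abstract is commonly encoded as a graph Laplacian penalty on the low-dimensional sample representation. The sketch below illustrates that general idea under stated assumptions; it is not the thesis's algorithm. The binary k-nearest-neighbour graph, the function names `knn_adjacency` and `graph_penalty`, and the random toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 k-nearest-neighbour graph over the rows of X.
    Each row of X is treated as one sample (an assumption)."""
    n = X.shape[0]
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-matches
    nearest = np.argsort(d2, axis=1)[:, :k]           # k closest per sample
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                         # symmetrize

def graph_penalty(V, W):
    """Tr(V^T L V) with L = D - W, where row i of V is the
    low-dimensional representation of sample i."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(V.T @ L @ V))

# Hypothetical toy data: 6 samples, 4 features, 2 latent factors.
rng = np.random.default_rng(0)
X = rng.random((6, 4))
V = rng.random((6, 2))

W = knn_adjacency(X, k=2)
print("graph penalty:", graph_penalty(V, W))
```

For a symmetric `W`, the penalty equals half the sum over all pairs of `W[i, j] * ||V[i] - V[j]||^2`, so it is small when samples that are neighbours in the original space also receive similar low-dimensional representations.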
Keywords/Search Tags: Sparse matrix factorization, Characteristic gene, L2,1 norm, Block-sparse constraint, Manifold learning