
Feature Extraction Of Cancer Gene Expression Data Based On Non-negative Matrix Factorization

Posted on: 2012-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: R P Wang
Full Text: PDF
GTID: 2214330338470966
Subject: Signal and Information Processing
Abstract/Summary:
DNA microarray technology is becoming an effective way to analyze cancer gene expression data, and more and more cancer researchers use it to study differences in gene expression between normal and cancerous tissue. Gene expression data, however, have two notable characteristics: high dimensionality and small sample size. Each sample records the expression levels of all measurable genes in the tissue cells, but most of these genes are unrelated to the sample category and therefore carry no class information. Such noise genes reduce classification accuracy. It is therefore necessary to extract structural and functional information about genes from the experimental data, in order to find genes that are functionally related to each other and to remove noise genes. Effectively extracting gene features and reducing the dimensionality of gene expression data is key to research on cancer gene classification. This thesis uses non-negative matrix factorization to extract features from gene expression data and then classifies the extracted features to validate the feasibility and effectiveness of the approach. The detailed research contents and experimental results are as follows:

1. A feature extraction algorithm based on non-negative matrix factorization is proposed. The basic idea of non-negative matrix factorization is to reveal the latent structure of the data by decomposing one non-negative matrix into the product of two non-negative matrices. First, the gene expression data are filtered. Second, a non-negative matrix is constructed and decomposed to obtain low-dimensional vectors that characterize each sample. Last, a support vector machine is used to classify the vectors. Experimental results validate the feasibility and effectiveness of this algorithm.

2. A feature extraction algorithm based on local non-negative matrix factorization is proposed. Also built on non-negative matrix factorization, this algorithm constrains the iteration in three respects. First, the gene expression data are filtered. Second, the non-negative matrix is constructed and decomposed with local non-negative matrix factorization to obtain low-dimensional vectors that characterize each sample. Last, a support vector machine is used to classify the vectors. Experimental results validate the feasibility and effectiveness of this algorithm.

3. A feature extraction algorithm based on sparse non-negative matrix factorization is proposed. This variant of non-negative matrix factorization adds sparseness constraints to the coefficient matrix. Compared with the traditional non-negative matrix factorization method, it is better at finding stable and intuitive local features, and it allows the sparseness of the factorized matrices to be controlled freely. Its advantages include fast convergence and low correlation between the basis matrix and the coefficient matrix. First, the gene expression data are filtered. Second, the non-negative matrix is constructed and decomposed with sparse non-negative matrix factorization to obtain low-dimensional vectors that characterize each sample. Last, a support vector machine is used to classify the vectors. Experimental results validate the feasibility and effectiveness of this algorithm.
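
The filter-then-factorize steps shared by the three algorithms can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how genes are filtered or which update rule is used, so the variance-based filter, the standard multiplicative updates for the Frobenius-norm objective, the function names `filter_genes` and `nmf`, and the synthetic data are all assumptions made for illustration. The local and sparse variants are not shown.

```python
import numpy as np


def filter_genes(X, n_keep):
    """Keep the n_keep genes (rows) with the largest variance across samples.

    Variance filtering is an assumed stand-in for the unspecified filter.
    """
    variances = X.var(axis=1)
    keep = np.sort(np.argsort(variances)[-n_keep:])
    return X[keep], keep


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (genes x samples) as V ~ W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    W is the basis matrix (genes x rank); H is the coefficient matrix
    (rank x samples), whose columns are the low-dimensional sample vectors.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    errors = []
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


# Synthetic non-negative "expression" data: 500 genes, 20 samples.
rng = np.random.default_rng(42)
X = rng.random((500, 20))

X_filtered, kept = filter_genes(X, n_keep=100)
W, H, errors = nmf(X_filtered, rank=5)

# One 5-dimensional feature vector per sample, ready for a classifier.
features = H.T
```

Each row of `features` could then be passed to a support vector machine (for example scikit-learn's `SVC`) together with the sample labels. In a real experiment the factorization would need to be fitted on training samples only, with held-out samples projected onto the learned basis `W`.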
Keywords/Search Tags: Extract Features, Gene Expression Data, Non-Negative Matrix Factorization, Local Non-Negative Matrix Factorization, Sparse Non-Negative Matrix Factorization