
Tumor DNA Microarray Data Classification Based on Non-negative Matrix Factorization

Posted on: 2010-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 2204360275455211
Subject: Control theory and control engineering
Abstract/Summary:
DNA microarray technology is a recently developed technology that arose at the intersection of physics, electronics, molecular biology, and other disciplines. It has been widely applied in biological and medical research. Among its applications, microarray-based cancer diagnosis makes it possible to study the pathological mechanisms of cancer in depth, including its occurrence and spread. To achieve reliable diagnosis and prediction of cancer type, much research has focused on identifying the key genes for different cancers and on classifying cancers. Extracting features or selecting tumor-related genes from gene expression profiles is also challenging, because the profiles are high-dimensional, the sample sets are small, and the data are noisy and redundant. Consequently, the molecular diagnosis of tumors has been investigated broadly and deeply, and a large number of papers on the problem have been published. However, accurately classifying tumors by selecting tumor-related genes from among thousands of genes remains difficult because of the large number of redundant genes, and an exhaustive search for an informative gene subset is usually infeasible in such a large gene space. Choosing an appropriate classification method and classifier is therefore very important. In this thesis, we propose a new method for tumor classification using gene expression data.
We introduce the techniques and methods used in gene selection and in the classification process model, and describe the procedure of that model. We then compare the classification accuracy of the proposed method with that of other methods. The main work of this thesis is as follows:
(1) Non-negative matrix factorization (NMF) has been widely used for decomposing and manipulating image data, among other tasks. Yet previous work has not used the non-negativity of gene expression data for classification. In this thesis, we first extract features by NMF and sparse NMF (SNMF), then use the features to classify the samples. We apply the proposed method to three DNA microarray data sets, and the results show that the method is efficient and feasible.
(2) The selection of key genes in a microarray data set is treated as a feature selection problem. We first select genes with NMF and SNMF, for which we propose new rules. We then extract features from the selected gene data using NMF and SNMF. Finally, we apply support vector machines (SVM) to classify the tumor samples using the extracted features. To better suit the classification task, a modified SNMF algorithm is also proposed. The experimental results on three microarray data sets show that the method is efficient and feasible.
Finally, the work in this thesis is briefly summarized and reviewed, and directions for further research are discussed.
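The NMF feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which NMF algorithm the thesis uses, so this sketch assumes the standard multiplicative update rules for the Frobenius-norm objective. The function names (`nmf`, `matmul`, `frobenius_error`), the toy expression matrix, and the choice of rank 2 are all hypothetical. The SNMF variant, the gene selection rules, and the SVM classifier are not shown.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of v - w*h, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (genes x samples) into
    w (genes x rank) and h (rank x samples) with multiplicative updates.
    Assumption: this is a generic NMF, not the thesis's exact algorithm."""
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + eps for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samples)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_genes)]
    return w, h


# Hypothetical toy data: 6 genes (rows) x 4 samples (columns), two sample groups.
toy = [
    [5.0, 4.8, 0.2, 0.1],
    [4.9, 5.1, 0.1, 0.3],
    [5.2, 5.0, 0.3, 0.2],
    [0.1, 0.2, 4.7, 5.0],
    [0.3, 0.1, 5.1, 4.9],
    [0.2, 0.3, 5.0, 5.2],
]
w, h = nmf(toy, rank=2)
# Each column of h is a low-dimensional, non-negative feature vector for one sample.
features = transpose(h)
```

In a pipeline like the one the abstract outlines, the rows of `features` would be the inputs to a classifier such as an SVM, in place of the thousands of raw gene expression values per sample.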
Keywords/Search Tags: Tumor classification, Feature extraction, Gene selection, Non-negative matrix factorization, Sparse non-negative matrix factorization, Support vector machine