Feature Gene Selection Of Cancer Based On Statistical Method

Posted on: 2017-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhao
Full Text: PDF
GTID: 2284330488466916
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Cancer is a serious, fatal disease that plagues modern medicine. The term now generally refers to the various malignant tumors. Clinically, tumors are at present diagnosed mainly by morphological methods, but these methods are not accurate. Because the development of cancer often involves changes in genes in vivo, studying the pathogenesis of cancer at the genetic level is more scientific. DNA microarray technology can produce large amounts of gene expression data in a short time, which makes it possible to analyze cancer at the gene level. Such analysis is also highly significant for the early diagnosis of cancer and for follow-up targeted therapy of cancer patients.

However, gene expression profile data are typically high-dimensional, have small sample sizes, and contain a lot of noise, which makes data analysis considerably harder. Usually only a relatively small number of genes cause cancer, and including the large number of unrelated genes in the analysis makes it more difficult. Therefore, a criterion is chosen in advance to eliminate irrelevant genes, which reduces the dimensionality of the data. Selecting the optimal feature genes to achieve higher classification accuracy has become the basic idea of cancer research using DNA microarray technology.

In this thesis, colon cancer gene expression data are used as an example. By combining the Chernoff distance and the Bhattacharyya distance to filter out irrelevant genes, 136 representative candidate feature genes were found. The Lasso was then used to reduce the dimensionality further, and 21 key feature genes were finally selected. A support vector machine was used to test the classification performance of the selected feature genes, giving a classification accuracy of 87%. Some of the selected feature genes have been confirmed by biological experiments to be associated with colon cancer.
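The abstract does not state how the two distances are combined, how the Lasso is fitted, or which SVM settings are used. The following is therefore only a minimal, stdlib-only Python sketch of a filter-then-Lasso pipeline under stated assumptions. Each gene's expression within each class is assumed to be roughly Gaussian, which allows the closed-form univariate Bhattacharyya and Chernoff distances to be used. The Chernoff score is taken as the maximum over a small grid of `s` values, and genes are kept only if they rank in the top `k` under both scores. This combination rule is hypothetical. A plain least-squares Lasso is then run by cyclic coordinate descent on the 0/1 class labels, which is an assumption (a logistic Lasso is a common alternative). The names `filter_genes`, `lasso_cd`, `s_grid`, `k` and `lam` are illustrative and do not come from the thesis, and the SVM evaluation step is not sketched.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = samples, columns = genes


def mean_var(values: Sequence[float], eps: float = 1e-12) -> Tuple[float, float]:
    """Sample mean and unbiased variance (eps guards against zero variance)."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)
    return m, v + eps


def chernoff(m1: float, v1: float, m2: float, v2: float, s: float) -> float:
    """Chernoff distance -ln(integral of p1^s * p2^(1-s)), p1 = N(m1, v1), p2 = N(m2, v2)."""
    w = (1 - s) * v1 + s * v2
    quad = s * (1 - s) * (m1 - m2) ** 2 / (2 * w)
    log_term = 0.5 * math.log(w / (v1 ** (1 - s) * v2 ** s))
    return quad + log_term


def bhattacharyya(m1: float, v1: float, m2: float, v2: float) -> float:
    """Bhattacharyya distance between two univariate Gaussians (Chernoff at s = 1/2)."""
    quad = (m1 - m2) ** 2 / (4 * (v1 + v2))
    log_term = 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2)))
    return quad + log_term


def filter_genes(X: Matrix, y: Sequence[int], k: int,
                 s_grid: Sequence[float] = (0.1, 0.3, 0.5, 0.7, 0.9)) -> List[int]:
    """Indices of genes ranked in the top k by BOTH distances (hypothetical rule)."""
    n_genes = len(X[0])
    b_scores, c_scores = [], []
    for j in range(n_genes):
        class0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        class1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        (m1, v1), (m2, v2) = mean_var(class0), mean_var(class1)
        b_scores.append(bhattacharyya(m1, v1, m2, v2))
        # assumed: Chernoff score = largest value over the s grid
        c_scores.append(max(chernoff(m1, v1, m2, v2, s) for s in s_grid))
    top_b = set(sorted(range(n_genes), key=lambda j: -b_scores[j])[:k])
    top_c = set(sorted(range(n_genes), key=lambda j: -c_scores[j])[:k])
    return sorted(top_b & top_c)


def soft_threshold(z: float, t: float) -> float:
    """Lasso proximal step: shrink z toward 0 by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(X: Matrix, y: Sequence[float], lam: float, n_iter: int = 200) -> List[float]:
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X and y are centred internally, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residual y - X b with b = 0
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] < 1e-15:  # constant gene: leave its coefficient at 0
                continue
            # correlation of gene j with the residual that excludes gene j
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * coef[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z[j]
            delta = new - coef[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta  # keep r = y - X coef up to date
                coef[j] = new
    return coef
```

In the thesis's setting, `X` would be the samples-by-genes expression matrix and `y` the tumor/normal labels. `filter_genes(X, y, k)` would then be run first, and `lasso_cd` would be applied only to the kept columns. The values of `k` and `lam` would need tuning to arrive at gene counts like the 136 and 21 reported above.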
Keywords/Search Tags: Gene expression data, Chernoff distance, Bhattacharyya distance, Lasso, Support vector machine