| Cancer is one of the main diseases posing a serious threat to human health. Early diagnosis is key to improving patient survival rates. With the rapid development of DNA microarray technology, vast amounts of cancer gene expression data have been collected. Using this data for early cancer diagnosis, on the basis of molecular biology, has become a hot topic in the post-genome era. However, gene expression data is typically characterized by small sample size, high dimensionality, and nonlinearity. To address these problems, this paper introduces a classification method based on the support vector machine (SVM) that can be used for cancer diagnosis. SVM is a machine learning method based on statistical learning theory (SLT) that follows the structural risk minimization (SRM) principle instead of the empirical risk minimization (ERM) principle, which gives it good generalization ability. A kernel function is used to convert a nonlinear problem in the input space into a linear problem in a higher-dimensional feature space. As a result, SVM has distinct advantages in pattern recognition problems with limited samples, nonlinearity, and high dimensionality, and it effectively avoids both over-fitting and under-fitting. However, small sample size and high dimensionality still affect classification accuracy. Dimensionality reduction has therefore become an important step in classifying cancer gene data. In this paper, several dimensionality reduction methods are applied to obtain lower-dimensional data, and SVM is then used for classification. Higher diagnostic accuracy is achieved by comparing the dimensionality reduction methods and setting appropriate SVM parameters. The methods used include SPCA, GDA, and Laplacian Eigenmaps, among others. The main emphasis of this paper is optimizing gene data through dimensionality reduction. Two public datasets, "Prostate Tumor" and "Leukemia", are used in the experiments.
The experimental results and analysis show that GDA is the best dimensionality reduction method for the Prostate Tumor dataset and MDS is the best for the Leukemia dataset. Thus, cancer gene data can be optimized effectively by finding the best combination of dimensionality reduction method and target dimension, and SVM classification can be further improved by tuning the SVM parameters. |
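
For reference, the SVM classifier mentioned in the abstract can be written in its standard soft-margin kernel form. The abstract does not state which formulation or kernel the paper uses, so the symbols below ($C$, $\xi_i$, $\phi$, $K$, $\alpha_i$, $b$) are assumptions taken from the textbook C-SVM, not from the paper:

```latex
% Textbook soft-margin (C-)SVM; an assumed formulation, not confirmed by the paper.
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \quad i = 1,\dots,n, \\[4pt]
f(x) ={}& \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i\, y_i\, K(x_i, x) + b\Bigr),
  \qquad K(x_i, x) = \phi(x_i)^{\top}\phi(x).
\end{aligned}
```

Under this reading, $x_i$ would be the reduced-dimension expression profile of sample $i$ and $y_i \in \{-1, +1\}$ its class label. The margin term $\tfrac{1}{2}\lVert w\rVert^{2}$ is what connects SVM to structural risk minimization, and $C$ together with any kernel parameter are the kind of SVM parameters that would typically be tuned.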