A Study Of Cancer Gene Data Classification Based On SVM Algorithm

Posted on:2016-02-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Huang

GTID:2284330464450566

Subject:Control engineering

Abstract/Summary:

Cancer is one of the main diseases posing a serious threat to human health, and early diagnosis is key to improving patient survival rates. With the rapid development of DNA microarray technology, vast amounts of cancer gene expression data have been collected. Building on molecular biology, using this gene expression data for early cancer diagnosis has become a hot topic in the post-genome era. However, gene expression data typically has a small sample size, high dimensionality, and nonlinear structure.

To address these problems, this thesis introduces a classification method based on the support vector machine (SVM) that can be used for cancer diagnosis. SVM is a machine learning method based on statistical learning theory (SLT) that uses the structural risk minimization (SRM) principle instead of the empirical risk minimization (ERM) principle. Kernel functions convert a nonlinear problem into a linear one in a higher-dimensional feature space, which contributes to SVM's good generalization ability. As a result, SVM has distinct advantages in pattern recognition problems that are sample-limited, nonlinear, and high-dimensional.

SVM effectively avoids over-fitting and under-fitting. However, small sample size and high dimensionality still affect classification accuracy, so dimensionality reduction has become an important step in classifying cancer gene data. In this thesis, several dimensionality reduction methods are applied to obtain lower-dimensional data, and SVM is then used for classification. Higher diagnostic accuracy is achieved by comparing the dimensionality reduction methods and setting appropriate SVM parameters. The methods used include SPCA, GDA, Laplacian Eigenmaps, etc.

The main emphasis of this thesis is optimizing gene data through dimensionality reduction. Two public datasets, "Prostate Tumor" and "Leukemia", are used in the experiments. The results and analysis show that GDA is the best dimensionality reduction method for the Prostate Tumor dataset and MDS is the best for the Leukemia dataset. Cancer gene data can therefore be optimized effectively by finding the best combination of dimensionality reduction method and target dimension, and SVM classification can be further improved by adjusting the SVM parameters.
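
The workflow described in the abstract (reduce the dimension, classify with SVM, then search over the target dimension and the SVM parameters) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's actual implementation: it assumes scikit-learn, uses a synthetic dataset in place of the Prostate Tumor and Leukemia data, and substitutes ordinary PCA for the SPCA, GDA, Laplacian Eigenmaps, and MDS methods the thesis compares. The grid values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for microarray data (assumption): few samples, many features.
X, y = make_classification(
    n_samples=100, n_features=500, n_informative=20,
    n_redundant=0, n_classes=2, random_state=0,
)

# Dimensionality reduction followed by an RBF-kernel SVM.
# PCA is a stand-in here, not one of the methods named in the abstract.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(random_state=0)),
    ("svm", SVC(kernel="rbf")),
])

# Jointly search the target dimension and the SVM parameters (illustrative values).
param_grid = {
    "reduce__n_components": [5, 10, 20],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.001, 0.01],
}

# Stratified folds keep the class balance in each split, which matters for small samples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy: %.3f" % search.best_score_)
```

Placing the reduction step inside the pipeline means it is refit on each training fold, so the cross-validated accuracy is not inflated by information from the held-out samples.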

Keywords/Search Tags:

DNA microarray, Gene expression data, Dimensional reduction, SVM, Data classification

Related items

1	Cancer Classification Methods Based On Gene Expression Data
2	Cancer Microarray Data Classification Based On Rough Sets Methods
3	Classification Of Gene Expression Data Of Tumor Microarray Based On Intelligent Optimization Algorithm
4	Research On The Classification For Tumor Genomics Data
5	Research On Feature Dimension Reduction Algorithm For Tumor Gene Data
6	Applying Of Support Vector Machines In Microarray Gene Expression Data Classification
7	The Research Of Cancer Classification Based On DNA Microarray Data
8	Application Study Of Gene Expression Data On Diagnosis Of Tumor And Prediction Of Gene Function
9	Classification Of Gene Expression Data Based On Improved Salp Swarm Algorithm And Heterogeneous Integrated Learning
10	Statistical analysis of gene expression data in cDNA microarray experiments