
Support Vector Data Description-based Feature Selection Method And Its Application

Posted on: 2016-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Cao
Full Text: PDF
GTID: 2284330464453295
Subject: Computer technology
Abstract/Summary:
In cancer classification based on gene expression, a dataset usually contains only a few dozen samples but thousands to tens of thousands of genes, many of which may be irrelevant. How to quickly select a useful low-dimensional subset from such high-dimensional gene expression data has attracted considerable attention. This thesis focuses on feature selection based on support vector data description (SVDD) for gene expression data, which removes irrelevant genes and keeps the informative ones, thereby improving classification performance. The contributions of this thesis are summarized as follows.

First, this thesis proposes a fast feature selection method based on SVDD. An SVDD-based feature selection method has been proposed previously, but it is time-consuming because of its high computational complexity. To remedy this, a novel SVDD-based feature selection method is proposed, in which irrelevant features are eliminated according to the energy of the directions in the center of the hypersphere. In addition, a recursive feature elimination (RFE) scheme is introduced to remove irrelevant features iteratively. Experimental results on the Leukemia and Colon Tumor datasets show that the novel method selects features quickly and that the selected features are effective for classification tasks.

Second, this thesis presents a fast gene selection method based on multiple SVDD models for multi-class classification problems. Existing SVDD-based methods cannot address multi-class problems because they use only the target class and ignore the data of the other classes, yet multi-class data is more common in the real world. Thus, a novel fast feature selection method based on multiple SVDD models is developed. The proposed method independently selects a relevant gene subset for each class, and the final selected gene subset is the union of these subsets. The effectiveness and accuracy of the new method are validated by experiments on five publicly available microarray datasets, three multi-class and two two-class. The proposed method is faster and selects more discriminative features than the other methods compared.
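
The control flow described in the abstract (recursive elimination within each class, then a union across classes) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the names `center_energy`, `rfe` and `select_multiclass` are hypothetical, and the class mean is used as a stand-in for a trained SVDD hypersphere center, so the scoring rule here is a placeholder for the energy criterion, whose exact definition the abstract does not give.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set

Sample = Sequence[float]
Scorer = Callable[[List[Sample], List[int]], List[float]]


def center_energy(samples: List[Sample], active: List[int]) -> List[float]:
    """Score each active feature by the squared coordinate of the class center.

    ASSUMPTION: the class mean stands in for a trained SVDD center; a real
    implementation would refit SVDD on the active features at every round.
    """
    n = len(samples)
    return [(sum(s[j] for s in samples) / n) ** 2 for j in active]


def rfe(samples: List[Sample], n_keep: int,
        score: Scorer = center_energy) -> List[int]:
    """Recursive feature elimination: drop the lowest-scoring feature
    one at a time until only n_keep feature indices remain."""
    active = list(range(len(samples[0])))
    while len(active) > n_keep:
        scores = score(samples, active)
        worst = min(range(len(active)), key=scores.__getitem__)
        del active[worst]
    return active


def select_multiclass(X: List[Sample], y: List[Hashable],
                      n_keep: int) -> List[int]:
    """Select features independently per class and return the union."""
    by_class: Dict[Hashable, List[Sample]] = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    selected: Set[int] = set()
    for rows in by_class.values():
        selected.update(rfe(rows, n_keep))
    return sorted(selected)


if __name__ == "__main__":
    # Toy data (made up): feature 0 is large in class "a", feature 1 in class "b".
    X = [[5.0, 0.0, 0.1], [5.0, 0.2, 0.0], [0.0, 4.0, 0.1], [0.1, 4.0, 0.0]]
    y = ["a", "a", "b", "b"]
    print(select_multiclass(X, y, n_keep=1))
```

Passing the scorer as a parameter keeps the elimination loop independent of how the center is obtained, so an actual SVDD solver could be substituted without changing `rfe` or the union step.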
Keywords/Search Tags: Support vector data description (SVDD), Feature selection, Gene selection, Multi-class classification, Recursive feature elimination