Dimensionality Reduction Based On Feature Selection

Posted on:2016-05-10

Degree:Master

Type:Thesis

Country:China

Candidate:X P Wen

Full Text:PDF

GTID:2347330479954424

Subject:Applied Statistics

Abstract/Summary:

Feature selection is one of the most important methods of dimensionality reduction and is the counterpart of feature generation (or extraction). Together, the two methods cover the most commonly used dimensionality reduction techniques. Dimensionality reduction is itself a key problem in many theoretical and applied fields, such as statistics, data mining, and pattern recognition. Feature selection reduces the time complexity of data processing and the space complexity of data storage, and it can also improve the accuracy, robustness, and generalization ability of the learning model. In this thesis, feature selection methods are classified and described according to the different mechanisms of supervised and unsupervised learning. Several efficient algorithms are designed based on mutual information, one of the most significant concepts in information theory. The main topics of the thesis are as follows:
(1) In the supervised case, we use mutual information as a tool to design the Parzen Window feature selection (PWFS) algorithm and the maximal relevance and minimal redundancy feature selection (MRMR) algorithm.
(2) In the unsupervised case, we design a novel feature selection algorithm that clusters the features, using neighborhood mutual information as the similarity measure. This algorithm can be applied directly to data sets that mix numerical (continuous) and categorical features, without discretization or quantization.
(3) By applying neighborhood mutual information to PWFS and MRMR, we obtain new supervised algorithms that are directly applicable to mixed data.
(4) The algorithms are tested and compared on data sets downloaded from the University of California, Irvine (UCI) Machine Learning Repository. We also apply these feature selection algorithms to a real-world data set on the economic strength of China's 31 areas, taken from the China Statistical Abstract 2013.
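
To make the maximal relevance and minimal redundancy idea concrete, the following is a minimal sketch and not the thesis's implementation. It assumes purely discrete features, a simple plug-in (frequency count) estimate of mutual information rather than the Parzen window or neighborhood estimators named in the abstract, and the common "relevance minus mean redundancy" form of the greedy criterion. The names `mutual_information` and `mrmr_select` and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_select(features, target, k):
    """Greedily pick up to k feature names (assumed difference-form criterion).

    features: dict mapping feature name -> list of discrete values.
    target:   list of discrete class labels.
    """
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            # Mean mutual information with the features already chosen.
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)  # ties go to the earliest feature
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the label is determined jointly by two independent binary features.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
label = [2 * ai + bi for ai, bi in zip(a, b)]
toy_features = {"a": a, "a_copy": list(a), "b": b}

print(mrmr_select(toy_features, label, 2))
```

In this toy setting all three features are equally relevant to the label, but `a_copy` is fully redundant once `a` has been chosen, so the redundancy penalty steers the second pick toward `b`. Handling continuous or mixed data, as the thesis does, would require swapping in a different mutual information estimator.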

Keywords/Search Tags:

feature selection, mutual information, maximal relevance and minimal redundancy, supervised learning, unsupervised learning, clustering

Related items

1	The Research On Feature Selection Method Under Unsupervised Learning
2	Research On Unsupervised Feature Selection Based On BDE-MICI
3	An Enhanced K-means Algorithm Based On Shrinkage Estimation
4	Research On Tendentious Label And Streaming Data Feature Selection Algorithm
5	Design And Implementation Of Student Automatic Grouping System Based On Feature Clustering
6	Semi-supervised Clustering Algorithm Based On Single Linkage Clustering
7	Improved Spectral Clustering Algorithm And Its Application In Risk-model
8	A Study On Hierarchical Clustering Of Micro-learning Units Based On Topic Feature Centers
9	Research On PCA And CFS Feature Dimensionality Reduction Algorithm Based On MIC
10	Multi-label Feature Selection Based On Gravitational Field Model