Interactive Features And Adaptive Clustering Algorithm

Posted on:2012-05-18

Degree:Master

Type:Thesis

Country:China

Candidate:W Z Wang

Full Text:PDF

GTID:2208330335471960

Subject:Computer software and theory

Abstract/Summary:

Rapid advances in data collection and storage technology have enabled organizations to accumulate vast amounts of data. As data become more complex, the number of features involved keeps growing, which in data mining leads to the curse of dimensionality. In general, there are two ways to address this problem. The first, feature selection, reduces the dimensionality by using only a subset of the original features. The second, dimensionality reduction (also called feature extraction), reduces the dimensionality of a data set by creating new features that are combinations of the old ones. This dissertation first briefly reviews the current state of research and the basic theory and methods of feature selection and dimensionality reduction. Its major contributions are as follows:

(1) Discussing the importance of feature interaction in data mining. We first define the concept of feature interaction. We then show that it plays a crucial role across different kinds of data mining problems, such as learning target concepts, coping with small disjuncts, detecting Simpson's paradox, and designing rule induction algorithms. A better understanding of feature interaction can lead to a better understanding of the relationships among these kinds of problems. The discussion also draws attention to the fact that most rule induction algorithms are based on greedy search, which does not cope well with feature interactions.

(2) Designing a method that reduces high-dimensional data while finding feature interactions indirectly. We design a special data structure for feature quality evaluation and employ a feature ranking mechanism to handle feature interaction efficiently in subset selection. We evaluate the approach experimentally by comparing it with several representative methods. Extensive experimental results on real-world datasets show the effectiveness of this approach.

(3) Combining linear discriminant analysis (LDA) and bisecting K-means clustering (BKM). An adaptive clustering method is proposed for high-dimensional data. The method uses LDA to transform the high-dimensional dataset into a low-dimensional one, applies BKM to the low-dimensional dataset, and constructs the clusters in the original high-dimensional dataset. The method is executed adaptively to generate the best result. Extensive experimental results on real-world datasets show the effectiveness of this approach.
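The feature-interaction point in contribution (1) can be illustrated with a small sketch. It is not taken from the dissertation: the toy dataset (a class label equal to the XOR of two binary features, plus one irrelevant feature), the use of mutual information as the scoring measure, and all names below are assumptions chosen for illustration. The sketch shows why a greedy, one-feature-at-a-time evaluation can miss interacting features: each feature scored alone looks useless, while the pair determines the class completely.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, of two equal-length sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Hypothetical toy data: every combination of three binary features.
rows = list(product([0, 1], repeat=3))
f1 = [r[0] for r in rows]
f2 = [r[1] for r in rows]
f3 = [r[2] for r in rows]                  # irrelevant to the class
label = [a ^ b for a, b in zip(f1, f2)]    # class = f1 XOR f2

# Scores a univariate (greedy) ranking would see.
mi_f1 = mutual_information(f1, label)
mi_f2 = mutual_information(f2, label)
mi_f3 = mutual_information(f3, label)

# Score of the interacting pair, treated as one joint variable.
mi_f1_f2 = mutual_information(list(zip(f1, f2)), label)

print("f1 alone:   ", mi_f1)
print("f2 alone:   ", mi_f2)
print("f3 alone:   ", mi_f3)
print("f1 and f2:  ", mi_f1_f2)
```

On this toy data each single feature, including the two relevant ones, carries 0 bits of information about the class, so a univariate ranking cannot separate f1 and f2 from the irrelevant f3. Scored jointly, f1 and f2 carry the full 1 bit. A selection method that handles interaction has to evaluate features in the context of others, which is the problem the approach in contribution (2) is described as addressing.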

Keywords/Search Tags:

feature selection, dimensionality reduction, feature interaction, consistency measure, linear discriminant analysis, bisecting K-means

Related items

1	The Study Of Novel Dimensionality Reduction Methods And Application In Intelligent Recognition
2	Research On Robust Distance Measure Based Discriminant Analysis And Applications
3	Linear Dimensionality Reduction Technology For Face And Palmprint Feature Extraction
4	Multi-label Dimensionality Reduction Algorithm Research Based On Multiple Kernel Learning
5	Dimensionality Reduction And Recognition Technology Of Digital Image-algorithm Research Of Face Recognition
6	Research And Application Of Dimensionality Reduction Techniques
7	Well-Generalizable Linear Dimensionality Reduction Algorithms In Classification And Regression Tasks
8	Dimensionality Reduction On LC-MS Dataset
9	The Study Of Some Issues For Unsupervised And Semi-supervised Dimensionality Reduction
10	Research On Dimensionality Reduction Of High-Dimensional Data