Interactive Features And Adaptive Clustering Algorithm

Posted on: 2012-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Wang
Full Text: PDF
GTID: 2208330335471960
Subject: Computer software and theory
Abstract/Summary:
Rapid advances in data collection and storage technology have enabled organizations to accumulate vast amounts of data. As data grow more complex, the number of features involved grows as well, which in data mining leads to the curse of dimensionality. In general, there are two ways to address this problem. One is to use only a subset of the features, which is called feature selection. The other is to reduce the dimensionality of a data set by creating new features that are combinations of the old features, which is called dimensionality reduction or feature extraction. This dissertation first briefly reviews the current state of related research and the basic theory and methods of feature selection and dimensionality reduction. Its major contributions are as follows:

(1) Discussing the importance of feature interaction in data mining. We first define the concept of feature interaction. We then show that feature interaction plays a crucial role across different kinds of problems in data mining, such as learning target concepts, coping with small disjuncts, detecting Simpson's paradox, and designing rule induction algorithms. A better understanding of feature interaction can lead to a better understanding of the relationships among these kinds of problems. The discussion also draws attention to the fact that most rule induction algorithms are based on a greedy search, which does not cope well with feature interactions.

(2) Designing a method that reduces high-dimensional data while finding feature interactions indirectly. We design a special data structure for feature quality evaluation and employ a feature ranking mechanism to handle feature interaction efficiently in subset selection. We evaluate the approach experimentally by comparing it with several representative methods. Extensive experimental results on real-world datasets show the effectiveness of this approach.

(3) Combining linear discriminant analysis (LDA) and bisecting K-means clustering (BKM). An adaptive clustering method is proposed for high-dimensional data. The method uses LDA to transform the high-dimensional dataset into a low-dimensional one, applies BKM to the low-dimensional dataset, and constructs the clusters in the original high-dimensional dataset. The method is executed adaptively to generate the best result. Extensive experimental results on real-world datasets show the effectiveness of this approach.
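
The abstract does not spell out how the LDA and BKM steps are combined or what "adaptively" means in detail. The following is only a minimal sketch of one plausible reading, assuming an alternating loop: cluster, use the cluster labels as LDA classes, project, re-cluster in the projected space, and stop when the partition no longer changes. All function names (`two_means`, `bisecting_kmeans`, `lda_project`, `adaptive_lda_bkm`), the choice of splitting the cluster with the largest within-cluster error, and the stopping rule are assumptions made for illustration, not the thesis's actual algorithm.

```python
import numpy as np

def two_means(X, rng, n_iter=20):
    """Plain 2-means; returns a boolean mask marking the second sub-cluster.
    Assumes the rows of X are distinct, so the initial split is non-empty."""
    centers = X[rng.choice(len(X), size=2, replace=False)]
    assign = None
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_assign = dist.argmin(axis=1)
        if new_assign.min() == new_assign.max():  # one side empty: keep last split
            break
        assign = new_assign
        centers = np.array([X[assign == j].mean(axis=0) for j in (0, 1)])
    return assign == 1

def bisecting_kmeans(X, k, rng):
    """Repeatedly bisect the cluster with the largest within-cluster SSE."""
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        sse = [((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in range(new_label)]
        idx = np.flatnonzero(labels == int(np.argmax(sse)))
        labels[idx[two_means(X[idx], rng)]] = new_label
    return labels

def lda_project(X, labels, n_components):
    """Project X onto the leading LDA directions, using `labels` as classes."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)          # within-class scatter
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)        # between-class scatter
    Sw += 1e-6 * np.eye(d)                     # small ridge keeps Sw invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:n_components]
    return X @ evecs[:, order].real

def adaptive_lda_bkm(X, k, n_rounds=10, seed=0):
    """Alternate LDA projection and BKM until the partition stops changing.
    Labels are indexed by the rows of X, so they apply to the original data."""
    rng = np.random.default_rng(seed)
    labels = bisecting_kmeans(X, k, rng)       # initial clusters, original space
    for _ in range(n_rounds):
        Z = lda_project(X, labels, k - 1)      # at most k-1 useful directions
        new_labels = bisecting_kmeans(Z, k, rng)
        # Same partition up to relabeling <=> exactly k distinct (old, new) pairs.
        converged = len(set(zip(labels.tolist(), new_labels.tolist()))) == k
        labels = new_labels
        if converged:
            break
    return labels
```

Calling `adaptive_lda_bkm(X, k=3)` on an `(n_samples, n_features)` array returns one cluster label per original row. Because LDA is supervised, the sketch bootstraps it from the clustering itself; whether the thesis does the same is not stated in the abstract.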
Keywords/Search Tags:feature selection, dimensionality reduction, feature interaction, consistency measure, linear discriminant analysis, bisecting K-means
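
To make the keywords "feature interaction" and "consistency measure" concrete, here is a small illustrative sketch. The abstract does not define the consistency measure it uses, so the code assumes one common form, the inconsistency rate: among instances that share the same values on the chosen features, count those that do not belong to the majority class. The function name `inconsistency_rate` and the toy XOR data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

def inconsistency_rate(rows, labels, features):
    """Fraction of instances that disagree with the majority class among
    instances sharing the same values on the selected `features`."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)
        groups[key][y] += 1
    clashes = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return clashes / len(rows)

# Toy data: all combinations of three binary features.
# The class is f0 XOR f1; f2 is irrelevant.
rows = list(product([0, 1], repeat=3))
labels = [a ^ b for a, b, _ in rows]

for subset in ([0], [1], [2], [0, 1]):
    print(subset, inconsistency_rate(rows, labels, subset))
```

On this toy data, each feature alone leaves half of the instances inconsistent, while `f0` and `f1` together are fully consistent with the class. This is the kind of interaction that a greedy, one-feature-at-a-time evaluation can miss.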