
Study Of Clustering Algorithm And Its Validity Problem

Posted on: 2013-02-11   Degree: Master   Type: Thesis
Country: China   Candidate: M S Liu
GTID: 2248330371999840   Subject: Computational Mathematics
Abstract/Summary:
Clustering is an important research area in data mining, and clustering validity, which builds on clustering theory, supplies indices for judging the quality of a clustering result. The main validity methods include statistical hypothesis testing based on internal or external criteria, validity measures for hierarchical clusterings, validity measures for partitional clusterings, the Dunn index and Dunn-type indices, the Davies-Bouldin (DB) index and DB-type indices, and the gap statistic. Widely used clustering algorithms include hierarchical, grid-based, density-based and partition-based algorithms, and they usually measure the similarity between samples with the Euclidean distance.

The Euclidean distance has clear drawbacks: it treats attributes measured in different units without distinction, and it is easily distorted by correlations between variables. As a result it sometimes fails to meet practical requirements on clustering speed, clustering quality and the performance of validity indices. The Mahalanobis distance, by contrast, is independent of the units of measurement: the distance between two samples does not depend on the scale of the original data, it takes the same value whether computed from standardized or from centered data, and it removes the interference caused by correlations between variables.

This thesis discusses the limitations of hierarchical clustering and of the Euclidean distance. Taking into account the geometric structure of the data and the characteristics of individual attributes, it proposes a new Mahalanobis-distance-based measure of attribute similarity and a new clustering validity function, and improves the hierarchical clustering algorithm that relies on the Euclidean distance. Experiments confirm that the improved algorithm is effective.
Keywords/Search Tags: Hierarchical clustering, Mahalanobis distance, Clustering validity, Data mining
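The abstract does not spell out the thesis's new attribute-similarity measure or validity function, so the Python sketch below is not the author's method. It only illustrates the standard ingredients the abstract names, under stated assumptions: the Mahalanobis distance d(x, y) = sqrt((x - y)^T S^-1 (x - y)), with S assumed to be the sample covariance of all samples; plain average-linkage agglomerative (hierarchical) clustering; and the classic Dunn index as the validity measure. The helper names (`covariance`, `invert`, `average_linkage`, `dunn_index`, `as_sets`) and the toy data are hypothetical, and only the standard library is used.

```python
import math
from functools import partial
from itertools import combinations


def covariance(X):
    """Sample covariance matrix (denominator n - 1) of a list of equal-length rows."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in X) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(M):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    d = len(M)
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(d)]
         for i, row in enumerate(M)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(d):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[d:] for row in A]


def mahalanobis(x, y, VI):
    """sqrt((x - y)^T VI (x - y)), where VI is an inverse covariance matrix."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    q = sum(diff[i] * VI[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative rounding errors


def average_linkage(X, k, dist):
    """Agglomerative clustering: repeatedly merge the two clusters with the smallest
    mean pairwise distance until k clusters remain. Returns lists of row indices."""
    pair = {(i, j): dist(X[i], X[j]) for i, j in combinations(range(len(X)), 2)}

    def linkage(a, b):
        return sum(pair[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p].extend(clusters.pop(q))  # combinations gives p < q, so p stays valid
    return clusters


def dunn_index(X, clusters, dist):
    """Smallest distance between points in different clusters divided by the largest
    within-cluster distance (diameter). Larger values mean better-separated clusters."""
    separation = min(dist(X[i], X[j]) for a, b in combinations(clusters, 2)
                     for i in a for j in b)
    diameter = max((dist(X[i], X[j]) for c in clusters for i, j in combinations(c, 2)),
                   default=0.0)
    return separation / diameter if diameter > 0 else math.inf


def as_sets(clusters):
    """Order-independent view of a partition, for comparing clustering results."""
    return {frozenset(c) for c in clusters}


# Hypothetical toy data: three groups of three 2-D samples.
X = [(0.0, 0.0), (0.4, 0.3), (0.2, -0.3),
     (10.0, 0.2), (10.3, -0.2), (9.8, 0.1),
     (0.1, 10.0), (-0.3, 9.7), (0.3, 10.2)]
truth = {frozenset({0, 1, 2}), frozenset({3, 4, 5}), frozenset({6, 7, 8})}
Xs = [(a, 1000 * b) for a, b in X]  # same samples, second attribute in 1000x smaller units

maha = partial(mahalanobis, VI=invert(covariance(X)))
maha_s = partial(mahalanobis, VI=invert(covariance(Xs)))

print(math.dist(X[0], X[1]), math.dist(Xs[0], Xs[1]))  # Euclidean: about 0.5 vs about 300
print(maha(X[0], X[1]), maha_s(Xs[0], Xs[1]))          # Mahalanobis: equal up to rounding
print(as_sets(average_linkage(Xs, 3, maha_s)) == truth)     # True
print(as_sets(average_linkage(Xs, 3, math.dist)) == truth)  # False
print(dunn_index(Xs, average_linkage(Xs, 3, maha_s), maha_s))
```

Rescaling the second attribute by 1000 leaves every Mahalanobis distance unchanged, because S^-1 absorbs the change of units. The Euclidean distance, in contrast, becomes dominated by that attribute, and average linkage no longer recovers the three groups. Estimating S from all samples is the simplest choice. Because S then also absorbs the spread between clusters, it can shrink distances along the direction that separates them, so a practical method may need a different covariance estimate.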