Study On Clustering Algorithms

Posted on: 2005-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: L H Luan
Full Text: PDF
GTID: 2120360125461653
Subject: Computational Mathematics
Abstract/Summary:
Cluster analysis is an important research problem in data mining. The goal of clustering is to partition a data set, without any prior knowledge, into clusters such that data within a cluster are similar and data in different clusters are dissimilar. This distinguishes it from data classification, and clustering is therefore also known as "unsupervised classification". Cluster analysis can be used as a standalone technique for discovering how data are distributed and as a preprocessing step for other data mining operations, so improving the performance of clustering algorithms is a worthwhile research goal.

Many clustering algorithms have been proposed, including distance-based and density-based algorithms. This thesis studies distance-based clustering, represented by the k-means algorithm, and density-based clustering, represented by the DBSCAN algorithm. It discusses spatial indexes that can be used to speed up clustering and proposes QTCDBSCAN, a fast quadtree-based clustering algorithm. QTCDBSCAN improves the cluster-expansion method to reduce the number of region queries, uses a quadtree to speed up each region query, and reduces the time needed to construct the quadtree.

To test the performance of the clustering algorithms, we designed and implemented a clustering experimental system (CES) that handles data collection, clustering, and two-dimensional data visualization. Experimental results show that DBSCAN can discover clusters of arbitrary shape, while k-means is fast but prone to converging to a local optimum. Comparing algorithms that use a quadtree with those that do not, the experiments demonstrate the important effect of spatial indexes on clustering performance and the efficiency of QTCDBSCAN.
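
To make the role of the spatial index concrete, the following is a minimal Python sketch of textbook DBSCAN in which every region query is answered by a point quadtree. It is not the QTCDBSCAN algorithm of the thesis, whose improved cluster expansion and faster tree construction are not described in this abstract. The names QuadTree, dbscan, capacity, eps and min_pts are illustrative assumptions, and the sketch assumes distinct two-dimensional points given as tuples.

```python
from dataclasses import dataclass, field


@dataclass
class QuadTree:
    """Square cell with centre (cx, cy) and half-width `half` (hypothetical names)."""
    cx: float
    cy: float
    half: float
    capacity: int = 4
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def insert(self, p):
        x, y = p
        if abs(x - self.cx) > self.half or abs(y - self.cy) > self.half:
            return False  # point lies outside this cell
        if self.children:
            if not any(c.insert(p) for c in self.children):
                self.points.append(p)  # rounding edge case: keep it at this node
            return True
        self.points.append(p)
        if len(self.points) > self.capacity and self.half > 1e-9:
            self._split()
        return True

    def _split(self):
        h = self.half / 2
        self.children = [QuadTree(self.cx + dx * h, self.cy + dy * h, h, self.capacity)
                         for dx in (-1, 1) for dy in (-1, 1)]
        pts, self.points = self.points, []
        for p in pts:
            if not any(c.insert(p) for c in self.children):
                self.points.append(p)

    def query(self, q, eps, out):
        """Append to `out` every stored point within distance eps of q."""
        qx, qy = q
        # Prune cells whose square cannot intersect the eps-neighbourhood of q.
        if abs(qx - self.cx) > self.half + eps or abs(qy - self.cy) > self.half + eps:
            return out
        for p in self.points:
            if (p[0] - qx) ** 2 + (p[1] - qy) ** 2 <= eps * eps:
                out.append(p)
        for c in self.children:
            c.query(q, eps, out)
        return out


def dbscan(points, eps, min_pts, tree):
    """Textbook DBSCAN; returns {point: cluster id}, with -1 marking noise."""
    labels = {}
    cluster = 0
    for p in points:
        if p in labels:
            continue
        neighbours = tree.query(p, eps, [])
        if len(neighbours) < min_pts:
            labels[p] = -1  # provisionally noise; may become a border point later
            continue
        labels[p] = cluster
        seeds = list(neighbours)
        while seeds:
            q = seeds.pop()
            if labels.get(q) == -1:
                labels[q] = cluster  # former noise reached from a core point
            if q in labels:
                continue
            labels[q] = cluster
            q_neighbours = tree.query(q, eps, [])
            if len(q_neighbours) >= min_pts:  # q is a core point: keep expanding
                seeds.extend(q_neighbours)
        cluster += 1
    return labels


# Usage: build the index once, then cluster.
data = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (5.0, 5.0), (5.2, 5.1), (4.9, 5.3)]
index = QuadTree(0.0, 0.0, 8.0, capacity=2)
for pt in data:
    index.insert(pt)
result = dbscan(data, eps=0.6, min_pts=3, tree=index)
```

Without the index, each region query scans all n points. With the quadtree, a query visits only the cells that overlap the eps-neighbourhood, which is the effect the experiments above attribute to spatial indexes.
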
Keywords/Search Tags: Clustering, k-means, DBSCAN, spatial index