
A Design Of Clustering Mining Algorithm Distinguishing The Multi-dimensional Based On Grid Density

Posted on: 2015-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
GTID: 2297330467482638
Subject: Statistics
Abstract/Summary:
Cluster analysis is an important part of data mining, and the clustering algorithm is its core because it determines the quality of the results. Given that an algorithm must remain stable and effective, how to further improve clustering efficiency and how to reduce the cost and burden on users have become meaningful research topics.

Traditional clustering algorithms place high demands on computer hardware resources and take a long time to cluster massive data sets, so this thesis proposes a new clustering algorithm based on grids and density. Grid-based clustering is fast and efficient, but its clustering quality is limited. Density-based clustering, on the other hand, can find clusters of arbitrary shape, but it is time-consuming because of its complexity in high-dimensional data spaces. Because the two approaches are complementary, a combined strategy that uses grid density to partition the sample space can greatly improve clustering efficiency.

The method proceeds as follows. First, create the grid, that is, mesh the initial data space. Next, divide the sample space using a threshold obtained from the grid density, so that the grid cells fall into high-density and low-density regions. The high-density cells are sorted by density to find the densest cell. The low-density cells currently surrounding the densest cell are used to delimit the highest-density cluster. The points of that highest-density region are removed and the remaining high-density cells are sorted again. These steps are repeated until the whole space has been partitioned. Finally, the center of gravity of each sub-cluster is calculated. Sub-clusters whose centers of gravity are close to each other are merged to form a new cluster center, and merging continues until the number of clusters equals the given number of classes. This gives the final clustering result.

The thesis first describes the algorithm theoretically and argues that its design is reasonable and scientific. Then several groups of random data generated in Matlab are used for an empirical analysis in which the algorithm is compared with the classic K-means algorithm in terms of within-cluster squared deviation and running time. The results show that the algorithm significantly reduces computation time, which verifies its efficiency.
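The steps in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation (which is described as using Matlab) and fills several gaps with assumptions: the function names `grid_density_cluster` and `within_cluster_ss`, the parameter `cells_per_dim`, the default threshold (mean point count over non-empty cells), growing each region through adjacent high-density cells, and assigning points in low-density cells to the nearest centroid are all hypothetical choices made only for illustration.

```python
from collections import defaultdict, deque
from itertools import product

import numpy as np


def grid_density_cluster(points, cells_per_dim, n_clusters, density_threshold=None):
    """Illustrative grid-density clustering; returns (labels, centroids)."""
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / cells_per_dim, 1.0)

    # Step 1: mesh the data space and record which points fall in each cell.
    idx = np.minimum(((points - lo) / width).astype(int), cells_per_dim - 1)
    cells = defaultdict(list)
    for i, key in enumerate(map(tuple, idx)):
        cells[key].append(i)
    counts = {key: len(members) for key, members in cells.items()}

    # Step 2: split cells into high and low density.
    # Assumption: the threshold defaults to the mean count of non-empty cells.
    if density_threshold is None:
        density_threshold = np.mean(list(counts.values()))
    dense = {key for key, c in counts.items() if c >= density_threshold}
    if not dense:
        raise ValueError("no cell reaches the density threshold")

    # Step 3: start from the densest unassigned cell and grow a region
    # through adjacent high-density cells; repeat until none remain.
    offsets = [o for o in product((-1, 0, 1), repeat=points.shape[1]) if any(o)]
    cell_label, n_regions = {}, 0
    for seed in sorted(dense, key=counts.get, reverse=True):
        if seed in cell_label:
            continue
        cell_label[seed] = n_regions
        queue = deque([seed])
        while queue:
            cur = queue.popleft()
            for o in offsets:
                nb = tuple(c + d for c, d in zip(cur, o))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = n_regions
                    queue.append(nb)
        n_regions += 1

    region_points = [[] for _ in range(n_regions)]
    for key, lab in cell_label.items():
        region_points[lab].extend(cells[key])
    groups = [np.array(members) for members in region_points]

    # Step 4: merge the two sub-clusters with the closest centers of
    # gravity until only n_clusters remain.
    while len(groups) > n_clusters:
        cents = np.array([points[g].mean(axis=0) for g in groups])
        dist = np.linalg.norm(cents[:, None] - cents[None], axis=-1)
        np.fill_diagonal(dist, np.inf)
        a, b = sorted(int(i) for i in np.unravel_index(np.argmin(dist), dist.shape))
        groups[a] = np.concatenate([groups[a], groups[b]])
        del groups[b]

    centroids = np.array([points[g].mean(axis=0) for g in groups])
    labels = np.full(len(points), -1)
    for lab, g in enumerate(groups):
        labels[g] = lab

    # Assumption: points in low-density cells join the nearest centroid.
    loose = np.flatnonzero(labels == -1)
    if loose.size:
        dist = np.linalg.norm(points[loose][:, None] - centroids[None], axis=-1)
        labels[loose] = dist.argmin(axis=1)
    return labels, centroids


def within_cluster_ss(points, labels):
    """Sum of squared distances of points to their own cluster mean."""
    points = np.asarray(points, dtype=float)
    return sum(
        ((points[labels == lab] - points[labels == lab].mean(axis=0)) ** 2).sum()
        for lab in np.unique(labels)
    )
```

A call such as `labels, centroids = grid_density_cluster(data, cells_per_dim=10, n_clusters=3)` returns one label per row of `data`. If fewer high-density regions are found than `n_clusters`, the sketch returns fewer clusters instead of splitting any. `within_cluster_ss` is one plausible reading of the squared-deviation measure used in the comparison with K-means.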
Keywords/Search Tags:Clustering Algorithm, Grid, Density, Distinguish