
Research On Split-and-merge Based Clustering Algorithm

Posted on: 2021-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: X G Huang
GTID: 2518306113469384
Subject: Statistics
Abstract/Summary:
Cluster analysis is an important part of exploratory data analysis and is widely used in fields such as power, medicine, finance, biology, image segmentation, pattern recognition, and sociology. With the development of information technology, the amount of data is increasing exponentially. To keep pace with data sets that are growing larger and contain classes of increasingly complex shape, this thesis proposes two algorithms, the snake algorithm and the Grid-SL algorithm, which can find clusters of arbitrary shape at low complexity.

The first algorithm, the snake algorithm, is a split-and-merge clustering algorithm. First, the sample space is divided. Then, during splitting, a simple path connecting all points is generated; an edge-length threshold is derived from the distribution of edge lengths on this path, and the data set is divided into multiple small clusters according to that threshold. During merging, a new formula for the distance between clusters is proposed, which takes the shortest distance between the most representative points of the clusters as the inter-cluster distance and so reduces the cost of computing distances between clusters. The grid structure is then used to avoid further unnecessary inter-cluster distance computations, and the final clustering result is obtained after merging. The algorithm can find non-spherical classes and can also identify noise during clustering through a grid-density threshold; the complexity of the algorithm is (9)). The effectiveness of the snake algorithm is verified on a generated data set.

The second algorithm, Grid-SL, applies the techniques of the snake algorithm to reduce the high time complexity of the SL (single-link) algorithm. Compared with SL, Grid-SL introduces no new parameters. Experiments show that the grid structure makes Grid-SL significantly faster than SL, and that Grid-SL runs fastest among the improved variants of SL that were compared. The output of Grid-SL is consistent with that of SL, whereas the other improved variants of SL cannot guarantee consistent clustering results.
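The splitting step of the snake algorithm, cutting a simple path wherever an edge is longer than a threshold taken from the edge-length distribution, can be illustrated with a minimal sketch. The abstract does not say how the path is built or how the threshold is derived, so everything specific here is an assumption made for illustration: the path is taken as an already ordered list of points, the threshold is set to the mean edge length plus `k` standard deviations, and the names `split_path` and `k` are hypothetical rather than taken from the thesis.

```python
import math
from statistics import mean, pstdev


def split_path(points, k=1.0):
    """Cut an ordered path of points at unusually long edges.

    Assumptions (not from the thesis): `points` is already ordered along
    a simple path, and the edge-length threshold is mean + k * std of
    the edge lengths on that path.
    """
    if len(points) < 2:
        return [list(points)]

    # Length of each edge between consecutive points on the path.
    edges = [math.dist(points[i], points[i + 1])
             for i in range(len(points) - 1)]

    # Hypothetical threshold rule based on the edge-length distribution.
    threshold = mean(edges) + k * pstdev(edges)

    # Walk the path; start a new small cluster after every long edge.
    clusters = [[points[0]]]
    for point, length in zip(points[1:], edges):
        if length > threshold:
            clusters.append([point])
        else:
            clusters[-1].append(point)
    return clusters


path = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
print(split_path(path, k=1.0))
# → [[(0, 0), (1, 0), (2, 0)], [(10, 0), (11, 0), (12, 0)]]
```

In this example the edge lengths are 1, 1, 8, 1, 1, so the assumed threshold is 2.4 + 2.8 = 5.2 and only the edge of length 8 is cut. The small clusters produced this way would then be passed to the merging step, which is not sketched here.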
Keywords/Search Tags: Clustering, Self-adaption, Big data, Arbitrary shape, Grid