Research On Clustering Algorithm Based On Grid Point Density Estimation

Posted on: 2020-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2428330596987331
Subject: Software engineering
Abstract/Summary:
Machine learning is a significant branch of artificial intelligence that draws on many disciplines. Its goal is to simulate how humans learn, so that a system can acquire new knowledge, update its knowledge structure, and improve its own performance. In recent years, machine learning research has made great progress and many algorithms have been proposed. These algorithms are usually divided into three categories: supervised, unsupervised, and semi-supervised learning. Clustering is one of the most representative unsupervised methods: according to certain characteristics, similar data points in a data set are assigned to the same cluster and dissimilar data points to different clusters. Although a variety of clustering algorithms have been proposed, most traditional methods can only be applied to spherical clusters, and their results may be affected by parameter settings and initialization. In addition, when the number of data points and their dimensionality become very large, the efficiency of a clustering algorithm is limited by its time and space complexity.

Therefore, this paper proposes a fast and robust grid-based clustering method that can identify clusters of arbitrary shape and can cope with large data sets. First, the number of grid divisions is determined automatically by a given formula. Then, the algorithm calculates the densities of the grid nodes instead of the traditional grid densities. Finally, the classical breadth-first search algorithm performs the clustering based on the densities of the grid nodes. Experiments on multiple artificial and real data sets show that this method is more efficient and effective than traditional clustering methods.

In addition, clustering evaluation indexes usually need to be calculated to assess the clustering results. The traditional point-to-point comparison method is inefficient for obtaining these indexes on big data sets. This paper gives a method for calculating the evaluation indexes from the confusion matrix. The experimental results show that this method clearly improves the efficiency of obtaining the index values.
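
The three steps of the grid-based method (choose the grid, estimate grid-node densities, run breadth-first search) can be illustrated with a minimal sketch. The abstract does not give the formula for the number of grid divisions or the density estimator, so everything specific here is an assumption made for illustration: the parameter `m` (divisions per axis) is passed in by hand, each point is simply counted at its nearest grid node, `min_density` is a hypothetical threshold, and neighbouring nodes are taken to be 8-connected in two dimensions. It is not the author's implementation.

```python
from collections import deque
from itertools import product


def grid_node_clustering(points, m, min_density):
    """Cluster 2-D points using grid-node densities and BFS (illustrative sketch).

    points      -- list of (x, y) tuples
    m           -- grid divisions per axis (hypothetical; the thesis derives it
                   from a formula that the abstract does not state)
    min_density -- hypothetical threshold for a node to count as dense
    Returns one label per point; -1 marks points near no dense node.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    # Cell width per axis; fall back to 1.0 if all values coincide.
    wx = (max(xs) - x0) / m or 1.0
    wy = (max(ys) - y0) / m or 1.0

    def nearest_node(p):
        # Assumed density estimate: a point contributes to its nearest node.
        return (round((p[0] - x0) / wx), round((p[1] - y0) / wy))

    density = {}
    for p in points:
        node = nearest_node(p)
        density[node] = density.get(node, 0) + 1

    dense = {node for node, d in density.items() if d >= min_density}

    # Breadth-first search groups adjacent dense nodes into one cluster.
    node_label = {}
    next_label = 0
    for start in sorted(dense):  # sorted for reproducible label numbers
        if start in node_label:
            continue
        node_label[start] = next_label
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di, dj in product((-1, 0, 1), repeat=2):
                neighbour = (i + di, j + dj)
                if neighbour in dense and neighbour not in node_label:
                    node_label[neighbour] = next_label
                    queue.append(neighbour)
        next_label += 1

    # Each point inherits the label of its nearest node.
    return [node_label.get(nearest_node(p), -1) for p in points]


points = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (1.1, 0.9),
          (9.0, 9.0), (9.1, 8.9), (10.0, 10.0), (9.9, 10.0),
          (5.0, 5.0)]
labels = grid_node_clustering(points, m=10, min_density=2)
```

Because the search runs over grid nodes rather than over individual points, its cost in this sketch depends on the number of occupied nodes, which is one plausible reason a grid-based approach scales to large data sets.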
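
The idea of computing evaluation indexes from a confusion matrix instead of comparing points pair by pair can be sketched as follows. The abstract does not say which indexes the thesis uses, so the Rand index is chosen here purely as an illustrative pair-counting index, and the function names `pair_counts` and `rand_index` are hypothetical. The sketch relies on a standard identity: the number of point pairs that share a cluster in both labelings equals the sum of C(n_ij, 2) over the confusion-matrix cells n_ij, so no explicit loop over all pairs is needed.

```python
from collections import Counter
from math import comb


def pair_counts(labels_true, labels_pred):
    """Return pair-level (tp, fp, fn, tn) from the confusion matrix.

    A "positive" pair is one placed in the same predicted cluster.
    """
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))  # confusion-matrix cells n_ij
    rows = Counter(labels_true)                     # row sums (true classes)
    cols = Counter(labels_pred)                     # column sums (predicted clusters)

    same_both = sum(comb(c, 2) for c in cells.values())
    same_true = sum(comb(c, 2) for c in rows.values())
    same_pred = sum(comb(c, 2) for c in cols.values())

    tp = same_both                 # together in both labelings
    fn = same_true - same_both     # together in truth, split in prediction
    fp = same_pred - same_both     # together in prediction, split in truth
    tn = comb(n, 2) - tp - fp - fn
    return tp, fp, fn, tn


def rand_index(labels_true, labels_pred):
    """Rand index: fraction of point pairs on which the two labelings agree."""
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0


truth = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 2, 2]
counts = pair_counts(truth, predicted)
score = rand_index(truth, predicted)
```

A point-to-point comparison examines all n(n-1)/2 pairs, whereas this version makes one pass over the n labels and then sums over the non-empty confusion-matrix cells, which is where the efficiency gain on big data sets would come from.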
Keywords/Search Tags: grid-based clustering, grid nodes, breadth-first search, clusters with arbitrary shapes, clustering result evaluation