| As one of the most important techniques in data mining, clustering analysis has been studied for a long time. DBSCAN is a density-based spatial clustering algorithm whose main idea is to treat any sufficiently dense region as a cluster. It can find clusters of arbitrary shape in datasets that contain noise, and it defines a cluster as a maximal set of density-connected points. The traditional DBSCAN algorithm is very sensitive to its two parameters, Eps and MinPts, which are difficult to determine. Building on a study of the DBSCAN clustering algorithm and the kNN classification algorithm, we propose the kNNSCAN algorithm, which effectively removes DBSCAN's dependence on these parameters. First, kNNSCAN is designed to solve the problem of clustering algorithms depending on their parameters: it examines the distance between the kth nearest neighbor and its nearest neighbor, and merges points that satisfy a certain relation. When the parameter dc is chosen inaccurately, kNNSCAN's clustering accuracy decreases. To eliminate the influence of dc on the clustering results, we therefore propose an improved algorithm, M-kNNSCAN, which merges the inaccurate clustering results by means of the decision graph of cluster center points and thus effectively prevents the loss of clustering quality caused by a wrongly chosen dc. Second, building on kNNSCAN, we propose the PR-kNNSCAN algorithm. It further optimizes the parameters of kNNSCAN, effectively identifies clusters of arbitrary shape, and greatly weakens the influence of noise points on the clustering result. Experiments verify that its method of calculating point density is feasible and effective. Finally, we compare several common clustering algorithms with the two algorithms proposed in this thesis, and the experimental results show that M-kNNSCAN and PR-kNNSCAN perform well in clustering. |
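To make the role of Eps and MinPts concrete, the classic DBSCAN procedure described above (a cluster as a maximal set of density-connected points, with unreachable points labeled as noise) can be sketched as follows. This is a minimal illustration of textbook DBSCAN, not the thesis's kNNSCAN. The function names `dbscan` and `region_query`, the Euclidean metric, and the O(n²) brute-force neighbor search are all assumptions chosen for brevity.

```python
import math
from collections import deque

NOISE = -1       # label for points that are not density-reachable from any core point
UNVISITED = None

def region_query(points, i, eps):
    """Hypothetical helper: indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN sketch: returns a cluster id (0, 1, ...) or NOISE for each point."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may later be claimed as a border point
            continue
        # i is a core point: start a new cluster and expand it breadth-first.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is also a core point, so its neighbors are density-reachable too.
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3)` yields two clusters and one noise point, `[0, 0, 0, 0, 1, 1, 1, -1]`. Raising `min_pts` to 5 on the same data turns every point into noise, which illustrates the parameter sensitivity the abstract attributes to DBSCAN.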
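The parameter dc and the "decision graph of cluster center points" mentioned for M-kNNSCAN match the terminology of Rodriguez and Laio's density-peaks clustering, where dc is a cutoff distance, each point gets a local density rho and a distance delta to its nearest higher-density point, and cluster centers stand out with both large rho and large delta. The sketch below computes these standard density-peaks quantities under the assumption that the thesis uses the same definitions; it does not reproduce M-kNNSCAN's merging step, which the abstract does not specify. The names `decision_graph` and `density_peaks`, the cutoff kernel, and ranking centers by gamma = rho * delta are illustrative assumptions.

```python
import math

def decision_graph(points, dc):
    """Standard density-peaks quantities (assumed, not taken from the thesis).

    rho[i]    : number of other points closer than the cutoff distance dc
    delta[i]  : distance to the nearest point that ranks higher in density
                (for the densest point: its distance to the farthest point)
    parent[i] : index of that nearest higher-density point (None for the densest)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Rank points by decreasing density; sorted() is stable, so ties keep index order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [None] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
        else:
            parent[i] = min(order[:rank], key=lambda j: dist[i][j])
            delta[i] = dist[i][parent[i]]
    return rho, delta, parent, order

def density_peaks(points, dc, n_centers):
    """Pick the n_centers points with the largest rho * delta as centers, then give
    every other point the label of its nearest higher-density point."""
    rho, delta, parent, order = decision_graph(points, dc)
    gamma = [r * d for r, d in zip(rho, delta)]
    centers = sorted(range(len(points)), key=lambda i: -gamma[i])[:n_centers]
    if order[0] not in centers:
        # The densest point has no parent, so it must be a center.
        centers = [order[0]] + centers[:-1]
    labels = [None] * len(points)
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    # Visiting points in decreasing density guarantees each parent is labeled first.
    for i in order:
        if labels[i] is None:
            labels[i] = labels[parent[i]]
    return labels, centers
```

On two unit squares, each with a point at its center, `density_peaks(points, dc=1.0, n_centers=2)` selects the two square centers, since they are the only points with both high rho and large delta. Changing dc changes every rho and can therefore shift which points appear as centers on the decision graph, which is the sensitivity to dc that the abstract describes.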