| Clustering is an important data mining technique with applications in many fields, notably web content security research within information security, where clustering-based image segmentation algorithms are often used. Its purpose is to divide a data set into classes according to certain criteria, so that the similarity between objects of the same class is as large as possible and the similarity between objects of different classes is as small as possible. However, as data volumes grow and data collection methods and data types become more complex, traditional clustering algorithms cannot obtain ideal results on complex data. Many scholars have improved clustering algorithms from the perspective of data partitioning, for example DBSCAN clustering and density peak clustering. However, most of these methods still use the Euclidean distance as the inter-sample similarity measure. This paper instead improves the inter-sample similarity measure: it proposes a new measure to address the shortcoming that Euclidean distance does not reflect the global consistency of non-convex data sets well, and then combines this measure with existing clustering algorithms and applies it to image segmentation. The main work of the paper is as follows: (1) The current research status of clustering algorithms is summarized, and the strategies used in current research to improve traditional clustering algorithms are analysed. On this basis, the clustering process is divided into two stages, similarity measurement and sample-point partitioning, and it is pointed out that most current mainstream clustering algorithms improve the partitioning stage but neglect the optimization of the similarity measure. In view of this, the paper proposes a method, based on density and nearest neighbours, for calculating the distance between two points on a manifold: the distance between samples is defined by iteratively searching for a chain of nearest neighbours. The method targets data sets with non-convex structure and reflects their local and global consistency well. To verify its effectiveness, the chain distance was applied to the K-medoids and Affinity Propagation clustering algorithms; on both artificial and UCI data sets, it achieved better clustering results than the other distance metrics compared. (2) A two-stage clustering method based on SLIC superpixel segmentation is proposed for image segmentation. The method first performs superpixel segmentation with the SLIC algorithm and then clusters the superpixel blocks by combining the chain distance metric with a clustering algorithm, which completes the segmentation of the image. Experiments were conducted using DBSCAN clustering based on chain distance, and DBSCAN and K-Means clustering based on Euclidean distance, to segment several images from the BSDS500 image segmentation and edge detection data set, demonstrating the applicability of chain distance in this field. |
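
The separation of clustering into a similarity measure and a partitioning step can be illustrated with a minimal sketch of K-medoids that takes only a precomputed distance matrix, so any measure (Euclidean, or a chain distance) could in principle be supplied without changing the partitioning code. The abstract does not give the chain-distance formula, so the matrix below uses plain absolute difference as a placeholder; the function name `k_medoids`, its parameters, and the first-`k` initialisation are assumptions made for illustration, not the paper's implementation.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    dist: square list of lists, dist[i][j] = distance between samples i and j.
    Returns (medoids, labels), where medoids are sample indices.
    """
    n = len(dist)

    def assign(medoids):
        # Partitioning step: each sample joins its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Assumption of this sketch: deterministic start from the first k samples.
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Placeholder similarity measure: absolute difference on 1-D points.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]

medoids, labels = k_medoids(dist, k=2)
print(medoids)  # [1, 4]
print(labels)   # [0, 0, 0, 1, 1, 1]
```

Because the function never looks at the raw samples, swapping the similarity measure only means building `dist` differently; the partitioning logic is left untouched.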