| Clustering is an unsupervised machine learning technique and an important means of extracting information and knowledge from unlabeled data. It is an important research field in data mining and plays a significant role in big data processing, so clustering algorithms have received widespread attention from the academic and business communities. As a density-based clustering algorithm, the density peak clustering algorithm has been widely used in various fields because it does not require the number of clusters to be specified in advance and can find clusters of arbitrary shape. However, the algorithm neglects the global structure of the data and is susceptible to subjective human interference when cluster centers are selected, which causes a chain reaction in point allocation and poor clustering accuracy. In addition, during data denoising, the density peak clustering algorithm cannot accurately detect boundary noise points and noise points in sparse areas of the data, which greatly reduces clustering accuracy. To address these shortcomings, this thesis proposes a density peak clustering algorithm based on fuzzy membership degree optimization. The algorithm consists of two main parts: a density-based noise detection method and an improved density peak clustering algorithm. The specific content of these two parts is as follows: (1) This thesis proposes a density-based noise detection method to address the sensitivity of density peak clustering algorithms to noise points at data boundaries and in sparse data areas. The purpose of this method is to remove noise points from the data. The first step is to define how the sample point density is calculated, based on the characteristics of density clustering, then determine a threshold for detecting noise points from the sample point density, and remove the detected noise points. The second step is
to fuse the relative density method to detect sparse boundary areas and noise points hidden within clusters, completing the noise removal. This method effectively removes not only noise points in the data boundary area but also noise points hidden within clusters in sparse data areas, providing a high-quality data foundation for clustering algorithms. (2) This thesis proposes an optimized density peak clustering algorithm to address the chain reaction of points in density peak clustering algorithms. When selecting cluster centers, the density peak clustering algorithm divides high-density and low-density areas mainly based on local density and the distance to the nearest higher-density point, and the calculation of local density is affected by subjective human interference. Once a cluster center is selected incorrectly, it causes a chain reaction of points. Therefore, this thesis introduces the membership function from fuzzy mathematics to optimize the selection of cluster centers and divides the allocation strategy into three stages: the determination of cluster centers, the allocation of points, and the merging of clusters. The cluster center determination stage first excludes non-center points, replaces local density with sample point density, establishes candidate cluster center points in the decision graph, expands the range of cluster center points, and finally selects the set of cluster center points. In the point allocation stage, a cluster center neighborhood fuzzy set is first established from the neighborhood of each cluster center and is then optimized according to the membership function, which divides the data into several small clusters. The cluster merging stage calculates the similarity between cluster centers and merges the small clusters produced by the point allocation stage to complete the clustering
process. This staged strategy not only ensures the accuracy of the final cluster division but also greatly reduces chain reactions of points and improves clustering accuracy. To verify the effectiveness of the proposed algorithm, comparative experiments were conducted on six public datasets against the mainstream clustering algorithms DPC, DBSCAN, 3DC, KDPC, and FCM. The experimental results show that the proposed algorithm outperforms the other five algorithms in terms of the RI and Acc metrics and clusters better on datasets with boundary noise. It achieves significant performance improvement on small-scale and sparse datasets. The proposed algorithm unifies the measurement used for cluster center selection and solves the problems of chain reactions in point allocation caused by manual selection of cluster centers and of sensitivity to boundary noise points and noise in sparse data areas. |
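
For orientation, the two quantities that the baseline density peak clustering algorithm plots in its decision graph (local density and the distance to the nearest higher-density point) can be sketched as follows. This is a minimal illustration of the baseline only, not the method proposed in this thesis: the abstract does not give the sample point density formula, the membership function, or the noise threshold. The function names, the Gaussian-kernel form of the density, the cutoff `d_c`, and the `fraction` threshold rule are all assumptions made for the example.

```python
import math


def pairwise_distances(points):
    """Return the full Euclidean distance matrix as a list of lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def local_density(dist, d_c):
    """Gaussian-kernel local density; d_c is an assumed cutoff distance."""
    n = len(dist)
    return [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]


def higher_density_distance(dist, rho):
    """For each point, distance to (and index of) the nearest denser point.

    A point with no strictly denser point gets its largest distance to any
    other point and index -1, as in the baseline algorithm.
    """
    n = len(dist)
    delta = [0.0] * n
    nearest = [-1] * n
    for i in range(n):
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if higher:
            nearest[i] = min(higher, key=lambda j: dist[i][j])
            delta[i] = dist[i][nearest[i]]
        else:
            delta[i] = max(dist[i])
    return delta, nearest


def flag_low_density(rho, fraction=0.5):
    """Hypothetical noise rule: flag points below a fraction of the mean density."""
    threshold = fraction * sum(rho) / len(rho)
    return [r < threshold for r in rho]


# Two compact groups of four points each, plus one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (30, 30),
]
dist = pairwise_distances(points)
rho = local_density(dist, d_c=2.0)
delta, nearest = higher_density_distance(dist, rho)
is_noise = flag_low_density(rho)
```

In the baseline algorithm, points with both large `rho` and large `delta` are picked as cluster centers, and every other point follows its `nearest` link. One wrong center therefore propagates along those links, which is the chain reaction the abstract refers to.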