
Research On Density Peak Clustering Algorithm And Its Application In Outlier Detection Of Power Big Data

Posted on: 2020-06-23 | Degree: Master | Type: Thesis
Country: China | Candidate: J Y Wang | Full Text: PDF
GTID: 2392330629450140 | Subject: Power Engineering
Abstract/Summary:
The density peak clustering algorithm is a density-based clustering algorithm that identifies density peaks (cluster centers) using a density-distance model. It can handle clusters of arbitrary shape and is simple and efficient. However, the algorithm still has several defects: (1) The truncation distance (cutoff distance) must be selected manually and lacks a theoretical basis. (2) The definition of local density is limited, which leads to poor clustering results when the densities of different clusters in a data set differ greatly. (3) The algorithm has difficulty with data sets that have manifold structure, which are very common in real data.

This thesis studies these problems and proposes a corresponding improvement for each:

(1) The standard density peak clustering algorithm requires the truncation distance as a manual input, and many preliminary experiments are needed to determine this parameter for each data set. To address this, a density peak clustering algorithm optimized by the firefly algorithm is proposed. The algorithm uses density estimation entropy to evaluate the deterministic relationship between data points, uses the firefly algorithm to iteratively search for the truncation distance that minimizes this entropy, and then passes that value to the standard density peak clustering algorithm. This avoids setting the parameter manually without any basis and allows it to be selected adaptively for each data set.

(2) To address the defects in the local density definition, a density peak clustering algorithm based on a cosine kernel is proposed. The cosine kernel function uses local information in the data set to define the local density of a sample. It distinguishes the positions of different samples within the truncation distance and, at the same time, balances the influence of cluster center points and boundary points on the local density of a sample.

(3) The density peak clustering algorithm uses Euclidean distance as its measure of similarity between samples, which makes it difficult to obtain good clustering results on manifold data sets. To address this, a density peak clustering algorithm based on geodesic distance and a dynamic neighborhood is proposed. Similarity is measured by geodesic distance, and the number of nearest neighbors used to compute it is adjusted according to the spatial distribution of the samples. This measure solves the clustering problem for manifold data sets and can also effectively cluster data sets in which sparse and dense clusters coexist.

Finally, building on the advantages of the density peak clustering algorithm, a criterion for identifying outliers is designed. In a worked example, it is used to detect abnormal values in the load data of power big data, which provides a theoretical basis for processing and analyzing anomalies in power big data.
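The abstract gives no formulas, so the following is only a minimal sketch of the two quantities in the density-distance model as they are commonly defined for standard density peak clustering: the local density `rho` (number of other samples within the truncation distance) and the distance `delta` to the nearest sample of higher density. The function name `density_peak_quantities`, the parameter name `d_c`, and the use of plain Euclidean distance with a hard cutoff kernel are assumptions made for illustration; this is not the thesis's firefly-optimized, cosine-kernel, or geodesic variant.

```python
import numpy as np

def density_peak_quantities(X, d_c):
    """Return (rho, delta) per sample for cutoff-kernel density peak clustering.

    X   : array-like of shape (n_samples, n_features)
    d_c : truncation (cutoff) distance, chosen by the caller (assumed given)
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distance matrix, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: count of samples strictly closer than d_c, excluding self.
    rho = (dist < d_c).sum(axis=1) - 1
    n = X.shape[0]
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            # Distance to the nearest sample with strictly higher density.
            delta[i] = dist[i, higher].min()
        else:
            # Convention for a highest-density sample: its largest distance.
            # Samples tied for the maximum density all take this branch.
            delta[i] = dist[i].max()
    return rho, delta
```

Samples with both large `rho` and large `delta` are the usual candidates for cluster centers. A sample with very low `rho` but large `delta` is isolated from every denser region, which is one common heuristic for flagging outliers; the specific criterion designed in the thesis is not stated in the abstract.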
Keywords/Search Tags: Density peak clustering, Firefly algorithm, Cosine kernel, Geodesic distance, Power big data