
Outlier Detection Algorithm Based On Entropy Weight Distance And Density Peak Clustering

Posted on: 2023-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: W X Liu
GTID: 2568306848467354
Subject: Engineering
Abstract/Summary:
With the advent of the information age, large volumes of data are generated every moment. How to use data mining technology to efficiently extract the value contained in data has become an important research direction. Outlier detection, an important data mining technique, is widely applied in industrial equipment failure detection, intelligent medical diagnosis, and fraud detection. It identifies data objects that differ markedly from the majority of the data and that often carry significant value. This dissertation studies two problems in depth: density-based outlier detection algorithms lose accuracy on local outliers in data sets with varying densities, and the density peak clustering algorithm has low time efficiency on large-scale data sets and is sensitive to parameter settings. The main research contents are as follows.

First, the current state of research on density-based outlier detection algorithms is analyzed. Because these algorithms are sensitive to the parameter k used in density estimation, this dissertation introduces the natural neighbor algorithm to obtain the value of k adaptively. In addition, to address the low accuracy of density-based outlier detection on some higher-dimensional data sets, an entropy weight distance is introduced to replace the traditional Euclidean distance and improve the adaptability of the algorithm. The dissertation then proposes the concept of relative distance to improve detection accuracy on data sets with varying densities, and combines kernel density estimation with relative distance to define a relative entropy weight density outlier factor that describes the degree to which a data object is an outlier. On this basis, an outlier detection algorithm based on the relative entropy weight density outlier factor is proposed.

Second, the density peak clustering algorithm is studied in depth. Because density peak clustering requires parameters to be set manually, this dissertation replaces its traditional density estimation with the k-nearest neighbor algorithm and selects cluster centers automatically using the product of density and distance. To improve time efficiency, an index structure is used to optimize the distance calculation. To improve detection accuracy on data sets with varying densities, the centripetal relative distance is proposed. To characterize the outlier degree of data objects, an outlier factor based on fast density peak clustering is defined, and an outlier detection algorithm based on fast density peak clustering is proposed.

Finally, to verify the effectiveness of the proposed algorithms, experiments are conducted on artificial and real data sets, and the proposed algorithms are compared with several classic and recent algorithms. The results show that the proposed algorithms detect outliers more stably and more efficiently.
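The abstract does not give the formula for the entropy weight distance. As a rough illustration only, the sketch below uses the standard entropy weight method (features whose values are more unevenly distributed receive larger weights) and applies those weights inside a Euclidean-style distance. The function names `entropy_weights` and `entropy_weighted_distances`, the min-max scaling step, and the handling of constant features are all assumptions made for this example, not the dissertation's definitions.

```python
import numpy as np

def entropy_weights(X):
    """Per-feature weights from the standard entropy weight method (assumed form)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape  # assumes n >= 2 so that log(n) is nonzero
    # Min-max scale each feature to [0, 1]; constant features scale to all zeros.
    span = X.max(axis=0) - X.min(axis=0)
    Z = (X - X.min(axis=0)) / np.where(span > 0, span, 1.0)
    # Turn each column into a distribution over the n objects.
    col_sum = Z.sum(axis=0)
    P = Z / np.where(col_sum > 0, col_sum, 1.0)
    # Normalized Shannon entropy per feature, using the convention 0 * log(0) = 0.
    logP = np.log(P, out=np.zeros_like(P), where=P > 0)
    entropy = -(P * logP).sum(axis=0) / np.log(n)
    # Lower entropy means more discriminating power, hence a larger weight.
    divergence = 1.0 - entropy
    divergence[span == 0] = 0.0  # a constant feature carries no information
    total = divergence.sum()
    if total == 0:
        return np.full(d, 1.0 / d)  # fall back to uniform weights
    return divergence / total

def entropy_weighted_distances(X, w):
    """Pairwise distances sqrt(sum_j w_j * (x_j - y_j)^2)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=2))

# Toy data: feature 0 is evenly spread, feature 1 separates a single object.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 10.0]])
w = entropy_weights(X)
D = entropy_weighted_distances(X, w)
```

In this toy data the second feature receives the larger weight, because its values are concentrated on one object. The resulting matrix `D` could then be passed to any neighbor-based density estimate in place of plain Euclidean distances.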
Keywords/Search Tags:data mining, outlier detection, information entropy, kernel density estimation, density peak clustering