Research On Anomaly Detection Method Based On Clustering

Posted on: 2023-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: N Wang
GTID: 2568306836464694
Subject: Engineering
Abstract/Summary:
Anomaly detection is an important research branch of data mining. Its essential idea is to find objects that differ significantly from most other objects. With the advent of the big data era, every industry produces large amounts of data. This data often contains anomalies, and anomalies usually carry more valuable information than ordinary records. Previously, anomalies in data sets were labeled by experts in each field. Labeling anomalies requires a great deal of effort and experience, and labeling them accurately is very difficult. Unsupervised anomaly detection does not require the data to be labeled in advance, so it has greater research significance. To address this labeling problem, this thesis focuses on clustering and anomaly detection techniques in data mining. The main work includes:

1. To solve the problem that edge points (including outliers and edge points that cause other clusters to be divided incorrectly) distort traditional K-means clustering, an improved K-means algorithm is proposed. The algorithm adaptively detects edge points using a local threshold obtained from the distribution of distances between the data points in a cluster and its centroid. The edge points are ignored during clustering so that they do not influence the clustering result. Building on this algorithm, an anomaly detection method based on the improved K-means algorithm is proposed. The method moves the point in each cluster that is farthest from the centroid into the abnormal cluster. As a result, the normal clusters become more compact and the abnormal cluster becomes looser. When all clusters reach a steady state, the algorithm stops and the detection result is obtained. The method retains the ease of use of traditional K-means, reduces the number of clustering iterations, and can accurately detect anomalies in the data set. The experimental results show that the algorithm achieves the desired effect: it detects outliers effectively and performs well on multiple data sets.

2. To solve the problem that the traditional K-means algorithm is not applicable to non-convex data sets, a hybrid clustering algorithm is designed by combining the improved K-means algorithm above with the idea of the AGNES (AGglomerative NESting) clustering algorithm. The hybrid algorithm overcomes the unsuitability of traditional K-means for non-convex data sets and alleviates the heavy computational cost of the AGNES algorithm. To solve the problem of choosing the value of k in traditional K-means, the DAS (Difference of Average Synthesis Degree) indicator is used to evaluate each cluster division while clusters are merged in the AGNES stage, so the number of clusters does not have to be set in advance. On this basis, an anomaly detection method based on the hybrid of the improved K-means algorithm and the AGNES algorithm is proposed. In this method, clusters with very few data points are regarded as abnormal clusters, which alleviates the problem of anomalous points being selected as initial centers. A local outlier score is then calculated for the data points in the normal clusters, which makes the method more applicable to non-convex data sets. Experimental results show that the algorithm adapts to data sets of various shapes and that its performance is relatively stable.
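The edge-point idea behind the improved K-means algorithm can be illustrated with a minimal Python sketch. The abstract does not give the threshold formula, so the rule used here (mean plus `alpha` standard deviations of the in-cluster distances to the centroid) is an assumption. The function name `trimmed_kmeans` and the parameters `alpha` and `init` are hypothetical. This is not the thesis's exact algorithm, only one plausible reading of "local threshold obtained from the distance distribution".

```python
import math
import random
import statistics


def trimmed_kmeans(points, k, alpha=2.0, max_iter=100, init=None, seed=0):
    """K-means variant that ignores per-cluster edge points in the update step.

    Assumption (not from the thesis): a point is an edge point when its
    distance to its own centroid exceeds mean + alpha * std of the
    distances observed inside that cluster.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels, is_edge = [], []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = [[math.dist(p, c) for c in centroids] for p in points]
        labels = [min(range(k), key=row.__getitem__) for row in dists]
        own = [row[j] for row, j in zip(dists, labels)]

        # Local threshold per cluster, derived from its distance distribution.
        is_edge = [False] * len(points)
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if len(members) < 2:
                continue
            d = [own[i] for i in members]
            threshold = statistics.fmean(d) + alpha * statistics.pstdev(d)
            for i in members:
                is_edge[i] = own[i] > threshold

        # Update step: edge points do not contribute to the new centroid.
        new_centroids = []
        for j in range(k):
            core = [p for p, lab, e in zip(points, labels, is_edge)
                    if lab == j and not e]
            if core:
                new_centroids.append(
                    tuple(statistics.fmean(col) for col in zip(*core)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid

        if new_centroids == centroids:  # steady state reached
            break
        centroids = new_centroids
    return centroids, labels, is_edge
```

Calling `trimmed_kmeans(points, k)` on a list of coordinate tuples returns the centroids, the cluster label of each point, and a flag per point marking it as an edge point. In this sketch the flagged points are only candidates for the abnormal cluster. The thesis's own procedure of repeatedly moving the farthest point of each cluster into the abnormal cluster is not reproduced here.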
Keywords/Search Tags:Anomaly Detection, Clustering Algorithms, Unsupervised, K-means Algorithm, AGNES Algorithm