The Outlier Detection Algorithm Based on Similarity Pruning and Data Field Potential

Posted on: 2021-12-07  Degree: Master  Type: Thesis
Country: China  Candidate: M Y Shi
GTID: 2568306104971259  Subject: Computer Science and Technology
Abstract/Summary:
Outlier detection is an active research topic in data mining. It extracts a small number of data objects that carry important information from large data sets, and it has a wide range of real-world applications. It has therefore attracted the attention of many researchers in China and abroad, and many outlier detection methods have been proposed. This thesis analyzes density-based outlier detection in depth and proposes improvement strategies to raise the detection accuracy of outliers. The main content is divided into the following three parts.

First, this thesis analyzes two weaknesses of the LOF algorithm: the parameter selection problem on data sets with an unknown number of outliers, and the low detection accuracy on data sets with uneven density distribution and irregular shape. It proposes an outlier detection method based on similarity pruning and neighborhood density. The algorithm solves the parameter problem by introducing neighborhood density. A concept of similarity is proposed to describe how similar data objects are to one another, to reflect the overall distribution of the data objects, and to prune data objects more accurately. The mathematical concept of intersection is used to improve the accuracy of outlier detection and reduce the false alarm rate.

Second, this thesis analyzes LDC, a local outlier detection algorithm based on the density of scattered data. It affirms the concept of degree of dispersion used in that algorithm, but finds that detection accuracy is low on data sets in which only a small part of the data is scattered. An outlier detection algorithm based on the expectation and mean square deviation of the data field potential value is therefore proposed. The algorithm preprocesses the data set with an improved DBSCAN clustering algorithm, which solves the parameter problem of the original DBSCAN clustering algorithm. According to the characteristics of the data field, the degree of dispersion is re-characterized by the expectation and mean square deviation of the data field potential value. The algorithm not only corrects the misjudgments that the LOF algorithm makes on scattered data sets, but also achieves better detection results than the LDC algorithm.

Finally, the algorithms are validated on real UCI data sets and synthetic data sets and compared with existing algorithms. The experimental results verify the effectiveness of the two algorithms proposed in this thesis.
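
To make the idea of a data field potential more concrete, the following is a minimal Python sketch and not the algorithm proposed in the thesis. It assumes the commonly used Gaussian-like potential function with unit mass for every object, phi(x) = sum over the other objects of exp(-(d/sigma)^2). The flagging rule (potential below the mean minus k standard deviations), the function names, and the values of `sigma` and `k` are illustrative assumptions. The improved DBSCAN preprocessing and the re-characterized dispersion degree described above are not reproduced here.

```python
import math
from statistics import mean, pstdev


def potentials(points, sigma=1.0):
    """Return the data-field potential of every point.

    Assumed form: each other point contributes exp(-(d / sigma) ** 2),
    where d is the Euclidean distance and every point has unit mass.
    """
    result = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                d = math.dist(p, q)
                total += math.exp(-(d / sigma) ** 2)
        result.append(total)
    return result


def low_potential_outliers(points, sigma=1.0, k=1.5):
    """Flag indices whose potential is below mean - k * std (illustrative rule)."""
    phi = potentials(points, sigma)
    threshold = mean(phi) - k * pstdev(phi)
    return [i for i, value in enumerate(phi) if value < threshold]


# A tight cluster of five points plus one isolated point at (5, 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
outliers = low_potential_outliers(points, sigma=1.0, k=1.5)
print(outliers)  # index of the isolated point
```

In this sketch, objects in dense regions accumulate a high potential, while an isolated object receives almost none, so the mean and standard deviation of the potential values give a simple data-driven cut-off. The choice of `sigma` controls how far each object's influence reaches and would need tuning for real data.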
Keywords/Search Tags:data mining, outlier detection, neighborhood density, similarity degree, data field potential, dispersion degree