The Outlier Detection Algorithm Based On Similarity Pruning And Data Field Potential

Posted on:2021-12-07

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Shi

GTID:2568306104971259

Subject:Computer Science and Technology

Abstract/Summary:

Outlier detection is one of the active research topics in data mining. It extracts a small amount of data carrying important information from a large body of data, and it has a wide range of real-world applications. Outlier detection has therefore attracted the attention of many researchers in China and abroad, and many detection methods have been proposed. This thesis analyzes density-based outlier detection in depth and proposes improvement strategies to raise the detection accuracy of outliers. The main content is divided into the following three parts.

First, this thesis analyzes the parameter-setting problem of the LOF algorithm on data sets with an unknown number of outliers, together with its low detection accuracy on data sets with uneven density distribution and irregular shape, and proposes an outlier detection method based on similarity pruning and neighborhood density. The algorithm solves the parameter problem by introducing neighborhood density. The concept of similarity is proposed to describe how similar data objects are to one another, to reflect the overall distribution of the data objects, and to prune data objects more accurately. The mathematical concept of intersection is used to improve the accuracy of outlier detection and reduce the false alarm rate.

Second, this thesis analyzes LDC, a local outlier detection algorithm based on the density of scattered data. It affirms the concept of dispersion degree used in that algorithm, but finds that detection accuracy is low on data sets in which only a small part of the data is scattered. An outlier detection algorithm based on the expectation and mean square deviation of the data field potential value is therefore proposed. The algorithm preprocesses the data set with an improved DBSCAN clustering algorithm, which solves the parameter problem of the original DBSCAN clustering algorithm. According to the characteristics of the data field, the concept of dispersion degree is re-characterized by the expectation and mean square deviation of the data field potential value. The algorithm not only resolves the cases that the LOF algorithm misjudges on scattered data sets, but also achieves better detection results than the LDC algorithm.

Finally, the two algorithms are validated on UCI real data sets and synthetic data sets and compared with existing algorithms. The experimental results verify the effectiveness of the two algorithms proposed in this thesis.
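As a rough illustration of the data field idea mentioned in the abstract, the following minimal sketch computes a potential value for each point and flags points whose potential falls well below the mean. It is not the thesis's algorithm: the Gaussian-like potential function, the unit mass per point, the impact factor `sigma`, the multiplier `k`, and the threshold rule "mean minus `k` standard deviations" are all assumptions made for this example, and the improved DBSCAN preprocessing step is omitted.

```python
import math
from statistics import mean, pstdev


def potentials(points, sigma=1.0):
    """Data field potential of each point (assumed Gaussian-like form).

    Each other point contributes exp(-(d / sigma) ** 2), where d is the
    Euclidean distance. Unit mass per point is an assumption.
    """
    result = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i == j:
                continue  # a point does not contribute to its own potential
            d = math.dist(p, q)
            total += math.exp(-(d / sigma) ** 2)
        result.append(total)
    return result


def low_potential_points(points, sigma=1.0, k=1.0):
    """Indices of points whose potential is below mean - k * std.

    The mean plays the role of the expectation and the population standard
    deviation the role of the mean square deviation. This threshold rule
    is a hypothetical stand-in, not the rule defined in the thesis.
    """
    phi = potentials(points, sigma)
    threshold = mean(phi) - k * pstdev(phi)
    return [i for i, value in enumerate(phi) if value < threshold]


# Four tightly grouped points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
flagged = low_potential_points(points, sigma=1.0, k=1.0)
print(flagged)
```

In this toy data set the isolated point receives almost no potential from the cluster, so it is the only one that falls below the threshold. The result depends strongly on `sigma` and `k`, which would need tuning on real data.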

Keywords/Search Tags:

data mining, outlier detection, neighborhood density, similarity degree, data field potential, dispersion degree

Related items

1	Research And Improvement Of Local Outlier Detecting Algorithm Based On Density
2	Research Of Detection Outlier Based On Outlier Degree
3	Improvement Of Density-Based Local Outlier Detection Algorithm
4	Analysis And Research On Density-based Local Outlier Detection
5	Research On Outlier Mining Method Oriented To Multidimensional Data
6	Researches On Outlier Detection Algorithms For Categorical Matrix-object Data
7	Research On Outlier Detection Method Based On Nearest Neighborhood
8	The Research And Implementation Of The Evaluation System For The Degree Of Production Accident Destruction Based On Data Mining
9	Research And Implementation Of Clustering And Outlier Detection Algorithms
10	Research On Technology For Detecting Density-based Outlier