Improvement Of DBSCAN Algorithm Based On Adaptive Estimation Of Eps Parameters And Its Application In Outlier Detection

Posted on: 2020-02-14 | Degree: Master | Type: Thesis
Country: China | Candidate: L Zhu | Full Text: PDF
GTID: 2370330572480091 | Subject: Computer software and theory
Abstract/Summary:
The 21st century is an era of rapid scientific and technological progress and of explosive data growth. As data sets keep growing, extracting valuable information from large-scale data has become an important problem. Clustering is a major research direction in data mining and an active research topic, with wide application in bioinformatics, medicine, business and marketing, social network analysis, computer science, and other fields of information analysis. Among clustering algorithms, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is particularly important. Proposed by Martin Ester, Hans-Peter Kriegel, and others in 1996, it can find clusters of various shapes and sizes, as well as isolated data points, in noisy data. In 2014 the algorithm received the Test of Time Award (an award for algorithms of high theoretical and practical interest) at KDD, a top conference in data mining.

DBSCAN requires two input parameters, Eps and minPts, which are usually chosen subjectively by the user, so clustering accuracy depends directly on the user's prior knowledge. In addition, because the algorithm applies a single set of global parameters, it performs poorly on data sets with non-uniform density. To address these defects, this thesis proposes a new approach.

First, Gaussian kernel density estimation is used to compute a kernel probability density estimate for each data point. Using the positive correlation between the kernel probability density estimate and the Eps value, an appropriate neighborhood search radius Eps is then adaptively computed and assigned to each data point. To evaluate the improved algorithm, comparative experiments were carried out on a variety of data sets of different types. The results show that the proposed method clusters well and effectively improves clustering accuracy.

Finally, because DBSCAN labels data points that fall outside every cluster as noise points, this thesis applies the improved algorithm to outlier detection. To assess its detection performance, repeated experiments were run on real and simulated data sets and compared against several commonly used outlier detection algorithms. The results show that the algorithm detects outliers well.
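The approach described in the abstract (a per-point Eps derived from a Gaussian kernel density estimate, with DBSCAN noise points reported as outliers) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract states only that Eps is positively correlated with the density estimate, so the linear mapping in `adaptive_eps`, the parameters `bandwidth`, `eps_min`, and `eps_max`, all function names, and the toy data are assumptions made for illustration.

```python
import math

def gaussian_kde(points, bandwidth):
    """Gaussian kernel density estimate evaluated at every input point."""
    n, d = len(points), len(points[0])
    norm = n * (bandwidth * math.sqrt(2 * math.pi)) ** d
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points) / norm
        for p in points
    ]

def adaptive_eps(densities, eps_min, eps_max):
    """ASSUMPTION: linear, increasing map from density to Eps (hypothetical;
    the abstract only says the two are positively correlated)."""
    lo, hi = min(densities), max(densities)
    span = (hi - lo) or 1.0  # avoid division by zero when all densities match
    return [eps_min + (eps_max - eps_min) * (rho - lo) / span for rho in densities]

def dbscan_per_point_eps(points, eps, min_pts):
    """DBSCAN in which point i searches its own radius eps[i].
    Returns one label per point; -1 marks noise."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps[i]]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbors[j])
    return labels

# Toy data: two compact groups and one isolated point (index 10).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (5, 30)]
densities = gaussian_kde(points, bandwidth=1.0)
eps = adaptive_eps(densities, eps_min=0.5, eps_max=2.0)
labels = dbscan_per_point_eps(points, eps, min_pts=3)
outliers = [i for i, label in enumerate(labels) if label == -1]
print(outliers)  # [10]
```

In this sketch the isolated point has the lowest density estimate, so it receives the smallest search radius, finds no neighbors other than itself, and is left labeled as noise. The neighbor search is quadratic in the number of points, which is acceptable for a demonstration but would need a spatial index on large data sets.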
Keywords/Search Tags: DBSCAN algorithm, Eps parameters, Gaussian kernel probability, Outlier detection