Research On DBSCAN Algorithm Under Non-Independent And Identical Distribution

Posted on:2021-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:Y K Lv

Full Text:PDF

GTID:2428330602997178

Subject:Computer application technology

Abstract/Summary:

Clustering is an important part of data mining. It is a challenging research field whose goal is to group similar data objects together and to separate dissimilar data objects as far as possible. DBSCAN is a density-based clustering algorithm with superior performance. It treats regions of sufficient density as data clusters, where a data cluster is the maximal set of density-connected data objects. Its main advantages are that clustering is fast, noise points are handled effectively, and spatial clusters of arbitrary shape can be found. However, the traditional DBSCAN algorithm assumes that data objects and attributes are independent and identically distributed (IID). As a result, the traditional distance formula struggles to measure the similarity between categorical data objects and attributes accurately, and the algorithm is sensitive to its parameters, which are difficult to determine.

To address these problems, this thesis studies the DBSCAN algorithm under non-independent and identical distribution (Non-IID). The unsupervised clustering of categorical data is approached using the Non-IID idea: the similarity between data objects and attributes is computed with the Non-IID coupled similarity formula, and the result is output as a coupled similarity matrix. Neighborhood interval values (a lower limit Eps1 and an upper limit Eps2) and a threshold are then used to partition the high-density data, so that higher-quality clustering results can be obtained more quickly.

For common categorical data, a Non-IID DBSCAN algorithm (DBSCAN under Non-Independent and Identical Distribution) is proposed. Let Om be the data object with the largest coupling similarity. The array Rm is obtained by sorting the coupling similarities between Om and the other data objects in ascending order. The lower limit Eps1 is selected by plotting Rm: the curve first rises slowly, then levels off, and finally steepens suddenly at some point. The coupling similarity at which the curve suddenly steepens is set as Eps1. The k-nearest-neighbor distance values to the right of Eps1 are then found (with k equal to Minpts), and their maximum is set as the upper limit Eps2. A density formula is used to determine how sparse or dense the data are, and the threshold Minpts is set according to the density value: in general, Minpts is 4 if the density value is large and 2 if it is small.

For categorical data containing Boolean data, a NIB-DBSCAN algorithm (DBSCAN under Non-Independent and Identical Distribution for Boolean data) is proposed. Scatter plots are used to fit the distribution of coupling similarity between data objects and attributes, the boundary points between data clusters are located, and the corresponding coupling similarity value is set as the lower limit Eps1. For data sets with a small number of data clusters, Minpts is selected by a weighted average. For data sets with a large number of data clusters, Minpts is selected using the special value method.

Finally, experimental results on UCI data sets show that DBSCAN under non-independent and identical distribution obtains higher-precision clustering results and improves the applicability of the algorithm.
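The general shape of the approach summarized above (DBSCAN run on a precomputed similarity matrix, with a neighborhood defined by an interval of similarity values) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the thesis: the function names are hypothetical, the similarity is a plain attribute-overlap fraction standing in for the coupled similarity formula (which the abstract does not give), and a neighbor is assumed to be any object whose similarity falls within [Eps1, Eps2]. The parameter values are picked by hand for the toy data, not by the curve or density methods described above.

```python
from collections import deque

def overlap_similarity(a, b):
    # Placeholder similarity: fraction of attributes on which two records agree.
    # ASSUMPTION: this is NOT the coupled similarity used in the thesis.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def similarity_matrix(records, sim=overlap_similarity):
    # Pairwise similarities, output as a square matrix (list of lists).
    n = len(records)
    return [[sim(records[i], records[j]) for j in range(n)] for i in range(n)]

def interval_dbscan(S, eps1, eps2, min_pts):
    # DBSCAN-style clustering on a similarity matrix S.
    # ASSUMPTION: j is a neighbor of i when eps1 <= S[i][j] <= eps2.
    n = len(S)
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * n

    def neighbors(i):
        return [j for j in range(n) if eps1 <= S[i][j] <= eps2]

    cluster = 0
    for i in range(n):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            expansion = neighbors(j)
            if len(expansion) >= min_pts:  # j is a core point: keep expanding
                queue.extend(expansion)
        cluster += 1
    return labels

# Toy categorical data (made up for illustration only).
records = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "round", "small"),
    ("blue", "square", "large"),
    ("blue", "square", "small"),
    ("blue", "square", "large"),
    ("green", "oval", "medium"),
]
labels = interval_dbscan(similarity_matrix(records), eps1=0.6, eps2=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this toy run the first three and the next three records form two clusters, and the last record, which shares no attribute value with any other, is labeled as noise (-1). Swapping `overlap_similarity` for a coupling-aware similarity would change only the matrix, not the clustering loop.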

Keywords/Search Tags:

Non-IID, Coupling relationship, DBSCAN algorithm, Coupling similarity degree matrix, Principle of statistics

Related items

1	Research And Application On Design Of Multi-performance Of Complex Product Based On Fuzzy Coupling Degree
2	Synthesis And Design Of Microwave Filters
3	Researches On Recommendation Algorithm Based On User And Item Attribute Information
4	Theoretical Research And Application Of Wideband Coupling Matrix
5	Synthesis And Design Of Cross-Coupling Resonator Filter With Source-Load Coupling
6	Study On The Energy Coupling Relationship Between ESD And PCB
7	Theoretical Research On The Planar Microwave Filter In Box-section Coupling Topology And Its Application
8	Coupling Analysis And Its Application In Knowledge Analysis
9	Design Of SIW Bandpass Filter Based On Optimization Of Lossy Coupling Matrix
10	IFFT Algorithm Based On Mutual Coupling Compensation Matrix And Its Application