Exploration Of Density-Based Clustering On High-Dimensional Data And Its Applications

Posted on:2017-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:W J Duan

Full Text:PDF

GTID:2297330503961383

Subject:Applied statistics

Abstract/Summary:

Clustering by fast search (FSC), proposed by Alex Rodriguez and Alessandro Laio, is based on the idea that cluster centers have a higher density than their neighbors and lie at a relatively large distance from any point of higher density. The method clusters a dataset using two quantities computed for each object: its local density and its distance to the nearest object of higher density. FSC involves no iteration, so it identifies the number of clusters and the cluster centers efficiently. However, it performs poorly on high-dimensional data because of the "curse of dimensionality". To handle high-dimensional data more efficiently, we propose two improved methods that combine FSC with ideas from dimensionality reduction. The first, clustering by fast search based on principal component analysis, clusters in turn the new datasets generated by the principal components that fall within a limited interval, and then selects the set of principal components that gives the best clustering output. The second, clustering by fast search based on a hard threshold function, measures similarity with a distance function built on a hard threshold function, so that only the variables satisfying certain conditions are considered. Results from numerical simulations and from real data such as the Face, Iris, and Wine datasets illustrate that the proposed methods perform well in high-dimensional data analysis.
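The two quantities that FSC relies on, local density and distance to the nearest denser point, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `density_peaks` and `pick_centers`, the cutoff distance `d_c`, the simple counting kernel, the index-based tie-breaking, and the ranking score `rho * delta` are all assumptions chosen for illustration. It covers only the base FSC quantities, not the PCA-based or hard-threshold variants proposed here.

```python
import numpy as np


def density_peaks(X, d_c):
    """Compute the two FSC quantities for each row of X.

    rho[i]   : number of other points closer than d_c to point i
               (a plain cutoff kernel, assumed here for simplicity).
    delta[i] : distance from point i to the nearest point ranked as denser;
               for the densest point, the largest distance to any point.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Full pairwise Euclidean distance matrix (fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Count neighbours within d_c, excluding the point itself.
    rho = (dist < d_c).sum(axis=1) - 1

    # Rank points from densest to sparsest; ties are broken by index
    # (an assumption, so that every point has a well-defined delta).
    order = np.argsort(-rho, kind="stable")

    delta = np.empty(n)
    delta[order[0]] = dist[order[0]].max()
    for k in range(1, n):
        i = order[k]
        delta[i] = dist[i, order[:k]].min()

    return rho, delta


def pick_centers(rho, delta, n_centers):
    """Return indices of the n_centers points with the largest rho * delta."""
    gamma = rho * delta
    return np.argsort(-gamma)[:n_centers]
```

Points that combine a high `rho` with a high `delta` are the candidate cluster centers. Because every entry of the distance matrix is a Euclidean distance over all coordinates, distances tend to become less informative as the dimension grows, which is the weakness the two proposed variants aim to address.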

Keywords/Search Tags:

Density-based clustering, dimensionality reduction methods, hard threshold function, high-dimensional data

Related items

1	Model-Based High-Dimensional Data Clustering Methods: A Review
2	Local Linear Embedded LLE Method For Nonlinear Dimension Reduction Based On High Dimensional Space
3	Research On Dimensionality Reduction Classification Of T-SNE Combined With Support Vector Machine
4	A Design Of Clustering Mining Algorithm Distinguishing The Multi-dimensional Based On Grid Density
5	Research On K-prototypes Clustering Algorithm And Data Dimension Reduction
6	The Research On Clustering Of Mixed Data Stream Based On DPC Algorithm
7	Dimensionality Reduction Method For Interaction Model Based On Two-stage Sliced Inverse Regression
8	The Prediagnosis Comparison Of ALF Based On Three Statistical Algorithms With Built-In And External Dimensionality Reduction And Stacking Integration
9	The Study And Application About Statistical Methods Of Data Reduction
10	Dimensionality Reduction Based On Feature Selection