Exploration Of Density-Based Clustering On High-Dimensional Data And Its Applications

Posted on: 2017-01-07  Degree: Master  Type: Thesis
Country: China  Candidate: W J Duan  Full Text: PDF
GTID: 2297330503961383  Subject: Applied statistics
Abstract/Summary:
Clustering by fast search (FSC), proposed by Alex Rodriguez and Alessandro Laio, is based on the idea that cluster centers have a higher density than their neighbors and lie at a relatively large distance from points with higher densities. The method clusters a dataset by measuring the density of each object and its distance to objects of higher density. Because FSC involves no iteration, it identifies the number of clusters and the cluster centers efficiently. However, it performs poorly on high-dimensional data because of the "curse of dimensionality". To handle high-dimensional data more efficiently, we propose two improved methods that combine FSC with dimensionality reduction. The first, clustering by fast search based on principal component analysis, clusters the new datasets generated by the principal components that fall within a limited interval, one by one, and then selects the set of principal components that gives the best clustering output. The second, clustering by fast search based on a hard threshold function, measures similarity with a distance function built on a hard threshold function, so that only the variables satisfying certain conditions are considered. Numerical simulations and experiments on real data, including the Face, Iris and Wine datasets, illustrate that the proposed methods perform well in high-dimensional data analysis.
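The two quantities the abstract relies on, local density and distance to the nearest point of higher density, can be illustrated with a minimal sketch. It is not the thesis's implementation: the cutoff-count density, the cutoff `d_c`, the user-supplied `n_clusters`, and the function name `fsc_cluster` are all assumptions made for illustration, and neither the PCA variant nor the hard threshold variant is reproduced here.

```python
import numpy as np

def fsc_cluster(X, d_c, n_clusters):
    """Hypothetical sketch of density-peak clustering (non-iterative).

    rho[i]   : number of other points closer than the cutoff d_c (assumed kernel)
    delta[i] : distance to the nearest point of higher density
    Centers are taken as the n_clusters points with the largest rho * delta.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    rho = (dist < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself

    # Visit points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the densest point
    for k in range(1, n):
        i = order[k]
        denser = order[:k]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j

    # High density and high distance together mark a cluster center.
    centers = np.argsort(-(rho * delta), kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, rho, delta

# Two small, well-separated groups; the middle point of each is the densest.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
              [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]])
labels, centers, rho, delta = fsc_cluster(X, d_c=1.0, n_clusters=2)
```

In this toy case the centers are simply the two points with the largest product of density and distance. In practice the number of centers is usually chosen by inspecting density against distance rather than fixed in advance, so `n_clusters` here is only a convenience.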
Keywords/Search Tags: Density-based clustering, dimensionality reduction methods, hard threshold function, high-dimensional data