
High-Dimensional Data Clustering Based On Hypergraph Partition

Posted on: 2019-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
Full Text: PDF
GTID: 2348330569989990
Subject: Software engineering
Abstract/Summary:
Clustering is an important topic in data analysis. With the development of modern technology, data structures have become increasingly complex and data dimensionality has grown. This growth poses several challenges to traditional clustering algorithms: because of the effects of dimensionality, the similarity measures those algorithms rely on become less meaningful in high-dimensional space. As high-dimensional data has become common, clustering such data has become an important direction in cluster analysis, and many methods have been proposed for it.

Clustering based on dimensionality reduction is an effective approach to high-dimensional data clustering. The data are reduced to a lower dimension while their main structural features are retained, and a classical clustering algorithm is then applied to the reduced data. Dimensionality reduction techniques have developed rapidly in recent years. Classical algorithms include PCA, LLE, SNE, and autoencoders, and different methods suit different data structures. At the same time, many attributes of high-dimensional data are redundant, and the structure of the data is determined by only a small number of main features. This observation inspired subspace clustering, which looks for feature subspaces embedded in the original data space and clusters the data within those subspaces.

Hypergraph partitioning is considered an effective method for high-dimensional data clustering. This thesis proposes a new method, MDSG (Merging Dense SubGraphs), for high-dimensional data clustering. First, a shared nearest neighbor graph G is constructed using the Shared Nearest Neighbor (SNN) method. Then a hypergraph is constructed by taking each maximal clique of G as a hyperedge. Finally, an improved hypergraph partitioning method is applied to obtain the final clustering. The method is evaluated on several real high-dimensional datasets. The experimental results show that MDSG outperforms traditional clustering methods and other hypergraph partitioning methods on high-dimensional data clustering.
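
The first two steps of the pipeline described above (SNN graph, then maximal cliques as hyperedges) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the SNN graph is thresholded, so the sketch assumes a Jarvis-Patrick style rule (two points are linked if each is in the other's k-nearest-neighbor list and they share at least `min_shared` neighbors). The names `knn_lists`, `snn_graph`, `maximal_cliques`, `build_hyperedges`, `k`, and `min_shared` are hypothetical. The improved partitioning step is not specified in the abstract and is left out.

```python
from itertools import combinations
from math import dist


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def snn_graph(points, k, min_shared):
    """Assumed SNN rule: link i and j if they are mutual k-nearest
    neighbors and share at least min_shared neighbors."""
    nn = knn_lists(points, k)
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        mutual = i in nn[j] and j in nn[i]
        if mutual and len(nn[i] & nn[j]) >= min_shared:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def build_hyperedges(points, k, min_shared):
    """Each maximal clique of the SNN graph with at least two vertices
    becomes one hyperedge (a frozenset of point indices)."""
    adj = snn_graph(points, k, min_shared)
    return [c for c in maximal_cliques(adj) if len(c) >= 2]


if __name__ == "__main__":
    # Two well-separated groups of four points each.
    demo = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    for edge in build_hyperedges(demo, k=3, min_shared=1):
        print(sorted(edge))
```

The brute-force neighbor search and clique enumeration are fine for a small demonstration. Maximal clique enumeration is exponential in the worst case, so a real high-dimensional dataset would need a sparse SNN graph or a size limit on cliques.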
Keywords/Search Tags: Clustering, High-dimensional data, Hypergraph, Hypergraph partition