Font Size: a A A

Research On Clustering Algorithm Based On Graph

Posted on:2022-12-06Degree:MasterType:Thesis
Country:ChinaCandidate:D ChenFull Text:PDF
GTID:2480306755472634Subject:Computer Software and Application of Computer
Abstract/Summary:PDF Full Text Request
With the rapid development of information technology and the Internet,a large amount of data has been generated in life,and we need to process and analyze these data to obtain effective information.Clustering is one of the most important tasks in the fields of data mining and machine learning and is an unsupervised learning method.The purpose of clustering is to divide the data set into several clusters(called "classes"),so that the similarity of data points within the same cluster is high,while the similarity of data points between different clusters is low.The graph can intuitively reflect the correlation between data points and has the property of structure preservation,so the original structure of the data can be kept unchanged under the condition of obtaining effective information.In recent years,graph-based methods of this type have been widely used in clustering tasks.In order to obtain better clustering performance,a clustering algorithm based on three different graphs is proposed.The main research contents include the following aspects.(1)k-means clustering is fast,simple and scalable.Spectral clustering is based on spectral graph theory and can be clustered in any shape of the sample space.In graph theory,a Laplacian matrix is a matrix representation of a graph.Combining the advantages of both,a k-means clustering algorithm based on graph regulation is proposed,which adds graph Laplacian conditioning to the objective function of k-means expansion.In a single optimization process,the objective functions of k-means and spectral clustering are simultaneously optimized,and the cluster centers and cluster indicators are iteratively updated with the multiplication update rule,and the correctness and convergence of the algorithm are proved theoretically in detail.Corresponding experiments are carried out on typical datasets to verify the superiority of the proposed algorithm.(2)The time complexity of k-means is close to linear,which is suitable for mining large-scale data sets,but it can not deal with nonlinear data.Spectral clustering can cluster on the sample space of arbitrary shape and converge to the global optimum,but the computational complexity is high.A joint learning of k-means and spectral clustering based on bipartite graph is proposed for large-scale data clustering.The algorithm divides the data into the corresponding manifold space and Euclidean space at the same time,inherits the advantages of k-means and spectral clustering,and avoids their shortcomings.In addition,an adaptive bipartite graph is established between the data points and the sampling sites by sampling the input data.The introduction of the bipartite graph reduces the amount of computation in the subsequent optimization process,overcomes the shortcoming of high complexity of spectral clustering,reduces the time complexity.Experiments are carried out on datasets of different scales,and the results show that the proposed algorithm can achieve better clustering performance.(3)Nonnegative matrix factorization(NMF)is widely used in clustering tasks,it aims to find two nonnegative matrices whose product approximates the original matrix well.In large-scale data clustering tasks,this leads to reduced clustering performance due to the large size and complexity of the constructed graph matrix.In order to solve this problem,we propose a non-negative matrix factorization clustering model based on large graph embedding,which constructs a sparse adjacency large graph that captures the local manifold structure information of all data points and improves the clustering performance.In addition,the convergence of the proposed algorithm is proved,and experiments on different types of data sets are carried out to verify the effectiveness of the proposed algorithm.
Keywords/Search Tags:k-means, spectral clustering, graph regulation, bipartite graph, non-negative matrix factorization
PDF Full Text Request
Related items