
Research On Spectral Clustering Algorithm In Data Mining

Posted on: 2011-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Sun
Full Text: PDF
GTID: 2178330332463510
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an active research topic in data mining and machine learning, and an effective means of understanding and exploring the intrinsic relationships among things. Spectral clustering is a relatively new family of clustering algorithms with several advantages over traditional ones: it is simple and easy to implement, is less prone to falling into local optima, and can recognize non-convex clusters, so it can cluster sample spaces of arbitrary shape. These properties make it well suited to many practical applications.
A traditional spectral clustering algorithm first defines a similarity measure between pairs of data points, constructs a similarity matrix W from that measure, computes the Laplacian matrix L, then computes the eigenvalues and eigenvectors of L, and finally selects one or more eigenvectors on which to perform the clustering. When the similarity matrix W is built, a Gaussian kernel is used as the similarity function and its scale parameter is set manually, which limits the algorithm.
Designing a new spectral clustering algorithm that does not require the scale parameter to be entered manually therefore has both theoretical and practical significance: it helps researchers study spectral clustering more deeply within data mining, and it helps engineers apply spectral clustering to real-world problems.
This thesis gives a detailed analysis of the theory and methods of spectral clustering, then examines why spectral clustering works and where it is superior to traditional algorithms. It then points out the current problems of spectral clustering, and finally introduces the theory and construction of the NJW spectral clustering algorithm.
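The pipeline described above (Gaussian similarity matrix W, matrix L, eigenvectors, final clustering) can be illustrated with a minimal Python sketch of the NJW procedure as it is commonly stated. This is not the thesis's MATLAB code: the function names, the toy two-ring data, and the hand-picked value sigma=0.5 are assumptions made for illustration, and the small k-means helper uses a deterministic farthest-point start only to keep the example reproducible. Following the usual NJW formulation, L here is the normalized affinity D^{-1/2} W D^{-1/2}, whose k largest eigenvectors are used.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Similarity matrix W with a Gaussian kernel; sigma is the scale parameter."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # NJW sets self-similarity to zero
    return W

def njw_embedding(W, k):
    """Rows of the k leading eigenvectors of D^{-1/2} W D^{-1/2}, scaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)   # eigenvalues are returned in ascending order
    V = vecs[:, -k:]              # keep the k largest
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def kmeans(Y, k, n_iter=100):
    """Plain k-means with a deterministic farthest-point initialization (illustrative)."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def njw_cluster(X, k, sigma):
    """Cluster the rows of X into k groups; sigma must be supplied by hand."""
    return kmeans(njw_embedding(gaussian_affinity(X, sigma), k), k)

def ring(radius, n):
    """n evenly spaced points on a circle (toy non-convex data)."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t)])

# Two concentric rings: a non-convex layout that center-based methods split poorly.
X = np.vstack([ring(1.0, 40), ring(4.0, 80)])
labels = njw_cluster(X, k=2, sigma=0.5)
```

The `sigma` argument is exactly the manually chosen scale parameter the abstract identifies as a limitation: a value that suits one data set may merge or fragment clusters on another.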
The thesis makes two main contributions.
First, building on a detailed study of the classical NJW spectral clustering algorithm, it addresses the algorithm's need for a manually entered scale parameter. The research goal is a new method that automatically optimizes the value of the scale parameter. The method is implemented in MATLAB 7.0. Clustering experiments on UCI standard data sets compare k-means, NJW, and the EBSC algorithm, and the results show that EBSC is superior to k-means and NJW in clustering accuracy.
Second, the thesis discusses the application of spectral clustering to dividing tobacco by quality in the tobacco industry. Many data mining techniques and intelligent computing methods have already been applied to problems in this field, with some success. Traditional clustering methods are currently used, such as center-based algorithms (for example, the classic k-means algorithm). These are effective on data sets with a compact, hyperspherical distribution but are not suited to clusters of arbitrary shape. They also search for the optimum by iterative optimization, which easily falls into a local optimum, so they cannot guarantee convergence to the globally optimal solution. Spectral clustering can find clusters of arbitrary shape and can converge to a globally optimal solution, so it offers a new approach to dividing tobacco by quality.
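The abstract compares k-means, NJW, and EBSC by clustering accuracy but does not say how that accuracy is computed. One common convention, sketched below in Python as an assumption rather than as the thesis's procedure, is to match each predicted cluster to a distinct true class so that the number of agreeing points is maximal, then report the matched fraction. The function name `clustering_accuracy` is hypothetical, and the exhaustive search over assignments is practical only for a small number of clusters.

```python
from itertools import permutations

import numpy as np

def clustering_accuracy(true_labels, pred_labels):
    """Fraction of points matched under the best one-to-one cluster-to-class mapping."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids = np.unique(true_labels)
    pred_ids = np.unique(pred_labels)
    if len(pred_ids) > len(true_ids):
        raise ValueError("this sketch assumes no more clusters than classes")
    # contingency[i, j] = number of points in predicted cluster i with true class j
    contingency = np.array([
        [np.sum((pred_labels == p) & (true_labels == t)) for t in true_ids]
        for p in pred_ids
    ])
    best = 0
    # Try every assignment of predicted clusters to distinct true classes.
    for perm in permutations(range(len(true_ids)), len(pred_ids)):
        matched = sum(contingency[i, j] for i, j in enumerate(perm))
        best = max(best, matched)
    return best / len(true_labels)

# Cluster ids are arbitrary, so a relabelled but otherwise perfect result scores 1.0.
perfect = clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0])
# One of six points lands in the wrong cluster here.
partial = clustering_accuracy([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0])
```

Because cluster labels carry no inherent meaning, comparing them to class labels directly would understate accuracy; the matching step removes that dependence on label naming.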
In this thesis, I use the EBSC algorithm to divide tobacco leaves by quality. The resulting leaf clusters can guide tobacco purchasing. In addition, when a cigarette blend calls for a grade of raw material that is scarce or unavailable, similar leaves can be found in the same cluster on the basis of quality similarity, so the EBSC algorithm also helps with tobacco substitution. The results show that EBSC spectral clustering is feasible for dividing tobacco by quality.
Finally, the thesis outlines future work and offers views on the development of spectral clustering.
Keywords/Search Tags: Data Mining, Spectral Clustering, Information Entropy, EBSC