The Research On An Improved Support Vector Clustering Algorithm And Its Application

Posted on: 2011-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: F Li
Full Text: PDF
GTID: 2189360305956061
Subject: Knowledge management
Abstract/Summary:
The 21st century is the era of the knowledge economy, and knowledge discovery has attracted increasing attention. Cluster analysis, one of the most important branches of knowledge discovery, is applicable in almost every domain. Kernel clustering algorithms, which have several advantages over classical clustering algorithms, have become one of the most active research topics in clustering. The support vector clustering (SVC) algorithm studied in this thesis is a typical kernel clustering algorithm.

The SVC algorithm has two main advantages over other clustering algorithms: it can generate cluster boundaries of arbitrary shape, and it can deal with outliers. The algorithm consists of two phases: SVC training and cluster assignment. The former requires calculating the Lagrange multipliers and the latter requires calculating the adjacency matrix, and both carry a high computational burden. To overcome these two difficulties, we present an improved SVC (ISVC) algorithm.

In the SVC training phase, we propose an entropy-based minimal enclosing sphere (MES) algorithm, which markedly reduces the time needed to calculate the Lagrange multipliers. In the cluster assignment phase, we first use the kernel matrix to group the data points into preliminary classes and calculate the center of each class. We then calculate the adjacency matrix on the set of center points rather than on the whole data set. This reduces the size of the adjacency matrix and, consequently, the time needed to compute it. As a result, the ISVC algorithm overcomes the two difficulties of the original SVC algorithm. Numerical experiments show that the ISVC algorithm outperforms the original SVC algorithm in both time complexity and clustering precision.

We then apply the ISVC algorithm to text clustering, using one hundred texts chosen from the small-scale corpus of Fudan University as the experimental set.
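The two phases of the original SVC algorithm described above can be illustrated with a minimal sketch. It is not the thesis's ISVC method: it assumes a Gaussian kernel with a hypothetical width parameter `q`, ignores outliers (the hard-margin case), and approximates the Lagrange multipliers with simple Frank-Wolfe steps instead of the entropy-based MES algorithm. The function names and the toy data are illustrative only. Two points are treated as adjacent when every sampled point on the segment between them stays inside the enclosing sphere, and clusters are the connected components of the resulting adjacency matrix.

```python
import math


def kernel(x, y, q):
    """Gaussian kernel exp(-q * ||x - y||^2); q is an assumed width parameter."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_sphere(points, q, iters=200):
    """Training phase: approximate the multipliers beta (beta >= 0, sum = 1)
    that minimize beta^T K beta, using Frank-Wolfe steps (an assumption here,
    not the entropy-based method of the thesis)."""
    n = len(points)
    K = [[kernel(p, r, q) for r in points] for p in points]
    beta = [1.0 / n] * n
    for t in range(iters):
        # Gradient of beta^T K beta, up to a constant factor of 2.
        grad = [sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        s = min(range(n), key=grad.__getitem__)
        gamma = 2.0 / (t + 2)
        beta = [(1.0 - gamma) * b for b in beta]
        beta[s] += gamma
    const = sum(beta[i] * beta[j] * K[i][j] for i in range(n) for j in range(n))
    return beta, const


def dist_sq(y, points, beta, const, q):
    """Squared feature-space distance from the image of y to the sphere center."""
    cross = sum(b * kernel(p, y, q) for p, b in zip(points, beta))
    return 1.0 - 2.0 * cross + const


def svc_labels(points, q, samples=10):
    """Cluster assignment phase: adjacency matrix plus connected components."""
    beta, const = fit_sphere(points, q)
    n = len(points)
    # Without outliers, the sphere radius is the largest training distance.
    r2 = max(dist_sq(p, points, beta, const, q) for p in points)

    def connected(a, b):
        # Sample interior points of the segment a-b and test each one.
        for k in range(1, samples + 1):
            t = k / (samples + 1)
            y = [u + t * (v - u) for u, v in zip(a, b)]
            if dist_sq(y, points, beta, const, q) > r2 + 1e-9:
                return False
        return True

    adj = [[i == j or connected(points[i], points[j]) for j in range(n)]
           for i in range(n)]

    # Label connected components of the adjacency graph with a depth-first search.
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if adj[i][j] and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data: two tight, well-separated groups of four points each.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2)]
labels = svc_labels(points, q=1.0)
```

The adjacency step tests every pair of points at several sample positions, so its cost grows quadratically with the number of points. This is the burden that computing the adjacency matrix only on the center points is meant to reduce.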
First, we preprocess the experimental text set: we segment the texts, represent the text set as a matrix using the Vector Space Model (VSM), and apply principal component analysis (PCA) to reduce the dimension of the text data. We then use the ISVC algorithm to cluster the text data set and describe the clustering results. Finally, we compare the ISVC algorithm with k-means and with the agglomerative hierarchical clustering algorithm (DHCA) on the same text data set. Experimental results show that the ISVC algorithm achieves higher precision. In short, the ISVC algorithm improves the efficiency of text clustering to some extent.
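The preprocessing steps above (VSM representation followed by PCA) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's pipeline: the documents are assumed to be already segmented into tokens, raw term counts stand in for whatever term weighting the thesis uses, and only the first principal component is extracted, by power iteration. The names `build_vsm` and `first_principal_component` and the toy documents are hypothetical.

```python
import math


def build_vsm(docs):
    """Represent tokenized documents as a document-term count matrix."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in doc:
            matrix[i][index[tok]] += 1.0
    return vocab, matrix


def first_principal_component(matrix, iters=100):
    """Return the leading principal direction and each document's score on it.

    Uses power iteration on X^T X, where X is the column-centered matrix.
    The start vector is an arbitrary choice and must not be orthogonal to
    the leading direction.
    """
    n, d = len(matrix), len(matrix[0])
    mean = [sum(row[j] for row in matrix) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in matrix]

    def project(v):
        return [sum(r[j] * v[j] for j in range(d)) for r in centered]

    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        scores = project(v)
        # Multiply by X^T to complete one application of X^T X.
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, project(v)


# Toy corpus: two finance documents and two sports documents, already tokenized.
docs = [
    ["stock", "market", "stock", "bank"],
    ["bank", "market", "stock"],
    ["football", "match", "goal"],
    ["goal", "football", "football", "match"],
]
vocab, matrix = build_vsm(docs)
component, scores = first_principal_component(matrix)
```

In this toy corpus the two topics fall on opposite sides of zero along the first component, which is the kind of low-dimensional structure a clustering algorithm can then exploit. A real text set would keep several components rather than one.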
Keywords/Search Tags: Support Vector Clustering, Minimal Enclosing Sphere, Adjacency Matrix, Text Clustering, Principal Component Analysis