In today's internet era of information explosion, the volume of web text keeps growing. Manual classification is no longer a practical way for people to find the information they want quickly and efficiently in such large collections, whereas text clustering offers a solution through its ability to group texts automatically.

Studies have shown that traditional partitional algorithms such as K-means have two shortcomings. First, they do not recognize non-spherical clusters or clusters of varying sizes well. Second, they are sensitive to the preset value of K, the number of clusters.

To address the first shortcoming, this paper uses a kernel method, the One-class support vector machine (One-class SVM), which establishes a non-linear mapping between the input space and the feature space through a kernel function and computes the minimum-radius hypersphere that contains all the mapped data points. The surface of the hypersphere is the cluster boundary, and the data points on the boundary are called support vectors; together they represent a cluster. One-class SVM is essentially a binary classifier, whereas most practical problems are multi-class classification problems. This paper therefore addresses multi-class problems by constructing K One-class SVM classifiers, where K is the number of clusters.

To address the second shortcoming, this paper introduces a new evaluation function, KCDBW, which extends the CDBW evaluation function with the kernel method. KCDBW dynamically evaluates the clustering results, keeps the best result, and finally finds the optimal number of clusters.

Based on the above two points, this paper proposes a new dynamic partitional text clustering algorithm based on the kernel method: a multi-sphere text clustering algorithm based on One-class SVM.
Its main idea is as follows: first, reduce the dimensionality of the text feature space using Latent Semantic Analysis (LSA); second, use One-class SVM to train each cluster during the clustering process and obtain an SVM model for each cluster; third, use the clustering evaluation function KCDBW to guide the text clustering and ultimately obtain the optimal number of clusters.

To verify the effectiveness of the proposed dynamic partitional text clustering algorithm based on the kernel method, this paper conducts several experiments on a Chinese text clustering experiment platform. The experimental results show that the proposed clustering evaluation function KCDBW guides the clustering process well and that the proposed algorithm produces better clustering results than traditional clustering algorithms.
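The multi-sphere idea can be illustrated with a minimal sketch, which is not the implementation evaluated in this paper. It makes several simplifying assumptions: each sphere centre is taken as the unweighted mean of the mapped points in kernel feature space, where a One-class SVM would instead learn a weight per point; the kernel is Gaussian (RBF); K is fixed; and the LSA step and the KCDBW-guided search over K are omitted because they are not specified here. The function names `rbf_kernel`, `kernel_distance_sq` and `multi_sphere_cluster`, and the toy data, are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance_sq(x, members, gamma=1.0):
    """Squared feature-space distance from phi(x) to the mean of phi(members).

    Assumption: the sphere centre is the unweighted mean of the mapped
    points; a One-class SVM would learn per-point weights instead.
    """
    n = len(members)
    cross = sum(rbf_kernel(x, m, gamma) for m in members) / n
    within = sum(rbf_kernel(a, b, gamma) for a in members for b in members) / (n * n)
    return rbf_kernel(x, x, gamma) - 2.0 * cross + within


def multi_sphere_cluster(points, labels, k, gamma=1.0, max_iter=20):
    """Reassign every point to its nearest sphere centre until labels stabilise."""
    labels = list(labels)
    for _ in range(max_iter):
        # Group the points by their current cluster label.
        clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
        new_labels = []
        for p in points:
            # An empty cluster can never be the nearest one.
            dists = [
                kernel_distance_sq(p, members, gamma) if members else float("inf")
                for members in clusters
            ]
            new_labels.append(min(range(k), key=dists.__getitem__))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data: two well-separated groups with a deliberately poor starting partition.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
initial = [0, 1, 0, 1, 0, 1]
labels = multi_sphere_cluster(points, initial, k=2, gamma=0.5)
print(labels)
```

In this toy run the first three points end up in one cluster and the last three in the other. A dynamic variant would repeat the procedure for several values of K and keep the partition that scores best under a validity index such as KCDBW.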