
The Chinese Text Categorization Research Based On Support Vector Machine And Clustering Algorithm

Posted on: 2010-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: H X Wu
Full Text: PDF
GTID: 2178330332481973
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, more and more data appear in the form of text, and accessing, managing and using these text data quickly and efficiently has become an urgent need. Over the past decade, automatic text categorization has developed rapidly as a solution to these problems and has attracted wide attention. Automatic text categorization is the process by which a computer automatically assigns texts to categories. Text classification has several distinctive characteristics: text vectors are large and sparse, the feature space is high-dimensional, and the features are strongly correlated with one another. The Support Vector Machine (SVM) copes well with these properties, which makes it a well-suited and promising method for text classification. At the same time, applying SVM to text classification remains challenging. For example, the number of training samples is very large, and both training and classification with SVM are slow. Previous work has shown that a trained SVM depends only on the support vectors and is unaffected by the non-support vectors. We therefore use the K-means clustering algorithm to reduce the number of training vectors and so accelerate SVM training. Because the category of every training sample is known, we exploit this information and apply a density-based, one-to-one clustering algorithm in the preprocessing stage of training to remove non-support vectors. This greatly reduces the number of vectors that take part in the final training. In this way we obtain a classifier whose accuracy matches that of the traditional SVM while running much faster. To cluster the data effectively, this thesis also improves the K-means clustering algorithm in several respects, including how the cluster centers are selected, how the clustering data are standardized, and how the density radius of each cluster center is adjusted.
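The reduction idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual algorithm, whose details the abstract does not give. It assumes that samples lying close to a cluster center of their own class are unlikely to become support vectors, and it replaces the thesis's adjustable density radius with a simpler, hypothetical `keep_ratio` parameter. All function names are illustrative, only the Python standard library is used, and the SVM training step itself is not shown.

```python
import math
import random


def standardize(points):
    """Z-score each feature so no single dimension dominates distances."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((p[d] - means[d]) ** 2 for p in points) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return [[(p[d] - means[d]) / stds[d] for d in range(dims)] for p in points]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means with random initial centers (not the improved
    center selection the thesis proposes). Returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def reduce_class(points, k, keep_ratio):
    """Cluster one class and keep only the outer share of each cluster.

    Assumption: points near a cluster center are interior points and are
    unlikely to end up as support vectors, so they are dropped.
    """
    centers, assign = kmeans(points, k)
    kept = []
    for c in range(k):
        members = [p for i, p in enumerate(points) if assign[i] == c]
        members.sort(key=lambda p: math.dist(p, centers[c]), reverse=True)
        kept.extend(members[:math.ceil(len(members) * keep_ratio)])
    return kept


def reduce_training_set(samples, labels, k=2, keep_ratio=0.5):
    """Use the known labels to cluster each class separately, then return
    the reduced list of (standardized_sample, label) pairs."""
    data = standardize(samples)
    by_class = {}
    for x, y in zip(data, labels):
        by_class.setdefault(y, []).append(x)
    reduced = []
    for y, pts in by_class.items():
        for x in reduce_class(pts, min(k, len(pts)), keep_ratio):
            reduced.append((x, y))
    return reduced
```

The reduced set returned by `reduce_training_set` would then be handed to an SVM trainer in place of the full training set. Whether accuracy is preserved depends on how well the dropped interior points really correspond to non-support vectors, which is what the thesis's density-radius adjustment is meant to control.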
Keywords/Search Tags: SVM, Chinese text classification, Clustering, Reducing