The Chinese Text Categorization Research Based On Support Vector Machine And Clustering Algorithm

Posted on:2010-06-21

Degree:Master

Type:Thesis

Country:China

Candidate:H X Wu

Full Text:PDF

GTID:2178330332481973

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, more and more data appear in the form of text, and accessing, managing and using these text data quickly and efficiently has become an urgent need. Over the past decade, automatic text categorization has developed rapidly as a solution to these problems and has attracted wide attention. Automatic text categorization is the process by which a computer automatically assigns texts to categories. Text classification has several characteristic properties: text vectors are large and sparse, the feature space is high-dimensional, and the features are strongly correlated with one another. These properties make the support vector machine (SVM) well suited to text classification and give it great potential there. At the same time, applying SVM to text classification raises real challenges; for example, the number of training samples is very large and SVM classification is slow. Previous work has shown that SVM training depends only on the support vectors and is unaffected by the non-support vectors. We therefore reduce the number of training vectors with the K-means clustering algorithm in order to accelerate SVM training. We take full advantage of the fact that the categories of the training samples are known, and in the preprocessing stage of training we use a one-to-one, density-based clustering algorithm to remove non-support vectors. This greatly reduces the number of training samples that take part in the final training. In this way we obtain a classifier with the same accuracy as the traditional SVM, while its speed is greatly improved. To cluster the data effectively, this paper also improves the K-means clustering algorithm in three respects: how to select the cluster centers, how to standardize the data before clustering, and how to adjust the density radius around each cluster center.
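The reduction step can be illustrated with a minimal, standard-library-only Python sketch. It is not the thesis's algorithm: the abstract does not specify its center-selection rule, its density-radius rule or the details of its one-to-one clustering, so every such choice below is an assumption. Features are z-score standardized; each class is clustered separately with K-means, seeded with the point nearest the class mean and then repeatedly with the point farthest from the chosen centers; and a hypothetical radius equal to `radius_factor` times the mean member-to-center distance decides which points count as interior. Interior points are represented by their cluster center, while points outside the radius are kept as candidate support vectors. All function names (`standardize`, `init_centers`, `kmeans`, `reduce_training_set`) and parameters (`k_per_class`, `radius_factor`) are illustrative.

```python
import math
from collections import defaultdict


def standardize(X):
    """Z-score each feature column; constant columns get std 1 to avoid division by zero."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    Xs = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]
    return Xs, means, stds


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centers(points, k):
    """Hypothetical center selection: start from the point nearest the mean,
    then repeatedly add the point farthest from all centers chosen so far."""
    d = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(d)]
    centers = [min(points, key=lambda p: dist(p, mean))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]


def kmeans(points, k, max_iter=50):
    """Plain Lloyd iterations starting from the deterministic centers above."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def reduce_training_set(X, y, k_per_class=3, radius_factor=1.0):
    """Cluster each class separately (labels are known); replace points inside a
    cluster's radius by the cluster center, keep points outside it."""
    Xs, means, stds = standardize(X)
    by_class = defaultdict(list)
    for x, label in zip(Xs, y):
        by_class[label].append(x)
    X_red, y_red = [], []
    for label, points in by_class.items():
        centers, labels = kmeans(points, min(k_per_class, len(points)))
        for c, center in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == c]
            if not members:
                continue
            # Hypothetical "density radius": a multiple of the mean member distance.
            radius = radius_factor * sum(dist(p, center) for p in members) / len(members)
            X_red.append(center)
            y_red.append(label)
            for p in members:
                if dist(p, center) > radius:
                    X_red.append(p)  # far from its own center: candidate support vector
                    y_red.append(label)
    return X_red, y_red, (means, stds)
```

The reduced set is expressed in standardized coordinates, so a real pipeline would train an SVM on it (for instance scikit-learn's `SVC`) and apply the returned means and standard deviations to test documents before classifying them. A larger `radius_factor` discards more points and shortens training further, at the risk of also discarding true support vectors.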

Keywords/Search Tags:

SVM, Chinese text classification, Clustering, Reducing

Related items

1	Research And Realization Of Clustering Guided Web Chinese Text Classification Based On SVM
2	Chinese Text Classification Based On Active Learning
3	Design And Implementation Of Chinese WEB Documents Clustering And Classification System
4	Research On A Chinese Text Clustering Method
5	Chinese Text Classification Based On SVM Algorithm Realization
6	Study Of Chinese Text Classification
7	Research On Data Mining Technologies Applied To Web Chinese Text
8	Research And Improvement Of Automatic Classification Technology For Chinese Text
9	Research Of Text Clustering And Classification Method Based On Genetic Annealing Algorithms
10	Research On Several Models In Text Classification And Clustering