The Research And Implement Of Incremental Chinese Text Automatic Categorization

Posted on: 2005-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Gao
Full Text: PDF
GTID: 2120360125961652
Subject: Computational Mathematics
Abstract/Summary:
Automatic text categorization, the process of assigning one or more predefined category labels to free-text documents, supports more effective search strategies and more precise query results in information retrieval. With the rapid growth of information resources on the Internet, automatic text categorization has become increasingly important for searching information online.

This thesis systematically summarizes techniques for automatic Chinese text categorization. It introduces the Vector Space Model (VSM), which is used to represent text, along with feature acquisition methods. Categorization algorithms based on SVM and the Bayes method are investigated in depth. A new incremental learning method with SVM is proposed to speed up training, reduce storage requirements, and make full use of historical information. Incremental learning is an effective way to learn classification knowledge from massive data, especially when labeled training examples are costly to obtain, so an incremental Bayesian learning model is also presented. An experimental Chinese text categorization system is built to verify the validity of the categorization algorithms proposed above.

Document frequency, information gain, mutual information, and the CHI statistic are analyzed in detail and compared through experiments. A combined feature selection method is proposed. The experimental results show that the combined feature selection method can improve classification precision.
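As an illustration of one of the feature selection criteria named in the abstract, the CHI statistic for a term and a category can be computed from a 2x2 table of document counts. The sketch below uses the standard textbook form of the statistic. It is not the thesis's own implementation, and it does not show the combined feature selection method, whose details the abstract does not give. The function names `chi_square` and `chi_scores`, the toy documents, and the category labels are all hypothetical. The documents are assumed to be already segmented into words.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """CHI statistic for one (term, category) pair.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. an empty category): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_scores(docs, labels, category):
    """Score every term in the corpus against one category.

    docs: list of token lists (assumed pre-segmented)
    labels: list of category labels, one per document
    """
    n_docs = len(docs)
    n_in = sum(1 for y in labels if y == category)
    df_all = Counter()  # document frequency over the whole corpus
    df_in = Counter()   # document frequency inside the category
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df_all[term] += 1
            if y == category:
                df_in[term] += 1
    scores = {}
    for term, df in df_all.items():
        a = df_in[term]
        b = df - a
        c = n_in - a
        d = n_docs - n_in - b
        scores[term] = chi_square(a, b, c, d)
    return scores


# Hypothetical toy corpus: two finance documents, two sports documents.
docs = [["股票", "市场"], ["股票", "上涨"], ["足球", "比赛"], ["足球", "市场"]]
labels = ["finance", "finance", "sports", "sports"]
scores = chi_scores(docs, labels, "finance")
# Keep the highest-scoring terms as features.
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

A term that appears equally often inside and outside the category, such as "市场" in this toy corpus, scores zero. Terms concentrated in or absent from the category score high, so both kinds carry information for classification.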
Keywords/Search Tags: text categorization, Chinese text categorization, Bayes, SVM, incremental learning