Research And Implementation Of Chinese Text Clustering Algorithms

Posted on: 2011-04-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Q G Wu | Full Text: PDF
GTID: 2178330332488410 | Subject: Computer software and theory
Abstract/Summary:
Text clustering is an important technique and a vital branch of data mining. Because most current clustering algorithms are based on the vector space model, their performance and accuracy are limited, and Chinese text clustering algorithms have become one of the most important research topics in Chinese information processing.

This thesis first presents a study of text clustering based on the vector space model. The key techniques of FCM are analyzed, and the problems of building the text vector space, forming cluster descriptions, and automatically determining the number of clusters are solved. Moreover, based on semantic similarity calculations between documents, two clustering methods are put forward: the iterative semantic clustering (ISC) algorithm and the weighted subject concept graph (WSCG) clustering algorithm. Finally, a system named RCCluster is implemented in C++, providing text clustering methods that combine the vector space model and the semantic space model.

Experimental results show that the methods integrated in RCCluster are more comprehensive and achieve higher clustering accuracy and better clustering performance.
Keywords/Search Tags: Text Clustering, Word Similarity Computing, Vector Space Model, FCM
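
For readers unfamiliar with the FCM step mentioned in the abstract, the following is a minimal sketch, assuming FCM refers to the standard fuzzy c-means algorithm and that documents have already been converted to numeric vectors (for example, term weights in a vector space model). It is not the RCCluster implementation, which is written in C++ and is not shown in the source. The function name `fcm` and the parameters `m`, `iters`, `tol`, and `seed` are illustrative assumptions.

```python
import math
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (hypothetical helper, not from the thesis).

    points: list of equal-length numeric vectors (e.g. document vectors)
    c:      number of clusters
    m:      fuzzifier, must be > 1
    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ij proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                # Point coincides with a center: split membership among
                # the coinciding centers only.
                new = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                new = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(new)
            new = [v / total for v in new]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new

        # Stop once memberships have effectively stopped changing.
        if shift < tol:
            break

    return centers, u
```

A hard cluster label for each document can be read off as the index of its largest membership, for example `labels = [row.index(max(row)) for row in memberships]`. Note that this sketch takes the number of clusters `c` as an input; the abstract states that the thesis determines it automatically, by a method not described here.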