Research on Chinese Web Text Clustering Technology

Posted on: 2008-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Mao
GTID: 2178360242478488
Subject: Systems Engineering
Abstract/Summary:
Text clustering is an important text mining technique. It is a form of unsupervised learning that a computer can carry out automatically, without manual intervention. By comparing the similarities and differences between texts, it discovers their characteristic features and regularities, which helps users understand a body of text data.

In this thesis, text clustering is applied to Chinese Web text. The whole Chinese Web text clustering process is studied, including text preprocessing, text clustering, and performance evaluation.

For text preprocessing, a feature selection method based on a genetic algorithm (GA) is proposed. It reduces the dimensionality of the text feature vectors used for clustering, which lowers the computational complexity of the clustering step while preserving clustering quality.

For text clustering, an improved K-means method with isolated point (outlier) detection is proposed. Isolated points are detected and removed from the data set before clustering, so that they cannot distort the clusters. After clustering, the isolated points are added back into the clusters so that the clustering result is complete.

Finally, a Chinese Web text clustering model is built, and each of its modules is described.
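The abstract does not specify the chromosome encoding or the fitness function used in the GA-based feature selection. The Python sketch below assumes a common setup: each chromosome is a bit mask over the vocabulary (1 = keep the term), and a hypothetical fitness rewards masks that keep the pairwise cosine similarities between documents close to their full-vocabulary values while penalizing the fraction of terms kept (weight `alpha`). Tournament selection, one-point crossover, bit-flip mutation and elitism are standard GA operators chosen for illustration; all function names and parameters are hypothetical, not the thesis's own.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Doc = Sequence[float]  # one document as a term-weight vector, e.g. TF-IDF


def cosine(a: Doc, b: Doc, mask: Sequence[int]) -> float:
    """Cosine similarity of a and b using only the terms whose mask bit is 1."""
    dot = sum(x * y for x, y, m in zip(a, b, mask) if m)
    na = math.sqrt(sum(x * x for x, m in zip(a, mask) if m))
    nb = math.sqrt(sum(y * y for y, m in zip(b, mask) if m))
    return dot / (na * nb) if na and nb else 0.0


def pairwise(docs: List[Doc], mask: Sequence[int]) -> List[float]:
    """Similarities of all document pairs (i < j) under the given term mask."""
    return [cosine(docs[i], docs[j], mask)
            for i in range(len(docs)) for j in range(i + 1, len(docs))]


def fitness(mask: Sequence[int], docs: List[Doc],
            full_sims: List[float], alpha: float) -> float:
    """Hypothetical fitness: preserve full-vocabulary similarities, penalize kept terms."""
    reduced = pairwise(docs, mask)
    err = sum(abs(f - r) for f, r in zip(full_sims, reduced)) / max(len(full_sims), 1)
    return -err - alpha * sum(mask) / len(mask)


def ga_select(docs: List[Doc], pop_size: int = 30, generations: int = 40,
              p_cross: float = 0.8, p_mut: float = 0.02,
              alpha: float = 0.3, seed: int = 0) -> List[int]:
    """Return the indices of the terms kept by the best chromosome found."""
    rng = random.Random(seed)
    n_terms = len(docs[0])
    full_sims = pairwise(docs, [1] * n_terms)
    cache: Dict[Tuple[int, ...], float] = {}

    def score(mask: List[int]) -> float:
        key = tuple(mask)
        if key not in cache:  # fitness is deterministic, so memoize it
            cache[key] = fitness(mask, docs, full_sims, alpha)
        return cache[key]

    def tournament(pop: List[List[int]]) -> List[int]:
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if score(a) >= score(b) else b

    # Seed with the full vocabulary so the result never scores below "no selection".
    pop = [[1] * n_terms] + [[rng.randint(0, 1) for _ in range(n_terms)]
                             for _ in range(pop_size - 1)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if n_terms > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n_terms - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        cand = max(pop, key=score)
        if score(cand) > score(best):
            best = cand
    return [i for i, g in enumerate(best) if g]
```

Calling `ga_select(docs)` on a list of equal-length term-weight vectors returns the indices of the retained terms; the reduced vectors would then be passed to the clustering step. Because the initial population contains the all-ones mask and elitism keeps the best mask, the selected mask can never score worse than the full vocabulary under this (assumed) fitness.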
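The abstract gives the order of operations for the improved K-means (detect isolated points, cluster the remaining data, then rejoin the isolated points) but not the detection rule or the distance measure. This Python sketch assumes a simple density rule (a point is isolated if fewer than `min_neighbors` other points lie within distance `eps`), Euclidean distance, deterministic farthest-first seeding, and rejoining each isolated point to its nearest final centroid. These choices, and all names and parameters, are hypothetical illustrations rather than the thesis's exact method; for text, cosine distance on normalized TF-IDF vectors is a common alternative to Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumption; text vectors are often compared by cosine)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))


def find_isolated(points: List[Vector], eps: float, min_neighbors: int) -> List[bool]:
    """Hypothetical rule: isolated if fewer than min_neighbors other points lie within eps."""
    return [sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= eps) < min_neighbors
            for i, p in enumerate(points)]


def kmeans(points: List[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain K-means with deterministic farthest-first seeding."""
    if not 0 < k <= len(points):
        raise ValueError("need 1 <= k <= number of points")
    centroids = [list(points[0])]
    while len(centroids) < k:  # add the point farthest from all chosen centroids
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def kmeans_with_isolated_points(points: List[Vector], k: int, eps: float,
                                min_neighbors: int) -> Tuple[List[int], List[bool]]:
    """Detect isolated points, cluster the rest, then rejoin isolated points."""
    isolated = find_isolated(points, eps, min_neighbors)
    core = [p for p, iso in zip(points, isolated) if not iso]
    _, centroids = kmeans(core, k)  # isolated points cannot pull the centroids here
    labels = [nearest(p, centroids) for p in points]  # every point, isolated or not, gets a cluster
    return labels, isolated
```

With the points (0,0), (0,1), (1,0), (1,1), (10,10), (10,11), (11,10), (11,11) and (30,30), plain `kmeans(pts, 2)` spends one centroid on the distant point (30,30) and merges the two groups, while `kmeans_with_isolated_points(pts, 2, eps=2.0, min_neighbors=2)` flags (30,30) as isolated, separates the two groups, and then assigns (30,30) to the nearer group.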
Keywords/Search Tags: Text Mining, Clustering, Feature Selection, GA, Isolated Point