
A Semantic Enhancement Of Text Clustering Algorithm

Posted on: 2012-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: X C Song
GTID: 2178330332975996
Subject: Computer application technology
Abstract/Summary:
Processing ambiguous queries, and organizing their results so that users can quickly narrow the scope of a query and clarify their intent, is a major challenge for modern information retrieval systems. Unfortunately, most search engines lack an explicit mechanism for handling ambiguous queries, so users must wade through a large number of returned pages to find the ones they actually want. Clustering query results and letting users interact with the clusters through a well-designed UI is an efficient way to resolve ambiguous queries. In this thesis, I propose a semantic enhancement to clustering algorithms that makes full use of Web 2.0 as an external knowledge source, introducing semantic information into both cluster generation and cluster representation. During text clustering, I use the Wikipedia semantic taxonomy to enrich the representation of document content. During cluster visualization, I use the Delicious folksonomy to semantically reinforce the original label extraction strategy. The system architecture consists of five modules: VSM construction, Wikipedia information extraction, enhanced clustering, folksonomy label selection, and cluster label confirmation. The system is built on a Hadoop cluster. In the evaluation, I compare the semantically enhanced K-means with traditional K-means, fuzzy K-means, and LDA; analyze the advantages of my Wikipedia abstraction strategy over existing strategies; and demonstrate that folksonomy-based cluster label selection outperforms centroid-based, frequency-based, and MI (mutual information) label selection. The experimental results indicate that cluster quality improves and that the selected cluster labels are more accurate and descriptive. Moreover, the semantic enhancement strategy can be extended to other clustering algorithms and to areas of data mining beyond text clustering.
Keywords/Search Tags: Information Retrieval, Cluster, Semantic, Web 2.0, Wikipedia, Tag Folksonomy
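The abstract does not spell out how Wikipedia knowledge enters the vector space model. One common way to do this kind of enrichment is to append concept features to each document's TF-IDF vector before clustering. The following is a minimal single-machine sketch under stated assumptions, not the thesis's own method. The term-to-category map `WIKI_CONCEPTS`, the toy documents, the weight `alpha`, and all helper names are hypothetical. The clustering step is spherical (cosine) K-means with deterministic seeding rather than the Hadoop implementation the thesis describes, and folksonomy label selection is not covered.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for the Wikipedia taxonomy: term -> categories.
# The thesis extracts this knowledge from Wikipedia; these entries are invented.
WIKI_CONCEPTS = {
    "leopard": ["Big_cats"],
    "cheetah": ["Big_cats"],
    "sedan": ["Automobiles"],
    "coupe": ["Automobiles"],
}

# Toy results for the ambiguous query "jaguar" (illustrative data only).
DOCS = [
    "jaguar leopard habitat",
    "jaguar sedan speed",
    "jaguar cheetah speed",
    "jaguar coupe engine",
]


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector space model: one {term: weight} dict per doc."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def enrich(vec, concepts, alpha=1.0):
    """Append 'wiki:<category>' features weighted by alpha * source term weight."""
    out = dict(vec)
    for term, weight in vec.items():
        for cat in concepts.get(term, []):
            key = "wiki:" + cat
            out[key] = out.get(key, 0.0) + alpha * weight
    return out


def normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    # Both inputs are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, iters=10):
    """Cosine K-means, deterministically seeded with the first k vectors."""
    vectors = [normalize(v) for v in vectors]
    centroids = vectors[:k]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Update step: a centroid is the normalized sum of its members.
        for c in range(k):
            summed = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    summed.update(v)
            if summed:
                centroids[c] = normalize(dict(summed))
    return labels


if __name__ == "__main__":
    plain = spherical_kmeans(tfidf_vectors(DOCS), k=2)
    enriched = spherical_kmeans(
        [enrich(v, WIKI_CONCEPTS) for v in tfidf_vectors(DOCS)], k=2)
    print("plain:   ", plain)     # [0, 1, 1, 0]
    print("enriched:", enriched)  # [0, 1, 0, 1]
```

On these four toy pages, plain TF-IDF groups the cheetah page with a car page because the only word they share is "speed", while the enriched vectors put the two animal pages together through their shared `wiki:Big_cats` feature. The parameter `alpha` controls how strongly external knowledge outweighs surface vocabulary; a good value would have to be found empirically.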