Web Content Mining and Clustering Based on Intention
Posted on: 2011-04-14 | Degree: Master | Type: Thesis | Country: China | Candidate: Q Zhang | GTID: 2178360308961186 | Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:

Intention-based information retrieval aims to extract the intentions behind Internet users' activity and the intention tendencies of web pages. It is an active topic in intelligent information retrieval and has important prospects for information development. This thesis studies intention in information retrieval from two angles: acquiring web content and clustering web pages by intent. The main work of the thesis is as follows.

1. Metasearch engine foundation
The first piece of work is a metasearch engine that collects results from multiple search engines and stores the retrieved web information as structured documents. This lays the groundwork for deeper mining of the results.

2. REBVIPS, based on VIPS (Vision-based Page Segmentation)
REBVIPS builds on VIPS by adding a new module that uses regular expressions to connect HTML tags with the visual information of web pages. It performs web structure mining and removes noise from web pages by analyzing their HTML tags. The experiments show that REBVIPS performs well in web content mining.

3. Intention-based web clustering by similarity on the TR module
The other main piece of work is to summarize the levels of classification of web page intentions and the evaluation module. TR characters are used to carry out clustering analysis based on the intents of web pages. The details are as follows:
(1) The method uses k-means and k-center, respectively, as the main clustering algorithms of the clustering module. The experiments compare TR characters with common characters taken from the texts of web pages.
The results show that the clustering algorithms based on TR characters perform better than those based on common characters for intention mining on web pages.
(2) Meanwhile, the thesis compares the evaluation of clustering distance across the different clustering methods and analyzes the influence of the p-factor on the clustering results...

Keywords/Search Tags: intention-analyze, REBVIPS, web mining, k-means, k-center, TR-character extraction, VSM
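
The comparison described in item (2) can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: that the "p-factor" is the order p of a Minkowski distance, and that "k-center" is a medoid-style method whose centers are actual data points. All function names and the toy vectors are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def minkowski(a: Vector, b: Vector, p: float) -> float:
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def assign(points: List[Vector], centers: List[Vector], p: float) -> List[int]:
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: minkowski(pt, centers[c], p))
        for pt in points
    ]


def mean_center(members: List[Vector]) -> List[float]:
    """k-means style update: the coordinate-wise mean of the members."""
    n = len(members)
    return [sum(col) / n for col in zip(*members)]


def medoid_center(members: List[Vector], p: float) -> List[float]:
    """Assumed 'k-center' update: the member closest to all other members."""
    return list(min(members, key=lambda m: sum(minkowski(m, o, p) for o in members)))


def cluster(
    points: List[Vector],
    init_centers: List[Vector],
    p: float,
    use_medoid: bool = False,
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Alternate assignment and center updates until the labels stop changing."""
    centers = [list(c) for c in init_centers]
    labels = assign(points, centers, p)
    for _ in range(max_iter):
        new_centers = []
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if not members:
                new_centers.append(centers[c])  # keep the center of an empty cluster
            elif use_medoid:
                new_centers.append(medoid_center(members, p))
            else:
                new_centers.append(mean_center(members))
        new_labels = assign(points, new_centers, p)
        centers = new_centers
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Toy term-weight vectors standing in for page features (illustrative only).
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
for p in (1, 2, 3):
    km_labels, _ = cluster(docs, [docs[0], docs[2]], p)
    kc_labels, _ = cluster(docs, [docs[0], docs[2]], p, use_medoid=True)
    print(p, km_labels, kc_labels)
```

Running the loop over several values of p on the same feature vectors is one simple way to observe how the distance order affects the grouping produced by each method.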