Study On Key Techniques Of Web Mining For Intelligent Information Retrieval

Posted on:2007-11-13

Degree:Doctor

Type:Dissertation

Country:China

Candidate:F Yuan

Full Text:PDF

GTID:1118360185477713

Subject:Computer software and theory

Abstract/Summary:

Since the WWW came into the world in 1991, it has developed quickly and is becoming an important information source for human society. As Internet technology continues to develop and mature, the WWW will serve as an important medium through which people obtain information. In the past, it was easy for people to find useful information, but with the huge growth in the amount of information on the Internet, people find it more and more difficult to locate what they need. The reason is that traditional information retrieval technology no longer copes well with such massive amounts of information. A more intelligent information retrieval technology for large-scale retrieval on the Internet is therefore urgently needed.

This dissertation studies some key techniques of Web mining for intelligent information retrieval. It focuses mainly on data preprocessing, classification and clustering of Web pages or Web users, conceptual retrieval, and personalized services. We propose or improve several Web mining algorithms for intelligent information retrieval, and we also develop a prototype intelligent information retrieval system.

Data preprocessing includes information extraction from PDF documents, Chinese word segmentation, and Web log preprocessing. For information extraction from PDF documents, we propose a rule extraction algorithm based on format infusion and an information extraction algorithm based on a tree model. For Chinese word segmentation, we propose a method based on a gradually enriched dictionary. Compared with using dictionary matching alone or a statistical method alone, this new method obtains much better results. For Web log preprocessing, the dissertation mainly discusses path completion and gives a new algorithm for it.

In the research on Web page classification, this dissertation discusses various text classification methods and concentrates on the k-nearest neighbor (k-NN) method, which achieves comparatively high accuracy in text classification. To improve the efficiency of k-NN, we propose a training sample reduction method based on class density, together with a gradual classification pattern. The reduction method computes the density of each class in the training set and the average density of the whole training set, and then deletes some samples from the high-density classes. The gradual classification pattern, which imitates how a person classifies a document, reduces the proportion of documents that must be analyzed in full.
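
The density-based sample reduction for k-NN described above can be illustrated with a minimal Python sketch. The abstract does not define "density", so everything concrete here is an assumption made for illustration and is not taken from the dissertation: class density is taken to be the mean pairwise cosine similarity within a class, the average density is taken to be the mean of the per-class densities, and the function names (class_density, reduce_training_set, knn_classify) and the keep_ratio parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_density(vectors):
    """ASSUMED density measure: mean pairwise cosine similarity in a class."""
    n = len(vectors)
    if n < 2:
        return 0.0
    total = sum(cosine(vectors[i], vectors[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def reduce_training_set(samples, keep_ratio=0.5):
    """Thin out classes whose density exceeds the average class density.

    samples is a list of (vector, label) pairs. keep_ratio is a hypothetical
    parameter: the fraction of samples kept in each high-density class.
    """
    if not samples:
        return []
    by_class = defaultdict(list)
    for vector, label in samples:
        by_class[label].append(vector)

    densities = {label: class_density(vs) for label, vs in by_class.items()}
    # ASSUMPTION: "average density" is the mean of the per-class densities.
    average_density = sum(densities.values()) / len(densities)

    reduced = []
    for label, members in by_class.items():
        kept = members
        if densities[label] > average_density and len(members) > 2:
            def redundancy(i):
                # Mean similarity of sample i to the rest of its class.
                others = [cosine(members[i], members[j])
                          for j in range(len(members)) if j != i]
                return sum(others) / len(others)

            keep = max(1, math.ceil(len(members) * keep_ratio))
            # Keep the least redundant samples, in their original order.
            indices = sorted(sorted(range(len(members)), key=redundancy)[:keep])
            kept = [members[i] for i in indices]
        reduced.extend((vector, label) for vector in kept)
    return reduced


def knn_classify(query, samples, k=3):
    """Plain k-NN: majority vote among the k most similar training samples."""
    neighbours = sorted(samples, key=lambda s: cosine(query, s[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data: a tightly packed class and a more spread-out class.
training = [
    ([1.0, 0.0, 0.0], "sports"), ([0.9, 0.1, 0.0], "sports"),
    ([1.0, 0.1, 0.0], "sports"), ([0.95, 0.05, 0.0], "sports"),
    ([0.0, 1.0, 0.0], "tech"), ([0.0, 0.0, 1.0], "tech"),
    ([0.0, 1.0, 1.0], "tech"),
]
smaller = reduce_training_set(training, keep_ratio=0.5)
print(len(training), "->", len(smaller))
print(knn_classify([1.0, 0.05, 0.0], smaller, k=3))
```

Because k-NN compares each query against every stored sample, dropping near-duplicate samples from dense classes cuts classification time roughly in proportion to the number of samples removed. Whether accuracy is preserved depends on the density measure and the reduction rate, which the abstract does not specify.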

Keywords/Search Tags:

Intelligent Information Retrieval, Data Mining, Web Mining, Personalized Services, Data Preprocessing, Information Extraction, Clustering Analysis, Classification Rule, Web User, Web Page, Ontology, Conceptual Retrieval

Related items

1	Study Of Fields-oriented High Quality Information Retrieval Based On Web Data Mining
2	Data Mining Research In Web Information Retrieval And Classification
3	A Study On The Application Of The Techniques Of Data Mining In Personalized Information Retrieval System
4	Research On Personalized Technique Of Data Mining In Information Retrieval
5	Study On Key Techniques Of Web Mining For Intelligent Information Retrieval
6	Research On The Key Techniques Of Web Information Intelligent Acquisition
7	The Research Of Web Personalized Information Recommendation Based On Data Mining
8	Based On Intelligent Agent Web Personalized Information Retrieval System
9	Related Studies On Information Extraction And Information Recommendation Based On Web Data Mining
10	Research On Chinese Personalized Retrieval System Based On User Model