Agent-based Web Information Extraction

Posted on:2005-09-26

Degree:Master

Type:Thesis

Country:China

Candidate:H Di

Full Text:PDF

GTID:2208360122497456

Subject:Computer application technology

Abstract/Summary:


With the rapid development of the World Wide Web, Web information extraction (WebIE) has become a focus of research in both academia and industry over the past ten years. The goal of WebIE is to locate and identify the information of interest on heterogeneous Web sites and to organize the extracted information in a homogeneous, structured format. The major difficulties of WebIE are complexity, adaptability, and scalability, which stem from inherent features of Web sites: their huge number, varied formats, and frequent updates.

This paper presents an Agent-based WebIE system that is a typical multi-Agent system (MAS). The system is mainly composed of three Agents and four kinds of knowledge bases. The knowledge bases are the foundation of the Agents' activities. XML is used to describe both the knowledge and the communication between Agents. Each Agent has its own objective, so it can act autonomously while still coordinating and cooperating with the other Agents and the user. This infrastructure simplifies the originally complicated WebIE problem.

The Information Extraction Agent is the core of the three Agents. It is responsible for learning extraction rules and for extracting information from a Web page using the relevant rules. It uses wrapper induction and DOM tree methods, both of which have been widely used in previous research.

Because the definition of the extraction rules combines the domain-specific semantic features of the information of interest with the format features of the Web page, the proposed system achieves better reusability and adaptability. Moreover, the Agent's ability to perceive how a Web site has been updated and to adapt the rules on its own initiative also enhances adaptability to some extent. In addition, the semi-automatic learning method, in which the user participates, simplifies the learning process.
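As a rough illustration of the idea of an XML-described extraction rule that pairs a page-format feature with a domain-specific semantic feature, the following Python sketch applies such a rule to a DOM tree. It is not the system described in the abstract: the rule schema (`rule`, `field`, `path`, `keyword`), the sample page, and the function names `load_rule` and `extract` are all assumptions made for this example, and the page is assumed to be well-formed XHTML so that the standard library parser can build the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule format (an assumption, not taken from the thesis).
# "path" is the format feature: where the value sits in the DOM tree.
# "keyword" is the semantic feature: a domain label the text must start with.
RULE_XML = """
<rule domain="book">
  <field name="title" path=".//div[@class='item']/h2" keyword=""/>
  <field name="price" path=".//div[@class='item']/span" keyword="Price:"/>
</rule>
"""

# Hypothetical sample page, assumed to be well-formed XHTML.
PAGE_XHTML = """
<html><body>
  <div class="item"><h2>Learning XML</h2><span>Price: 29.90</span><span>Stock: 4</span></div>
  <div class="item"><h2>Agents in Practice</h2><span>Price: 41.00</span><span>Stock: 0</span></div>
</body></html>
"""


def load_rule(rule_xml):
    """Parse the XML rule into a list of (name, path, keyword) tuples."""
    root = ET.fromstring(rule_xml)
    return [
        (field.get("name"), field.get("path"), field.get("keyword", ""))
        for field in root.findall("field")
    ]


def extract(page_xhtml, fields):
    """Apply each field rule to the page's DOM tree and collect the values."""
    page = ET.fromstring(page_xhtml)
    result = {}
    for name, path, keyword in fields:
        values = []
        for node in page.findall(path):  # format feature: position in the tree
            text = (node.text or "").strip()
            if keyword:
                # Semantic feature: keep only nodes carrying the domain label.
                if not text.startswith(keyword):
                    continue
                text = text[len(keyword):].strip()
            values.append(text)
        result[name] = values
    return result


records = extract(PAGE_XHTML, load_rule(RULE_XML))
print(records)
```

In this sketch the `keyword` check is what separates the price `span` from the stock `span`, which share the same path. That is one simple way a semantic feature can keep a rule usable when the page layout alone is ambiguous.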

Keywords/Search Tags:

Web information extraction (WebIE), Agent, XML, learn, extraction rules


Related items

1	Research On Web Information Extraction For Domain In Information Integration System
2	Research On The Application Of Extract Technology Of Web Teaching Resource
3	Building intelligent agents that learn to retrieve and extract information
4	Research On Language And Key Techniques For Accurate Information Extraction Rules Towards Complex Web
5	Heuristic rules for extraction of ontology from Web pages in WebOntEx
6	Design And Implementation Of Web Information Extraction Rules
7	Optimizing Of Extraction Rules And Expressing Of The Rules With XQuery In Web Information Extraction Systems
8	The Information Extraction Of Unstructured Document Extraction And Analysis
9	Financial Transaction Information Extraction System Based On Rules And Statistical Models
10	Semi-supervised Blog Information Extraction Techniques Based On Document Structure