| With the rapid development of the World Wide Web, Web information extraction (WebIE) has become a focus of research in both academia and industry over the past ten years. The goal of WebIE is to locate and identify the information of interest in heterogeneous Web sites and to organize the extracted information in a homogeneous, structured format. The major difficulties of WebIE are complexity, adaptability, and scalability, which stem from the inherent features of Web sites: their huge number, varied formats, and frequent updates. This paper presents an Agent-based WebIE system that is a typical multi-Agent system (MAS). The system is mainly composed of three Agents and four kinds of knowledge bases. The knowledge bases are the foundation of the Agents' activities. In this paper, XML is employed to describe both the knowledge and the communication between Agents. Each Agent in the system has its own objective, so it can act autonomously while still coordinating and cooperating with the other Agents and the user. This infrastructure simplifies the originally complicated WebIE problem. The Information Extraction Agent is the core of the three Agents. It is responsible for learning extraction rules and for extracting information from a Web page using the relevant rules. Here the wrapper induction and DOM tree methods, which have been widely used in previous research, are utilized. Because the definition of extraction rules combines the domain-specific semantic features of the information of interest with the format features of the Web page, the proposed system achieves better reusability and adaptability. Moreover, the Agent's ability to perceive how a Web site has been updated and then adapt the rules on its own initiative also enhances adaptability to some extent. In addition, the semi-automatic learning method, in which the user participates, simplifies the learning process. |