Research On Web Information Extraction Applied To Chinese Name Search Engine

Posted on: 2007-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2178360182993958
Subject: Computer software and theory
Abstract/Summary:
Web information extraction is the process of extracting the needed information from Web documents. This thesis studies information extraction and applies it to a subject-oriented search engine whose subject is Chinese names. It investigates Web information extraction techniques for Web information about Chinese names, designs an information extraction model, and tests it. The model extracts a person's attributes (birthday, occupation, place, and organization) from Web documents.

The thesis explains in detail the system flow, the methods used by each submodule in that flow, and the concrete techniques used in the information extraction module. Different pattern extraction algorithms are applied to different classes of Web documents. For documents in the person-introduction class, a knowledge engineering approach is used and the pattern repository is built manually. For documents in the person-action class, an automatic training approach is used, and a new algorithm is proposed to extract patterns from the training set automatically.

Finally, experiments were run on Web pages about individual people. The results show that the information extraction model extracts the correct information relatively accurately and satisfies the requirements.
Keywords/Search Tags: information extraction, search engine, pattern match
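
The knowledge engineering approach described in the abstract, in which a manually built pattern repository is matched against a document to pull out a person's attributes, can be illustrated with a minimal sketch. This is not the thesis's implementation: the repository layout, the regular expressions, the function name `extract_attributes`, and the English sample sentence are all hypothetical, chosen only to show the general idea. A real system for Chinese name search would hold patterns written for Chinese text.

```python
import re

# Hypothetical, hand-written pattern repository (illustrative only).
# Each attribute maps to a list of regular expressions; the named group
# "value" marks the text to extract.
PATTERN_REPOSITORY = {
    "birthday": [
        re.compile(r"\bborn (?:on|in) (?P<value>(?:[A-Z][a-z]+ \d{1,2}, )?\d{4})"),
    ],
    "place": [
        re.compile(r"\bborn (?:on|in) [^.]*? in (?P<value>[A-Z][a-z]+)"),
    ],
    "occupation": [
        re.compile(r"\b(?:is|was|works as) an? (?P<value>[a-z]+(?: [a-z]+)?) at\b"),
    ],
    "organization": [
        re.compile(r"\bat (?P<value>[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)"),
    ],
}


def extract_attributes(text):
    """Return the first match for each attribute found in the text."""
    result = {}
    for attribute, patterns in PATTERN_REPOSITORY.items():
        for pattern in patterns:
            match = pattern.search(text)
            if match:
                result[attribute] = match.group("value")
                break  # keep the first pattern that matches this attribute
    return result


# Made-up sample sentence, not taken from the thesis's test data.
sample = (
    "Wang Ming was born in 1965 in Beijing. "
    "He is a professor at Tsinghua University."
)
attributes = extract_attributes(sample)
```

Keeping the patterns in a data structure separate from the matching loop means new patterns, whether written by hand or learned from a training set, can be added without changing the extraction code.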