The Research And Design Of The Data Capture Of Subject GateWay Of Integrated Risk

Posted on: 2008-05-11 | Degree: Master | Type: Thesis
Country: China | Candidate: X Zhu | Full Text: PDF
GTID: 2178360215464844 | Subject: Computer application technology
Abstract/Summary:
Constructing a subject gateway is an important part of integrated risk prevention, and research on data capture is especially important. Depending on where the subject gateway's information originates, data are captured by a special extractor or a deep extractor, and a knowledge repository is then established to process the captured data. The work has three parts:

1) Special extractor for ordinary web information: define templates and capture list pages from the seed sites in a targeted manner; locate the list block by means of the DOM and heuristic rules; propose a clustering-based Label Distance method that improves the clustering; implement a wrapper for the structured list data; propose Container Distance to improve Finn's body text extraction method and extract the dataset.

2) Deep extractor for deep web data:
· Understanding the form: the form is the only interface for accessing the deep web. First, establish the schema of the form and construct the search form expression; then analyze the semantics between form elements and construct heuristic rules for extracting the logical attributes from the form.
· Submitting the form: based on the schema of the form, improve the submission strategy and propose a random exclusion strategy for submitting forms automatically.
· Processing the response page: construct heuristic rules and extract the dataset.

3) By means of metadata, construct a model of integrated risk data to unify the format of the extracted data, and establish the integrated risk knowledge repository for classifying and processing the captured data.
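
The form-understanding step in part 2 (establishing a schema of the form and recovering logical attributes by heuristic rules) can be illustrated with a minimal sketch. This is not the thesis's implementation: the class `FormSchemaParser`, the function `extract_form_schemas`, the sample form, and the single heuristic used here (a `<label for=...>` names the logical attribute of the control with the matching `id`, with the control's `name` as a fallback) are all assumptions made for illustration. It uses only the Python standard library.

```python
from html.parser import HTMLParser


class FormSchemaParser(HTMLParser):
    """Collect a simple schema (action, method, fields) for each <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []          # one schema dict per <form>
        self.labels = {}         # control id -> label text
        self._form = None        # form currently open, if any
        self._label_for = None   # id targeted by the <label> being read
        self._select = None      # <select> field currently collecting options

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._form = {
                "action": a.get("action") or "",
                "method": (a.get("method") or "get").lower(),
                "fields": [],
            }
            self.forms.append(self._form)
        elif tag == "label":
            self._label_for = a.get("for")
            if self._label_for:
                self.labels[self._label_for] = ""
        elif tag in ("input", "select", "textarea") and self._form is not None:
            kind = (a.get("type") or "text").lower() if tag == "input" else tag
            field = {"name": a.get("name"), "id": a.get("id"),
                     "type": kind, "options": []}
            self._form["fields"].append(field)
            if tag == "select":
                self._select = field
        elif tag == "option" and self._select is not None:
            self._select["options"].append(a.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None
        elif tag == "label":
            self._label_for = None
        elif tag == "select":
            self._select = None

    def handle_data(self, data):
        # Text inside a <label for="..."> becomes the label of that control.
        if self._label_for:
            self.labels[self._label_for] += data.strip()


def extract_form_schemas(html):
    """Return a list of form schemas with a 'label' attached to each field."""
    parser = FormSchemaParser()
    parser.feed(html)
    parser.close()
    for form in parser.forms:
        for field in form["fields"]:
            # Assumed heuristic: prefer the matching label text, else the name.
            label = parser.labels.get(field.pop("id"), "")
            field["label"] = label or field["name"]
    return parser.forms


# Hypothetical search form, invented for this example.
SAMPLE = """
<form action="/search" method="POST">
  <label for="kw">Keyword</label> <input type="text" id="kw" name="q">
  <label for="cat">Hazard type</label>
  <select id="cat" name="category">
    <option value="flood">Flood</option>
    <option value="quake">Earthquake</option>
  </select>
  <input type="submit" value="Go">
</form>
"""

if __name__ == "__main__":
    for form in extract_form_schemas(SAMPLE):
        print(form["method"], form["action"])
        for f in form["fields"]:
            print("  ", f["label"], f["type"], f["options"])
```

A schema of this kind is what a submission strategy would consume: the `options` lists bound the values that can be tried for each field. Real deep web forms often lack `for`/`id` pairs, so a practical extractor would presumably need further rules (for example, text adjacent to a control), which this sketch does not attempt.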
Keywords/Search Tags:Integrated Risk, Special Capture, Form Understanding, Knowledge Repository