Ontology-Based Content Extraction From Web Tables And Its Implementation

Posted on: 2007-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Lin
GTID: 2208360185456298
Subject: Software engineering
Abstract/Summary:
The Web has become the main source of information. However, it is not easy for people to find the information they are really interested in, because Web pages are semi-structured or unstructured, hyperlinks are disordered, and the data are massive and dynamic. Web information extraction offers a good solution to this problem, helping people obtain knowledge more quickly and more precisely.

As a compact and efficient way to present relational information, tables are used frequently in Web documents. According to one report, about 52% of HTML documents include tables. Although some of these tables serve only for physical layout, a significant amount of online data is still stored in HTML tables.

Since tables are inherently concise as well as information rich, the automatic understanding of tables has many applications, including knowledge management, information retrieval, Web mining, summarization, and content delivery to mobile devices. The ubiquity of tables and their ability to describe relational information in a compact and immediate manner make them attractive targets for research on Web information extraction. In the long term, the essential way to resolve the tension between abundant Web data and the difficulty of finding relevant information is to turn disorderly data into orderly knowledge, so that computers can understand both Web information and what people need. Tim Berners-Lee proposed the concept of the Semantic Web in 1998. It is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

The Semantic Web uses a multi-level framework to achieve its goal. Ontology sits at the level that leads from textual description to knowledge-based reasoning, so developing ontologies is important for the Semantic Web.

An ontology is an explicit specification of a conceptualization. It defines the basic concepts and relations that make up the vocabulary of a topic area, giving those concepts and relations explicit and unambiguous definitions within a certain scope. People and machines can then communicate on the basis of a shared vocabulary.
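The idea of using an ontology's vocabulary to interpret a Web table can be illustrated with a small sketch. It is not the thesis's implementation: the keywords name Jena and DOM4J (Java libraries), whereas this example swaps in Python's standard `html.parser` and a plain dictionary standing in for an ontology. The `ONTOLOGY` mapping, the concept names, the header synonyms, and the sample table are all hypothetical, and the first row is simply assumed to be the header row.

```python
from html.parser import HTMLParser

# Hypothetical toy "ontology": each concept lists header words that denote it.
# A real ontology (e.g. an OWL model) would also carry relations and constraints.
ONTOLOGY = {
    "Product": {"product", "item", "name"},
    "Price": {"price", "cost"},
}


class TableParser(HTMLParser):
    """Collects table rows as lists of cell strings (no nested-table handling)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def concept_for(header):
    """Return the ontology concept a header word denotes, or None if unknown."""
    key = header.strip().lower()
    for concept, synonyms in ONTOLOGY.items():
        if key in synonyms:
            return concept
    return None


def extract(html):
    """Map each body row to {concept: value}, dropping unrecognised columns."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows  # assumption: the first row is the header
    concepts = [concept_for(h) for h in header]
    records = []
    for row in body:
        record = {c: v for c, v in zip(concepts, row) if c is not None}
        if record:
            records.append(record)
    return records


SAMPLE = """
<table>
  <tr><th>Item</th><th>Cost</th><th>Notes</th></tr>
  <tr><td>Laptop</td><td>900</td><td>in stock</td></tr>
  <tr><td>Mouse</td><td>15</td><td></td></tr>
</table>
"""
RECORDS = extract(SAMPLE)
```

Here the "Item" and "Cost" columns are recognised as the concepts `Product` and `Price`, while "Notes" is dropped because the toy vocabulary has no matching concept. A table whose headers match no concept at all yields no records, which is one simple way a layout-only table could be filtered out under these assumptions.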
Keywords/Search Tags: Web table, ontology, content extraction, Jena, DOM4J