A Document Interoperation Framework on the Semantic Web (DIFSEW)

Posted on: 2012-01-02

Degree: Ph.D.

Type: Dissertation

University: University of New Brunswick (Canada)

Candidate: Ranganathan, Girish R

GTID: 1468390011960567

Subject: Computer Science

Abstract/Summary:

Enormous numbers of electronic documents are generated in many domains and contexts. Although these documents are interpretable by human readers, almost all of them lack the explicit semantics that would allow software applications to correctly interpret the data they contain. It is therefore important to create methods that automatically extract information from electronic documents and impart semantics to them. Such semantics enable meaningful search, querying, transformation, and interoperation of the information within documents. This is especially important for large information archives, which are usually the main parts of large enterprise information systems. The semantic enrichment of these archives discussed in this dissertation adopts Semantic Web techniques, such as ontologies, rules, and their reasoning engines, as well as Information Extraction methods, which involve position-based and ontology-based techniques. This allows large enterprise information systems to be re-engineered into knowledge-based systems in which data from documents is processed automatically and meaningfully.

Although the Semantic Web and Information Extraction fields are now relatively well developed, there is a need for an integrated framework that embodies appropriate methods for processing large document stores. The goals of this dissertation are: (1) to create an integrated semantic framework for document processing, (2) to research and develop the components of the framework, and (3) to investigate existing and possible applications of the framework.

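To make "imparting semantics" more concrete, the following Python sketch shows one simple reading of position-based extraction: a field's value is taken from its position immediately after a known label, and each extracted value is mapped to an ontology property, producing subject-predicate-object triples. The label-to-property mapping, the `ex:` prefix, the function names, and the sample text are all hypothetical and are not taken from the dissertation.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from labels found in documents to ontology properties.
# The "ex:" prefix and the property names are illustrative assumptions only.
LABEL_TO_PROPERTY: Dict[str, str] = {
    "Course Code": "ex:hasCode",
    "Title": "ex:hasTitle",
    "Credit Hours": "ex:hasCredits",
}


def extract_fields(text: str) -> Dict[str, str]:
    """Position-based extraction: a value is whatever follows 'Label:' on the same line."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")  # split at the first colon only
        label = label.strip()
        if sep and label in LABEL_TO_PROPERTY:  # ignore lines with unknown or no labels
            fields[label] = value.strip()
    return fields


def to_triples(doc_id: str, fields: Dict[str, str]) -> List[Triple]:
    """Impart explicit semantics by mapping each extracted field to an ontology property."""
    return [(doc_id, LABEL_TO_PROPERTY[label], value) for label, value in fields.items()]
```

For example, `to_triples("ex:doc1", extract_fields(doc))` on a document containing the lines `Course Code: CS101`, `Title: Introduction to Programming`, and `Credit Hours: 4` yields one triple per labelled field, while lines with unknown labels are ignored. A real system would more likely emit RDF through a dedicated library and check values against the domain ontology.
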
The main objectives of the research behind the dissertation are to investigate: (1) methods that automatically extract information from electronic documents and impart semantics to them, (2) methods for preprocessing information before performing information extraction, (3) methods for processing business rules with semantics so that processing logic can be externalized, (4) methods for working seamlessly with multiple domains, and (5) an integrated framework that embodies appropriate methods for processing large document stores.

The main research outcome presented in the dissertation is a semantic framework for building knowledge-based software systems that integrates domain ontologies, rules, a reasoning engine, Information Extraction methods, and application logic. The domain ontologies specify a conceptualization of the domain to which the documents belong; they can be built manually, extracted from documents, or reused. The rules specify the business logic used in an enterprise, which can be represented as decision tables, production rules, or First-Order Logic. The Information Extraction methods integrated into the framework extract semantics from documents presented in various formats. The reasoning engine can be any existing engine that can process ontologies and rules represented in an appropriate format. The application logic is responsible for querying the reasoning engine and presenting the results to the user.

Although the framework is integrated, its parts are externalized and independent, so information extraction from documents, domain ontologies, document processing (business) logic, and semantic reasoning can be created and maintained separately by the appropriate specialists. The framework includes semantic processing of externalized data-processing logic rules and, to some extent, externalization of application logic.

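The separation between externalized rules, a reasoning engine, and application logic can be illustrated with a small Python sketch. Here the rules are plain data (they could equally be loaded from a file maintained by a domain specialist), a naive forward-chaining loop stands in for the reasoning engine, and the application logic simply queries the derived facts. This is a toy under stated assumptions: the abstract says any existing reasoning engine can be used, and the rule, the predicate names (`passed`, `requires`, `mayTake`), and the facts are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Fact = Tuple[str, ...]
Binding = Dict[str, str]
Rule = Tuple[List[Fact], Fact]  # (conditions, conclusion)

# Hypothetical externalized production rule, kept as data separate from code.
# Terms starting with "?" are variables.
RULES: List[Rule] = [
    # IF ?s passed ?pre AND ?c requires ?pre THEN ?s mayTake ?c
    ([("?s", "passed", "?pre"), ("?c", "requires", "?pre")], ("?s", "mayTake", "?c")),
]


def match(pattern: Fact, fact: Fact, binding: Binding) -> Optional[Binding]:
    """Unify one pattern with one ground fact, extending an existing binding."""
    if len(pattern) != len(fact):
        return None
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding.
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def satisfy(conditions: List[Fact], facts: Set[Fact], binding: Binding) -> List[Binding]:
    """Return every binding that satisfies all conditions (a naive join)."""
    if not conditions:
        return [binding]
    results: List[Binding] = []
    for fact in facts:
        extended = match(conditions[0], fact, binding)
        if extended is not None:
            results.extend(satisfy(conditions[1:], facts, extended))
    return results


def forward_chain(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Fire rules repeatedly until no new facts are derived (a fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            for binding in satisfy(conditions, derived, {}):
                new_fact = tuple(binding.get(t, t) for t in conclusion)
                if new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived
```

Given the facts `("alice", "passed", "CS1")`, `("CS2", "requires", "CS1")`, and `("CS3", "requires", "CS2")`, `forward_chain` adds `("alice", "mayTake", "CS2")` but nothing for `CS3`, and the application logic can then filter the result for `mayTake` facts. Changing the eligibility policy means editing `RULES`, not the surrounding code.
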
Creating external information extraction rules is a cumbersome and time-consuming task for the knowledge engineer. To overcome this problem, the framework also includes a rule learning (induction) system that semi-automates the generation of information extraction rules from source documents with the help of manual annotations. The ontology- and rule-based framework can be applied to: (1) re-engineering very large enterprise information systems by adopting Semantic Web computing techniques and (2) creating new knowledge-based software systems.

The dissertation is article-based. It presents a variety of concepts, published as individual articles, that address the problems stated above and more. Some of the concepts addressed are: (a) a framework for knowledge-based systems that addresses the concerns relevant to the problems discussed; (b) information preprocessing using a meta-ontology before performing information extraction to populate the domain ontology; (c) identification and resolution of conflicts during ontological integration, using rules, for working with information from different domains; (d) RuleML-based learning object interoperability on the Semantic Web, representing ontologies using RuleML; (e) representing user-friendly business rules in a Semantic Web-based format; (f) information extraction from syllabi for academic e-advising; and (g) semantic annotation of semi-structured documents.

The dissertation brings all of the concepts listed above together and explains them as a framework consisting of modular features. More detailed information on each concept can be found in the respective articles presented in the chapters.
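The idea of learning extraction rules from manual annotations can be illustrated with one simple, well-known technique, LR wrapper induction: the left delimiter is the longest common suffix of the text preceding each annotated value, and the right delimiter is the longest common prefix of the text following it. This is a stand-in chosen for brevity, not necessarily the learning method used in the dissertation; the function names and sample documents are hypothetical.

```python
import os
from typing import List, Optional, Tuple

# (document text, start offset, end offset) of one manually annotated value
Annotation = Tuple[str, int, int]


def common_suffix(strings: List[str]) -> str:
    """Longest string that every input ends with (common prefix of the reversals)."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def induce_lr_rule(examples: List[Annotation]) -> Tuple[str, str]:
    """Learn (left, right) delimiters from annotated examples.

    left  = longest common suffix of the text before each annotated value
    right = longest common prefix of the text after each annotated value
    """
    lefts = [doc[:start] for doc, start, _ in examples]
    rights = [doc[end:] for doc, _, end in examples]
    left, right = common_suffix(lefts), os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("annotated examples share no common delimiter")
    return left, right


def apply_lr_rule(rule: Tuple[str, str], doc: str) -> Optional[str]:
    """Extract the first value enclosed by the learned delimiters, if any."""
    left, right = rule
    start = doc.find(left)
    if start == -1:
        return None
    start += len(left)
    end = doc.find(right, start)
    return doc[start:end] if end != -1 else None
```

With two annotated examples such as `Instructor: Dr. Smith` and `Instructor: Prof. Lee`, each followed by a `Room:` line, the learned rule is the delimiter pair `("\nInstructor: ", "\nRoom: ")`, which then extracts `Dr. Patel` from an unseen document with the same layout. The right delimiter is overfitted to documents where `Room:` follows the instructor; annotating more varied examples shortens the delimiters toward what the documents genuinely share.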

Keywords/Search Tags:

Information, Framework, Semantic, Document, Rules, Domain, Logic, Concepts

Related items

1	Research Of Domain Ontology Concept Extraction Based On Association Rules
2	A probabilistic framework for mapping audio-visual features to high-level semantics in terms of concepts and context
3	Mining semantic relationships between concepts across documents using Wikipedia knowledge
4	Research On The Approaches To Combining Ontologies And Rules In The Semantic Web
5	New semantic similarity techniques of concepts applied in the biomedical domain and WordNet
6	Research On Domain Resource Clustering Based On Semantic Field Model And Its Application
7	An approach to formalizing ontology driven semantic integration: Concepts, dimensions and framework
8	Automatic Construction Method For Domain Concepts Based On Wikipedia Semantic Knowledge Base
9	Semantic components: A model for enhancing retrieval of domain-specific information
10	Research On Domain Ontology Representation, Reasoning And Integration For The Semantic Web And The Applications