A Document Interoperation Framework on the Semantic Web (DIFSEW)

Posted on: 2012-01-02 | Degree: Ph.D | Type: Dissertation | University: University of New Brunswick (Canada) | Candidate: Ranganathan, Girish R | GTID: 1468390011960567 | Subject: Computer Science

Abstract/Summary:

Enormous numbers of electronic documents are generated in various domains and contexts. Although these documents are interpretable by human readers, almost all of them lack the explicit semantics that would allow software applications to interpret their data correctly. It is therefore important to create methods for automatically extracting information from electronic documents and imparting semantics to them. Such semantics enable meaningful search, querying, transformation, and interoperation of the information within documents. This is especially important for large information archives, which are usually the main parts of large enterprise information systems. The semantic enrichment of these archives discussed in the present dissertation adopts Semantic Web techniques, such as ontologies, rules, and their reasoning engines, as well as Information Extraction methods, which involve position-based and ontology-based techniques. This allows large enterprise information systems to be re-engineered into knowledge-based systems in which data from documents is automatically processed in a meaningful way.

Although the Semantic Web and Information Extraction fields are now relatively well developed, there is a need for an integrated framework that can embody appropriate methods for processing large document stores. The goals of the present dissertation are: (1) to create an integrated semantic framework for document processing, (2) to research and develop the components of the framework, and (3) to investigate existing and possible applications of the framework.
The main objectives of the research behind the dissertation are to investigate: (1) methods for automatically extracting information from electronic documents and imparting semantics to them, (2) methods for preprocessing information before performing information extraction, (3) methods for processing business rules with semantics in order to externalize processing logic, (4) methods for working with multiple domains seamlessly, and (5) an integrated framework that can embody appropriate methods for processing large document stores.

The main research outcome presented in the dissertation is a semantic framework that integrates domain ontologies, rules, a reasoning engine, Information Extraction methods, and application logic for building knowledge-based software systems. The purpose of the domain ontologies is to specify a conceptualization of the domain to which the documents belong. The ontologies can be built manually, extracted from documents, or re-used. The purpose of the rules is to specify the business logic used in an enterprise. The business logic can be represented by decision tables, production rules, or First-Order Logic. The purpose of the Information Extraction methods integrated into the framework is to extract semantics from documents presented in various formats. The reasoning engine can be any existing engine that can process ontologies and rules represented in an appropriate format. The application logic is responsible for querying the reasoning engine and presenting the results to the user.

Although the framework is integrated, its parts are externalized and independent, so information extraction from documents, domain ontologies, document processing (business) logic, and semantic reasoning can each be created and maintained separately by appropriate specialists in the field. The framework includes semantic processing of externalized data-processing logic rules and, to some extent, externalization of application logic.
The creation of external information extraction rules by a knowledge engineer is a cumbersome and time-consuming task. To overcome this problem, the framework also includes a rule learning (induction) system that semi-automates the generation of information extraction rules from source documents with the help of manual annotations. The present ontology- and rule-based framework can be applied to: (1) re-engineering very large enterprise information systems by adopting Semantic Web computing techniques and (2) creating new knowledge-based software systems.

The dissertation is article-based. It presents a variety of concepts, published as individual articles, that address the problems stated above and more. Some of the concepts addressed by the dissertation are: (a) a framework for knowledge-based systems that addresses the concerns relevant to the problems discussed; (b) information pre-processing using a meta-ontology before performing information extraction to populate the domain ontology; (c) identification and resolution of conflicts during ontological integration, using rules, for working with information from different domains; (d) RuleML-based learning object interoperability on the Semantic Web, for representing ontologies using RuleML; (e) representation of user-friendly business rules in a Semantic Web-based format; (f) information extraction from syllabi for academic e-advising; (g) semantic annotation of semi-structured documents.

The dissertation uses all the concepts listed above and explains them as a framework consisting of modular features. More detailed information on each of the listed concepts can be found in the respective articles presented in the chapters.

Keywords/Search Tags: Information, Framework, Semantic, Document, Rules, Domain, Logic, Concepts