
Learning Domain Ontologies From Chinese Text Corpora

Posted on: 2011-04-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Yu
GTID: 1117360305955734
Subject: Management Science and Engineering
Abstract/Summary:
A domain ontology is a formal and explicit specification of a shared conceptualization of a specific domain (or application). Domain ontologies are increasingly needed in areas such as knowledge management, semantic services, e-commerce and artificial intelligence. However, building a domain ontology is labor-intensive and time-consuming. In recent years, researchers have proposed ontology learning approaches to support the ontology building process. These approaches use machine learning techniques to (semi-)automatically extract relevant concepts and relations from data sources and assemble them into an ontology.

The (semi-)automatic building of a domain ontology involves three main problems:
a) extracting terms from electronic documents;
b) building the concept set of the domain ontology;
c) building its relation set.

Accordingly, this dissertation proposes three ontology learning methods and approaches that provide computational support for semi-automatically building Chinese domain ontologies.

1) It proposes a new Chinese term extraction method, the by-step-of-atomic-word method. This method combines part-of-speech (POS) analysis with string frequency statistics to decide whether a Chinese string in a document is a term. It then collects all terms that occur in the document. It is a practical solution to the problem of extracting Chinese terms from electronic documents.

2) It proposes a new approach to learning the concept set of a domain ontology from Chinese text corpora. The approach consists of two methods:
- the DMD (domain membership degree) analysis method, which extracts domain-specific terms;
- a new synonym merging method, which eliminates synonymous terms.

Given suitable text corpora, the approach solves the problem of building the concept set of the domain ontology.

3) It proposes a new ontology relation learning method, the concept-feature-based method.
The method suggests ontology relations, especially non-taxonomic ones, by analyzing how relevant the features of two concepts are to each other. It is combined with the established string-inclusion method for learning taxonomic relations, and together they efficiently support building the ontology relation set.

Each of the proposed methods and approaches has been tested on various types of Chinese text corpora and refined dozens of times. They perform much better at learning Chinese domain ontologies than typical existing approaches. They have been used and verified in a terminology standardization project for the field of information and knowledge management, sponsored by CNCTST and NSFC.

The methods and approaches proposed in the dissertation are practical and efficient. Given suitable text corpora, one can use them to build a Chinese domain ontology and then develop ontology-based applications on top of it. The approaches support the semi-automatic building of Chinese domain ontologies and simplify the building process, thereby promoting the industrialization of ontologies. With slight modifications, the methods and approaches may also be applied in other areas, such as semantic retrieval and text summarization.
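The abstract does not spell out how the by-step-of-atomic-word method (item 1) works internally. The Python sketch below shows one plausible reading of it. It rests on assumptions made only for this illustration:
- The text has already been segmented into atomic words with POS tags.
- A candidate string is grown one atomic word at a time.
- A function word (auxiliary, preposition, conjunction, punctuation) ends the growth.
- A term must end with a noun-like word.
- `extract_terms`, `max_len`, `min_freq`, `STOP_POS` and `HEAD_POS` are hypothetical names and values, not the dissertation's.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical input: a document already segmented into atomic words, each
# paired with a POS tag ("n" noun, "vn" verbal noun, "u" auxiliary, ...).
Token = Tuple[str, str]

# Assumed POS rules, for illustration only.
STOP_POS = {"u", "p", "c", "w"}  # auxiliaries, prepositions, conjunctions, punctuation
HEAD_POS = {"n", "vn"}           # a term must end with a noun-like atomic word


def extract_terms(tokens: List[Token], max_len: int = 4, min_freq: int = 2) -> List[str]:
    """Hypothetical term extractor: POS filtering plus string frequency counts."""
    counts: Counter = Counter()
    for start in range(len(tokens)):
        # Grow the candidate string one atomic word at a time.
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            _, pos = tokens[end - 1]
            if pos in STOP_POS:
                break  # every longer candidate would contain this function word
            if pos in HEAD_POS:
                counts["".join(w for w, _ in tokens[start:end])] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_freq}
    # Drop fragments: a candidate exactly as frequent as a longer frequent
    # candidate containing it is assumed never to occur on its own.
    terms = [
        s for s, c in frequent.items()
        if not any(s != t and s in t and c == ct for t, ct in frequent.items())
    ]
    # Longest terms first, ties broken alphabetically.
    return sorted(terms, key=lambda s: (-len(s), s))


if __name__ == "__main__":
    doc = [
        ("知识", "n"), ("管理", "vn"), ("系统", "n"), ("的", "u"),
        ("知识", "n"), ("管理", "vn"), ("系统", "n"), ("的", "u"),
        ("系统", "n"), ("设计", "vn"),
    ]
    print(extract_terms(doc))  # ['知识管理系统', '系统']
```

The sketch drops a fragment whose frequency equals that of a longer candidate. This is a common heuristic for preferring complete terms such as 知识管理系统 over fragments like 知识管理. The abstract does not say whether the dissertation uses such a rule.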
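The abstract names the DMD (domain membership degree) analysis method (item 2) but does not give its formula. As a stand-in, the sketch below uses a common relative-frequency contrast between a domain corpus and a general reference corpus:

DMD(t) = p_D(t) / (p_D(t) + p_G(t))

Here p_D and p_G are the term's relative frequencies in the two corpora. A score near 1 means the term occurs mostly in the domain corpus. This formula, the 0.8 threshold and the function names are assumptions, not the dissertation's definition.

```python
from collections import Counter
from typing import Dict, List


def domain_membership_degree(domain: Counter, general: Counter) -> Dict[str, float]:
    """Hypothetical DMD: the share of a term's relative frequency that comes
    from the domain corpus rather than a general reference corpus (0..1)."""
    n_d = sum(domain.values())
    n_g = sum(general.values()) or 1  # guard against an empty reference corpus
    scores = {}
    for term, f_d in domain.items():
        if f_d <= 0:
            continue  # only score terms that actually occur in the domain corpus
        p_d = f_d / n_d
        p_g = general.get(term, 0) / n_g
        scores[term] = p_d / (p_d + p_g)
    return scores


def select_domain_terms(scores: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Keep terms whose DMD reaches the assumed threshold, most domain-specific first."""
    return sorted((t for t, s in scores.items() if s >= threshold), key=lambda t: -scores[t])


if __name__ == "__main__":
    domain = Counter({"本体": 30, "概念": 20, "系统": 40, "方法": 10})
    general = Counter({"本体": 1, "概念": 10, "系统": 200, "方法": 300, "天气": 489})
    print(select_domain_terms(domain_membership_degree(domain, general)))  # ['本体', '概念']
```

In this toy data, 系统 occurs often in the domain corpus but is also common in general text. Its score of 2/3 keeps it below the threshold. Raw domain frequency alone would have ranked it first.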
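Item 3 combines two sources of relation suggestions, and the sketch below illustrates both under stated assumptions.

- **Taxonomic relations:** it applies a simple string-inclusion rule. The head of a Chinese compound term usually comes last, so a concept that ends with another concept is proposed as its subclass (领域本体 ends with 本体).
- **Non-taxonomic relations:** it assumes each concept is described by a hypothetical set of feature words, for example words that co-occur with it in the corpus. It then uses Jaccard overlap as the relevance measure.

The feature representation, the Jaccard measure, the 0.3 threshold and the function names are illustrative choices, not details taken from the dissertation.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def taxonomic_by_string_inclusion(concepts: List[str]) -> List[Tuple[str, str]]:
    """Propose (child, parent) is-a pairs: a concept that ends with another
    concept is treated as its subclass, e.g. 领域本体 -> 本体."""
    return sorted(
        (child, parent)
        for child in concepts
        for parent in concepts
        if child != parent and child.endswith(parent)
    )


def suggest_relations(
    features: Dict[str, Set[str]], threshold: float = 0.3
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, float]]]:
    """Combine string inclusion (taxonomic) with feature overlap (non-taxonomic)."""
    taxonomic = taxonomic_by_string_inclusion(list(features))
    linked = {frozenset(pair) for pair in taxonomic}
    non_taxonomic = []
    for a, b in combinations(sorted(features), 2):
        if frozenset((a, b)) in linked:
            continue  # already related taxonomically
        union = features[a] | features[b]
        if not union:
            continue
        # Hypothetical relevance measure: Jaccard overlap of the feature sets.
        score = len(features[a] & features[b]) / len(union)
        if score >= threshold:
            non_taxonomic.append((a, b, score))
    return taxonomic, non_taxonomic


if __name__ == "__main__":
    feats = {
        "本体": {"概念", "关系", "推理"},
        "领域本体": {"概念", "关系", "领域"},
        "概念": {"术语", "定义", "标准"},
        "术语": {"概念", "定义", "标准"},
        "知识管理系统": {"知识", "共享", "检索"},
        "管理系统": {"检索", "用户"},
    }
    taxonomic, non_taxonomic = suggest_relations(feats)
    print(taxonomic)
    print(non_taxonomic)
```

The feature comparison skips pairs already linked by string inclusion. The feature-based step therefore contributes only candidate non-taxonomic relations (here, 概念 and 术语). The final decision is left to the ontology builder, in keeping with the semi-automatic setting.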
Keywords/Search Tags: Domain Ontology, Ontology Learning, Ontology Building, Term Extraction, Domain-specific Concept, Ontology Relation