
Oil Domain Ontology Construction Based On Document Semantic Recognition

Posted on: 2019-01-23 | Degree: Master | Type: Thesis
Country: China | Candidate: P H Zhu
GTID: 2381330626956575 | Subject: Computer technology
Abstract/Summary:
With the continuous development of information technology, activities in the petroleum field are becoming increasingly informatized, and petroleum information systems draw on a growing range of knowledge bases. The petroleum field spans many specialized disciplines, its technologies and terminology are continuously updated, and much of its information is unstructured. These problems hinder knowledge representation, information sharing, software reuse and efficient management in the field. Ontology is the most classical and most widely used method of knowledge representation: text is obtained from existing information sources, and an ontology of the domain is built from it manually or semi-automatically. The petroleum field also currently suffers from independently developed systems, non-uniform data coding rules, repeated development of similar system software, and so on. To address these problems, this thesis proposes a method for constructing a petroleum-domain ontology based on the semantic recognition of documents. The work consists of the following parts.

First, document word segmentation is the most important task in constructing a petroleum-domain ontology, because petroleum documents are characterized by domain terminology and compound words. Building on the hidden Markov model (HMM), this thesis proposes an adaptive HMM character segmentation model that introduces a terminology set to combine a domain knowledge dictionary with user-defined information. The algorithm calibrates the segmentation under semantic constraints and character-meaning constraints, and can accurately identify professional terms and character combinations in the petroleum field.

Second, domain corpora of different scales are built as information sources for concept extraction. After analyzing a statistical method based on TF-IDF and a method based on a petroleum dictionary, the thesis designs a method that combines the two to extract concepts from different numbers of documents. The results show that the combined method extracts concepts more accurately.

Third, the semantic relations between the extracted petroleum concepts are identified. Each concept is represented as a word vector using the Continuous Bag-of-Words (CBOW) model, and an improved vector training algorithm extends and strengthens the vectors so that they carry contextual semantic information. The resulting word vectors are fed into a Support Vector Machine (SVM) to train a classifier, which identifies hyponymy, part-whole and synonymy relations.

Finally, the ontology is constructed automatically from the relations identified between the extracted concepts. Existing ontology learning tools are analyzed to design the ontology learning system of this thesis, and a probabilistic ontology model and a data-driven method are used to derive a Chinese ontology automatically. The ontology is expressed in OWL: the exported OWL file is imported into the Protégé platform, where further feedback correction is made and the ontology is finally represented visually.
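The dictionary-constrained idea behind the adaptive HMM segmenter can be illustrated with a small sketch. It is not the thesis's model: the BMES tag set, the toy start and transition probabilities, the forward-maximum-matching term lookup and the function names (`term_constraints`, `segment`) are all assumptions. Terms from the terminology set that occur in the sentence force the tags B, M*, E over their span, and Viterbi decoding chooses the most probable tags for the remaining characters.

```python
import math

TAGS = "BMES"  # B = word begin, M = middle, E = end, S = single-character word
NEG_INF = float("-inf")

# Toy probabilities (assumed for illustration, not trained values from the thesis).
START = {"B": 0.6, "M": 0.0, "E": 0.0, "S": 0.4}
TRANS = {
    "B": {"B": 0.0, "M": 0.3, "E": 0.7, "S": 0.0},
    "M": {"B": 0.0, "M": 0.4, "E": 0.6, "S": 0.0},
    "E": {"B": 0.6, "M": 0.0, "E": 0.0, "S": 0.4},
    "S": {"B": 0.5, "M": 0.0, "E": 0.0, "S": 0.5},
}


def log(p):
    return math.log(p) if p > 0 else NEG_INF


def emit_logprob(tag, ch, emissions):
    # Unseen (tag, character) pairs fall back to a small constant probability.
    return log(emissions.get(tag, {}).get(ch, 1e-3))


def term_constraints(text, terms):
    """Forward maximum matching: force the tags B M* E over every terminology-set hit."""
    allowed = [set(TAGS) for _ in text]
    max_len = max((len(t) for t in terms), default=0)
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + length] in terms:
                allowed[i] = {"B"}
                for k in range(i + 1, i + length - 1):
                    allowed[k] = {"M"}
                allowed[i + length - 1] = {"E"}
                i += length
                break
        else:  # no term starts at position i
            i += 1
    return allowed


def segment(text, terms, emissions=None):
    """Viterbi decoding over BMES tags, restricted by the dictionary constraints."""
    if not text:
        return []
    emissions = emissions or {}
    allowed = term_constraints(text, terms)
    score = {t: (log(START[t]) + emit_logprob(t, text[0], emissions))
             if t in allowed[0] else NEG_INF for t in TAGS}
    backpointers = []
    for i in range(1, len(text)):
        new_score, pointers = {}, {}
        for t in TAGS:
            if t not in allowed[i]:
                new_score[t], pointers[t] = NEG_INF, "S"
                continue
            prev = max(TAGS, key=lambda p: score[p] + log(TRANS[p][t]))
            new_score[t] = (score[prev] + log(TRANS[prev][t])
                            + emit_logprob(t, text[i], emissions))
            pointers[t] = prev
        score = new_score
        backpointers.append(pointers)
    # The last character must close a word, so only E or S may end the path.
    tags = [max("ES", key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        tags.append(pointers[tags[-1]])
    tags.reverse()
    # Cut the text after every E or S tag.
    words, start = [], 0
    for i, tag in enumerate(tags):
        if tag in "ES":
            words.append(text[start:i + 1])
            start = i + 1
    return words


if __name__ == "__main__":
    print(segment("油藏数值模拟软件", set()))          # ['油藏', '数值', '模拟', '软件']
    print(segment("油藏数值模拟软件", {"数值模拟"}))  # ['油藏', '数值模拟', '软件']
```

With an empty terminology set the toy model prefers two-character words and splits 数值模拟 (numerical simulation) in two; adding it to the set keeps the term whole while the HMM still decides the surrounding boundaries. A trained model would supply real emission probabilities through the `emissions` argument.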
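The combination of the TF-IDF statistic and the petroleum dictionary can be sketched as a weighted score over segmented documents. The abstract does not say how the two methods are combined, so the linear mix with weight `alpha`, the max-normalized TF-IDF, the smoothed IDF (log(N/df) + 1) and the names `tfidf_scores` and `extract_concepts` are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Highest TF-IDF of each term over a corpus of already-segmented documents."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            # +1 keeps terms that occur in every document from getting zero weight.
            idf = math.log(n_docs / df[term]) + 1.0
            scores[term] = max(scores.get(term, 0.0), tf * idf)
    return scores


def extract_concepts(docs, domain_dict, alpha=0.5, top_k=3):
    """Rank candidate concepts by a mix of normalized TF-IDF and dictionary membership.

    The linear weighting with `alpha` is a hypothetical combination rule.
    """
    scores = tfidf_scores(docs)
    best = max(scores.values())
    combined = {
        term: alpha * (s / best) + (1 - alpha) * (1.0 if term in domain_dict else 0.0)
        for term, s in scores.items()
    }
    # Sort by descending score, breaking ties by the term itself for a stable result.
    ranked = sorted(combined, key=lambda term: (-combined[term], term))
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [
        ["油藏", "压力", "的", "变化"],
        ["油藏", "渗透率", "的", "测量"],
        ["钻井", "的", "成本"],
    ]
    print(extract_concepts(docs, {"油藏", "渗透率", "钻井"}))  # ['钻井', '渗透率', '油藏']
```

On this toy corpus, pure TF-IDF (`alpha=1.0`) ranks the generic word 成本 (cost) level with 钻井 (drilling), while the dictionary term moves 渗透率 (permeability) and 油藏 (reservoir) ahead of it. How much weight the dictionary deserves would in practice depend on corpus size, which matches the abstract's point that extraction is evaluated on different numbers of documents.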
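The relation-identification step can be sketched with a linear SVM trained by the Pegasos sub-gradient method and applied one-vs-rest over the three relation types. This is an illustration under stated assumptions, not the thesis's implementation: the word vectors are toy three-dimensional values chosen so that each relation has its own offset direction (a real system would take them from the trained CBOW model), the pair feature is simply the difference vector v(a) - v(b), and the names and the `lam` and `epochs` values are hypothetical.

```python
# Toy 3-D word vectors (hypothetical stand-ins for CBOW output), chosen so that
# each relation type has its own offset direction between the two concepts.
VECTORS = {
    "采油设备": [0.2, 0.1, 0.3],
    "抽油机": [1.2, 0.1, 0.3],
    "潜油电泵": [0.9, 0.1, 0.3],
    "螺杆泵": [1.0, 0.1, 0.3],
    "钻机": [0.5, 0.2, 0.1],
    "钻头": [0.5, 1.0, 0.1],
    "油井": [0.3, 0.4, 0.6],
    "井口": [0.3, 1.3, 0.6],
    "磕头机": [1.2, 0.1, 0.8],
    "原油": [0.4, 0.5, 0.2],
    "石油": [0.4, 0.5, 0.9],
}

RELATIONS = ["hyponymy", "part-whole", "synonymy"]

TRAINING_PAIRS = [
    ("抽油机", "采油设备", "hyponymy"),
    ("潜油电泵", "采油设备", "hyponymy"),
    ("钻头", "钻机", "part-whole"),
    ("井口", "油井", "part-whole"),
    ("磕头机", "抽油机", "synonymy"),
    ("石油", "原油", "synonymy"),
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pair_features(a, b):
    """Assumed pair feature: the offset v(a) - v(b) between the two concept vectors."""
    return [x - y for x, y in zip(VECTORS[a], VECTORS[b])]


def train_linear_svm(xs, ys, lam=0.1, epochs=20):
    """Pegasos: sub-gradient descent on the L2-regularized hinge loss (no bias term)."""
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed order instead of random sampling, for determinism
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # hinge-loss margin checked with the current w
            w = [(1 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_rest(pairs):
    """One binary SVM per relation type."""
    xs = [pair_features(a, b) for a, b, _ in pairs]
    return {
        rel: train_linear_svm(xs, [1 if label == rel else -1 for _, _, label in pairs])
        for rel in RELATIONS
    }


def predict(models, a, b):
    x = pair_features(a, b)
    return max(RELATIONS, key=lambda rel: dot(models[rel], x))


if __name__ == "__main__":
    models = train_one_vs_rest(TRAINING_PAIRS)
    print(predict(models, "螺杆泵", "采油设备"))  # hyponymy
```

One-vs-rest with an argmax over decision values is one common way to turn binary SVMs into a three-way classifier; the abstract does not say which multi-class scheme the thesis uses.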
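The final export step can be sketched with Python's standard `xml.etree.ElementTree`, writing classified relations as OWL in RDF/XML syntax. The mapping is an assumption rather than the thesis's schema: hyponymy becomes `rdfs:subClassOf`, synonymy becomes `owl:equivalentClass`, and part-whole becomes an existential restriction on a hypothetical `partOf` object property. The ontology IRI `http://example.org/petroleum` is a placeholder.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/petroleum"  # hypothetical ontology IRI

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def iri(name):
    return f"{BASE}#{name}"


def to_owl(concepts, relations):
    """Serialize concepts and (a, relation, b) triples as OWL RDF/XML.

    Every concept named in `relations` must also appear in `concepts`.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": BASE})
    ET.SubElement(root, f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}about": iri("partOf")})
    classes = {}
    for name in concepts:
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": iri(name)})
        ET.SubElement(cls, f"{{{RDFS}}}label").text = name
        classes[name] = cls
    for a, rel, b in relations:
        cls = classes[a]
        if rel == "hyponymy":
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": iri(b)})
        elif rel == "synonymy":
            ET.SubElement(cls, f"{{{OWL}}}equivalentClass", {f"{{{RDF}}}resource": iri(b)})
        elif rel == "part-whole":
            # Existential restriction: every instance of a is partOf some instance of b.
            sup = ET.SubElement(cls, f"{{{RDFS}}}subClassOf")
            restriction = ET.SubElement(sup, f"{{{OWL}}}Restriction")
            ET.SubElement(restriction, f"{{{OWL}}}onProperty", {f"{{{RDF}}}resource": iri("partOf")})
            ET.SubElement(restriction, f"{{{OWL}}}someValuesFrom", {f"{{{RDF}}}resource": iri(b)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_owl(
        ["采油设备", "抽油机", "钻头", "钻机"],
        [("抽油机", "hyponymy", "采油设备"), ("钻头", "part-whole", "钻机")],
    ))
```

Saving the returned string to a `.owl` file should give a document that an OWL editor such as Protégé can open for manual review and correction.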
Keywords/Search Tags: Domain ontology, concept extraction, semantic relation identification, word vector