Constructing The Corpus Of Geographical Entity Relations In Chinese Based On Baidu Baike

Posted on: 2019-01-20 | Degree: Master | Type: Thesis
Country: China | Candidate: J B Wang
GTID: 2370330575950131 | Subject: Surveying and mapping engineering
Abstract/Summary:
The corpus of geographical entity relations is a basic data resource for geographical information acquisition and geographical knowledge services, and its scale directly affects how well machine learning models can be trained. Rapidly updated web text continually yields new relation instances, so the corpus must be updated promptly to cover them. Because building and updating a corpus manually is expensive, a more efficient technique is needed for constructing corpora of massive numbers of geographical entity relations. This paper therefore studies methods for constructing a corpus of geographical entity relations in Chinese. The main contributions are as follows.

(1) Building on a review of related research, a classification system for geographic entities is established with reference to the encyclopedia's open classification system and the classification standard for basic geographic elements, and a classification system for geographical relations is established from classical semantic relations and spatial relations. An annotation scheme for geographical relations is then built that considers both the linguistic habits of natural language and the need for normalized annotation. The scheme includes a classification of geographic entities with 9 categories and 94 subtypes, and a classification of geographical relations with 4 categories and 105 subtypes.

(2) We propose an efficient method for constructing a corpus of massive geographical entity relations through automatic annotation. First, exact (full) matching is combined with approximate matching to improve the coverage of object-entity finding. Second, sentence-scoring rules are defined using the optimal sequence diagram method, and the results of mapping seed triples onto sentences are evaluated quantitatively. Finally, a series of experiments on the Chinese Baidu Baike verifies the effectiveness of the improved automatic annotation. The results show that the average success rate of automatic annotation is 67.83% and the average accuracy of the relations annotated by our method is 76.36%. Compared with a manually annotated corpus of spatial relations, the proposed method builds a large-scale corpus of geographical entity relations more efficiently, providing a feasible scheme for expanding such corpora automatically. In addition, the method accounts for both the semantic and the spatial relationships between geographical entities, and because the relation types are not restricted, it can be applied to open relation extraction.

(3) To test the usability of the constructed corpus, and to address the irregular and diverse expressions found in web text, this paper proposes a method that extracts entity relations from web text using entity word vectors and a sentence semantic vector, and applies it to geographical entity relation extraction. First, the sample sentences are segmented and each word is represented as a dense vector. Second, a single-layer LSTM (Long Short-Term Memory) network produces a sentence semantic vector from the sequence of word vectors in the sentence. Finally, the sentence semantic vector and the entity word vectors in the sentence are fed into two fully connected layers, which determine the relation type between the geo-entities. Experiments on the self-built corpus show that the accuracy of extracting geo-entity relation types from web text is 71%, and the accuracy on the related corpora is 88.8%. This work provides methodological support for building geographic knowledge graphs, geographic information retrieval, and geographic ontology learning, and it verifies the usability of the corpus.
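The automatic annotation step in (2) maps each seed triple (subject, relation, object) onto candidate sentences, accepting the object entity when it appears either verbatim or approximately. The abstract does not say how approximate matching is computed, so the sketch below is a minimal illustration under assumptions: it uses character-level similarity from Python's `difflib.SequenceMatcher` with a hypothetical threshold of 0.75, requires the subject to appear verbatim, and does not reproduce the thesis's optimal-sequence-diagram scoring rules. The names `find_object`, `annotate`, and `Annotation` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional, Tuple


class Annotation(NamedTuple):
    sentence: str
    subject: str
    relation: str
    obj: str      # object entity name as given in the seed triple
    span: str     # text actually matched in the sentence
    score: float  # 1.0 for an exact match, otherwise a similarity ratio


def find_object(sentence: str, obj: str,
                threshold: float = 0.75) -> Optional[Tuple[str, float]]:
    """Locate the object entity: exact match first, then approximate match."""
    if obj in sentence:  # full (exact) match
        return obj, 1.0
    best_span, best_score = None, 0.0
    # Assumed approximate match: slide windows of length len(obj) +/- 1
    # over the sentence and keep the most similar window.
    for length in range(max(1, len(obj) - 1), len(obj) + 2):
        for start in range(len(sentence) - length + 1):
            window = sentence[start:start + length]
            score = SequenceMatcher(None, obj, window).ratio()
            if score > best_score:
                best_span, best_score = window, score
    if best_span is not None and best_score >= threshold:
        return best_span, best_score
    return None


def annotate(triples: List[Tuple[str, str, str]], sentences: List[str],
             threshold: float = 0.75) -> List[Annotation]:
    """Map seed triples onto sentences that mention both entities."""
    results = []
    for subject, relation, obj in triples:
        for sentence in sentences:
            if subject not in sentence:  # this sketch requires an exact subject
                continue
            hit = find_object(sentence, obj, threshold)
            if hit is not None:
                span, score = hit
                results.append(
                    Annotation(sentence, subject, relation, obj, span, score))
    return results


if __name__ == "__main__":
    seeds = [("长江", "流经", "重庆市"), ("黄河", "发源于", "青海省")]
    texts = ["长江流经重庆，最终注入东海。", "黄河发源于青海省。"]
    found = annotate(seeds, texts)
    print([(a.span, round(a.score, 2)) for a in found])
    # → [('重庆', 0.8), ('青海省', 1.0)]
```

In the example, the seed object "重庆市" never appears verbatim, but the approximate match still recovers the shortened form "重庆", which is the kind of coverage gain that combining exact and approximate matching aims at. A real pipeline would add the sentence-scoring step before accepting an annotation.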
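The relation classifier described in (3) can be sketched as a forward pass in plain Python: word vectors feed a single-layer LSTM, whose final hidden state serves as the sentence semantic vector; this is concatenated with the two entity word vectors and passed through two fully connected layers and a softmax over relation types. This is an untrained sketch under stated assumptions, not the thesis's implementation: the abstract gives no dimensions, activations, or pooling, so using the last hidden state, the ReLU between the layers, the omitted biases in the fully connected layers, the random initialization, and all names (`TinyLSTM`, `RelationClassifier`) and relation labels are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTM:
    """Single-layer LSTM; the last hidden state is the sentence vector (assumed)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate)
        # applied to the concatenation [x_t; h_{t-1}], plus a bias.
        self.W = {k: rand_matrix(rng, hidden_dim, input_dim + hidden_dim) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def forward(self, xs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            z = x + h  # list concatenation: [x_t; h_{t-1}]
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
                   for k in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, cand)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h


class RelationClassifier:
    """Sentence vector + head/tail entity vectors -> two FC layers -> softmax."""

    def __init__(self, vocab, emb_dim, hidden_dim, fc_dim, relations, seed=0):
        rng = random.Random(seed)
        self.emb = {w: [rng.uniform(-0.1, 0.1) for _ in range(emb_dim)] for w in vocab}
        self.unk = [0.0] * emb_dim  # out-of-vocabulary words map to zeros
        self.lstm = TinyLSTM(emb_dim, hidden_dim, rng)
        self.W1 = rand_matrix(rng, fc_dim, hidden_dim + 2 * emb_dim)
        self.W2 = rand_matrix(rng, len(relations), fc_dim)
        self.relations = relations

    def embed(self, word):
        return self.emb.get(word, self.unk)

    def predict(self, tokens, head, tail):
        sent_vec = self.lstm.forward([self.embed(t) for t in tokens])
        features = sent_vec + self.embed(head) + self.embed(tail)
        hidden = [max(0.0, v) for v in matvec(self.W1, features)]  # FC layer 1 + ReLU
        logits = matvec(self.W2, hidden)                            # FC layer 2
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.relations[best], probs
```

A call on a pre-segmented sentence looks like `RelationClassifier(["长江", "流经", "重庆"], 4, 6, 5, ["流经", "位于", "相邻"]).predict(["长江", "流经", "重庆"], "长江", "重庆")`; with untrained weights the predicted label is arbitrary. In practice the weights would be learned from the annotated corpus, typically with a deep learning framework rather than hand-written loops.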
Keywords/Search Tags:geographical relations, corpus construction, automatic annotation, geographical information extraction, LSTM