
Research On Entity Relation Extraction Technology Of Geographic Text Big Data Based On Distant Supervision

Posted on: 2022-06-19; Degree: Master; Type: Thesis
Country: China; Candidate: H T Geng; Full Text: PDF
GTID: 2480306524989189; Subject: Master of Engineering
Abstract/Summary:
Big data contains huge amounts of information. Geographic text big data is an important part of big data and an important data source for constructing geographic knowledge graphs, but it also contains a great deal of useless information, so extracting valuable information from it is an urgent problem. A geographic knowledge graph can help scholars solve this problem, and entity relation extraction is one of the key tasks in knowledge graph construction. Entity relation extraction is mostly based on deep learning. However, labeled corpora for geographic text are scarce and manual labeling is costly, which makes supervised extraction difficult, while unsupervised methods perform poorly. Entity relation extraction based on distant supervision is therefore widely used. This approach obtains a large amount of labeled data by aligning a small-scale knowledge base with a large-scale corpus, then denoises the data, and finally extracts the entity relations. Distant supervision not only reduces the cost of labeling a corpus but also performs better than unsupervised methods. However, no high-quality geographic corpus currently exists. This thesis first constructs a labeled geographic text corpus, then extracts entity relations with an improved model, and finally applies the improved model in a question-answering system. The main research consists of the following parts:

(1) To address the current lack of a professional labeled geographic corpus, the structured knowledge of Baidu Baike is first used to construct a geographic knowledge base, and the unstructured text of Baidu Baike and news pages is used to construct the corpus; a labeled geographic text corpus is then obtained by aligning the knowledge base with the corpus. Because existing corpus labeling algorithms are not very effective, this thesis then proposes a corpus labeling algorithm based on feature evaluation and keyword similarity analysis. The algorithm first calculates weights for each keyword's part of speech, relative position, and distance, and combines them into an overall keyword weight. The labeled corpus is then obtained by combining this weight with keyword similarity analysis. Experimental results on the geographic corpus constructed in this thesis show that the algorithm improves precision and recall compared with other algorithms.

(2) Most distant supervised relation extraction models are designed to reduce noise at the bag and sentence level and ignore the impact of noisy labels on model performance, so this thesis focuses on label denoising. The positive effects of entity type information and sentence grammar information on relation extraction are also often ignored. Building on the BiLSTM model, an attention layer over the words surrounding the entities is first added to the BiLSTM as the first module of sentence encoding. Second, an entity type embedding module is added to enrich the sentence encoding. Third, semantic dependency parsing is added as the third module. Together, the three modules constitute a relation extractor. A label learner is designed to learn soft labels in combination with reinforcement learning in order to correct noisy labels. The label learner and the relation extractor together constitute a deep reinforcement learning model. Experiments on the public datasets ACE2005 and Chinese-Literature-NER-RE-Dataset and on the dataset constructed in this thesis show that the proposed distant supervised entity relation extraction model outperforms several mainstream models in precision and recall.

(3) An intelligent automatic question-answering system for the geological disaster domain is designed and implemented. The system's data consists of the news and other web page information collected in Chapter 3. Its algorithm consists of the improved distant supervised relation extraction model from Chapter 4 and question-answer pair matching. By combining question-answer library matching with the improved model, the system can alleviate, to a certain extent, the scarcity of question-answering resources in the geological disaster domain.
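
The alignment step that distant supervision relies on can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `align`, the relation name `flows_through`, and the example triple and sentences are all hypothetical, and simple substring matching stands in for real entity recognition. The sketch only shows how a sentence that mentions both entities of a knowledge base triple inherits that triple's relation label, and why some of those labels are noisy.

```python
from collections import defaultdict


def align(kb_triples, sentences):
    """Group sentences into bags keyed by (head, tail, relation).

    A sentence is assigned a triple's relation whenever it mentions both
    entities. Substring matching is an assumption made for brevity.
    """
    bags = defaultdict(list)
    for head, relation, tail in kb_triples:
        for sentence in sentences:
            if head in sentence and tail in sentence:
                bags[(head, tail, relation)].append(sentence)
    return dict(bags)


# Hypothetical knowledge base triple and corpus sentences.
kb_triples = [("Yangtze River", "flows_through", "Wuhan")]
sentences = [
    "The Yangtze River flows through Wuhan.",
    # Mentions both entities but does not express the relation: a noisy label.
    "Wuhan hosted a conference on Yangtze River shipping.",
    # Mentions only one entity, so it is not labeled.
    "The Yellow River is north of Wuhan.",
]

bags = align(kb_triples, sentences)
for key, bag in bags.items():
    print(key, len(bag))
```

The second sentence ends up in the bag even though it does not state that the river flows through the city. Mislabeled sentences of this kind are the label noise that the denoising step and the label learner described above are meant to correct.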
Keywords/Search Tags: geographic text big data, distant supervision, entity relation extraction, deep learning, reinforcement learning