Research On Semi-automatic Tagging Of Geographical Entities Information Based On Incremental Learning

Posted on: 2021-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhang
GTID: 2370330647958375
Subject: Surveying and mapping engineering
Abstract/Summary:
Text, as the main data source of geographic information, contains rich and diverse geographic information that is of great value to the mining and utilization of geographic resources. Research on geographic information is gradually shifting its focus from "spatial location" to "geographic entity". The geographic entity information in a text is the data that describes the characteristics and properties of a geographic entity, including its name, time, space, relationship, attribute and other information. The prerequisite and first step of mining geographic entity information from text is to annotate that information, either manually or by machine. However, both approaches suffer from two notable problems. First, because labelers differ in their understanding and there is no unified labeling system for geographic entity information, the labeled data contain significant errors and omissions. Second, manual labeling takes a large amount of time, so labeling progresses slowly. How to annotate the geographic entity information in text with high quality and high efficiency has therefore become a basic problem that urgently needs to be solved.

Based on an analysis of how geographic entity information is described in Chinese texts, this thesis improves the annotation system for geographic entity information and constructs an incremental learning model for geographic entity information extraction. To address the low efficiency of geographic entity labeling, a semi-automatic labeling method for geographic entity information is studied. By introducing the idea of iteration, an iterative algorithm for the geographic entity information extraction model is constructed. In addition, a semi-automatic annotation system for geographic entity information in Chinese text is developed to improve the quality and efficiency of corpus annotation. The main research contents and achievements are as follows:

(1) Incremental learning method for the geographic entity information extraction model. To address labeling errors and omissions in the labeled data, this thesis analyzes the description characteristics of geographic entity information in text and improves the labeling system. After weighing the performance requirements of combining rules, machine learning, deep learning and incremental learning, the extraction of geographic entity names, time information and space information is realized with rules and a machine learning model. On this basis, an online incremental learning model based on rules and an offline incremental learning model based on the conditional random field (CRF) model are constructed to improve extraction performance on unregistered words. At the same time, the problem of labeling errors and omissions is addressed by using model-assisted annotation as a constraint.

(2) Model iteration algorithm for semi-automatic annotation of geographic entity information. Because the geographic entity information extraction model depends strongly on the benchmark test set, manual assistance takes a great deal of time and slows the annotation process. Building on the geographic entity information extraction model, an iterative algorithm is constructed by combining the semi-automatic annotation process with the idea of iteration. The optimal iteration period and iteration scale of the algorithm are determined by experiment. The annotated People's Daily corpus is used to test the performance metrics of the geographic entity information extraction model after the iterative algorithm is applied. Experimental results show that the iterative algorithm, with quality control as a constraint, reduces the time needed for manual labeling.

(3) Prototype system development and experimental evaluation. A semi-automatic annotation system for geographic entity information is developed that supports data upload, geographic entity information extraction, incremental learning, semi-automatic annotation of geographic entity information, data query and download, and other functions. Compared with manual labeling and automatic labeling, semi-automatic labeling greatly improves both labeling efficiency and the quality of the labeled data.

The results show that it is feasible to annotate geographic entity information in text with high quality and high efficiency by semi-automatic annotation based on incremental learning. The benchmark set plays an important role in the training process of machine learning, and it is general and transferable across the recognition models for geographic entity names, time information and space information. However, the attribute information and relation information of geographic entities vary greatly, so corresponding rules and extraction models must be constructed for specific categories. Moreover, when semi-automatically labeling geographic entities in text from other sources, parameters such as the feature templates, syntax rules, iteration period and iteration scale need to be adjusted.
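As an illustration only, the semi-automatic annotation loop summarized in item (2) can be sketched as follows. This is not the thesis's implementation: the gazetteer-style `GazetteerTagger` stands in for the rule-based and CRF extraction models, `review` stands in for the human annotator, and `period` is a hypothetical name for the iteration period. All class, function and parameter names here are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, surface text)


@dataclass
class GazetteerTagger:
    """Toy stand-in for the extraction model: tags every known name in a text."""
    names: Set[str] = field(default_factory=set)

    def predict(self, text: str) -> List[Span]:
        spans = []
        for name in self.names:
            start = text.find(name)
            while start != -1:
                spans.append((start, start + len(name), name))
                start = text.find(name, start + 1)
        return sorted(spans)

    def update(self, corrected: List[List[Span]]) -> None:
        # Incremental step: absorb names confirmed by the reviewer.
        for spans in corrected:
            self.names.update(span[2] for span in spans)


def semi_automatic_annotate(
    texts: List[str],
    model: GazetteerTagger,
    review: Callable[[str, List[Span]], List[Span]],
    period: int = 2,
) -> List[List[Span]]:
    """Pre-annotate each text, let the reviewer correct it, and update the
    model after every `period` reviewed texts (assumed iteration period)."""
    annotated: List[List[Span]] = []
    pending: List[List[Span]] = []
    for text in texts:
        suggestions = model.predict(text)   # machine pre-annotation
        final = review(text, suggestions)   # human correction
        annotated.append(final)
        pending.append(final)
        if len(pending) >= period:
            model.update(pending)           # one model iteration
            pending = []
    if pending:
        model.update(pending)
    return annotated


# Simulated reviewer: an oracle that knows the gold names and records how
# many suggestions the model offered for each text.
gold = GazetteerTagger({"Nanjing", "Yangtze River"})
suggested_counts: List[int] = []


def simulated_reviewer(text: str, suggestions: List[Span]) -> List[Span]:
    suggested_counts.append(len(suggestions))
    return gold.predict(text)


texts = [
    "Nanjing lies on the Yangtze River.",
    "The Yangtze River is long.",
    "Nanjing is a city.",
    "Nanjing is near the Yangtze River.",
]
model = GazetteerTagger()
annotated = semi_automatic_annotate(texts, model, simulated_reviewer, period=2)
print(suggested_counts)
```

In this toy run the model offers no suggestions until its first update, after which the reviewer only has to confirm them. A real system would replace the gazetteer with trained extraction models and tune the iteration period and scale experimentally, as the abstract describes.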
Keywords/Search Tags:Chinese text, geographic entity information, incremental learning, semi-automatic tagging, tagging corpus