Research On Location Information Extraction For Web Text

Posted on:2021-04-23

Degree:Master

Type:Thesis

Country:China

Candidate:T Sun

GTID:2480306293952479

Subject:Cartography and Geographic Information System

Abstract/Summary:

With the rapid development of the mobile Internet, the Internet has become an important channel for generating geographic information. According to statistics, nearly 70% of Internet data is related to geographic information, and the amount of location data generated by web page text and social networks is close to the amount collected by specialized devices. Extracting location information from text quickly and accurately can greatly improve the efficiency of data collection and better meet people’s needs for geographic information.

Location information in text consists of two parts: geographic named entities and relative location information. A geographic named entity is a place name or the name of an organization that appears in the text. Relative location information is attached to an entity and describes the spatial relationship between entities. Existing research focuses only on methods for extracting geographic named entities; it ignores the recognition and transformation of relative position relationships between entities, and it lacks a corpus annotated with full location information. Existing recognition methods also have many shortcomings, such as low recall for complex place names and imprecise entity boundaries. Studying the problems in automatic extraction of spatial location information from web text is therefore of great significance in both theory and practice.

Building on existing research in China and abroad, this thesis establishes a corpus of full location information that adds labels for relative position relationships. Based on the expanded corpus, it designs a method for extracting and visualizing the location information in text. The main contents are as follows:

(1) Building a location information annotation corpus from web text and designing a corresponding annotation scheme. A large amount of text is collected from relevant websites and processed through text extraction, preprocessing, cleaning, word segmentation, and part-of-speech tagging. Labels are designed under the IOB (BIO) labeling scheme and applied at the character level to form the corpus. This corpus addresses the insufficient data, poor timeliness, and missing relative-position annotation of open corpora.

(2) Introducing the BERT pre-trained model and designing a recognition method based on a composite BERT-BiLSTM-CRF model. BERT has a strong ability to represent text features, BiLSTM extracts contextual features well, and the CRF imposes constraints on the predicted labels. Comparative experiments, evaluated with the relevant metrics, verify the validity of this method for recognizing geographic named entities and relative location information.

(3) Transforming the location information in text into structured information. The thesis summarizes relative location information into four common kinds of relationship semantics and three kinds of distribution structures among geographic named entities. Based on the Baidu Map platform, it designs a method for reasoning about and transforming the location information in text. Finally, a demonstration application that extracts a path from text was developed in response to the practical needs of the current new epidemic outbreak, and the transformation method was verified with it.
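
The character-level labeling described in item (1) can be illustrated with a minimal sketch. It assumes a BIO-style scheme with two hypothetical label types, `LOC` for a geographic named entity and `REL` for a relative location phrase; the abstract does not give the thesis's actual label set, and the sample sentence and function names here are invented for illustration only.

```python
from typing import List, Tuple

# Hypothetical label types (not taken from the thesis):
#   LOC = geographic named entity, REL = relative location phrase.


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end, type) character spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, kind in spans:
        tags[start] = f"B-{kind}"          # first character of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{kind}"          # remaining characters of the span
    return tags


def bio_to_spans(text: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Recover (surface string, type) pairs from per-character BIO tags."""
    found = []
    start, kind = None, None
    # A trailing "O" sentinel closes a span that runs to the end of the text.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            found.append((text[start:i], kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return found


# Invented sample: "500 meters north of Wuhan University".
sentence = "武汉大学以北500米"
annotation = [(0, 4, "LOC"), (4, 10, "REL")]

tags = spans_to_bio(sentence, annotation)
for char, tag in zip(sentence, tags):
    print(char, tag)

print(bio_to_spans(sentence, tags))
```

A sequence labeler such as the BERT-BiLSTM-CRF model in item (2) would predict one such tag per character, and a decoder like `bio_to_spans` would then turn the predicted tags back into entity and relative-location strings.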

Keywords/Search Tags:

BERT model, web text, location information, geographic named entity recognition

Related items

1	Research On Named Entity Recognition And Normalization For Biomedical Text
2	Research On The Application Of Deep Learning Models In Geographic Named Entity Recognition
3	Research On Biomedical Named Entity Recognition Algorithm Based On Multi-Task Learning
4	Research On The Identification And Standardization Of Medical Named Entities From Clinical Real-World Data
5	Research On Virus Named Entity Recognition Methods Based On Language And Distantly Supervised Model
6	Research On Intelligent Extraction Technology Of Earthquake Emergency Information Text Based On Improved BERT Algorithm
7	Research On Identification Of Bacteria Named Entity Based On Deep Learning And Language Model
8	Research On Entity Relation Extraction Of Geological Disaster Text
9	Research And Implementation Of A Biomedical Named Entity Recognition Method Based On Deep Learning
10	Research On Biomedical Named Entity Recognition Method Based On Word Meaning Enhancement