
Spatio-Temporal Information Extraction Method of Microblog Emergencies Based on BiLSTM-CRF and Classified-Hierarchical Annotation

Posted on: 2022-03-25 | Degree: Master | Type: Thesis
Country: China | Candidate: L Y Hu
GTID: 2480306494999269 | Subject: Cartography and Geographic Information System
Abstract:
Urban emergencies are events that occur suddenly in cities and cause, or may cause, serious social harm, such as natural disasters, accidents and disasters, public health events and social security incidents. Rapid perception and spatialization of urban emergencies help decision makers act in time and allocate emergency resources reasonably. Traditional monitoring methods are limited in time and space: they are labor-intensive and costly, and the information they provide has poor timeliness and little spatial visibility. Microblog data are freely available, massive, rapidly disseminated and highly real-time. This thesis therefore proposes a method for extracting the spatio-temporal information of emergencies from Sina Weibo based on BiLSTM-CRF and classified-hierarchical annotation. The method achieves fine-grained extraction and spatial display of the spatio-temporal information of urban emergencies. The main work and results are as follows:

(1) Acquisition and processing of urban emergency data from Sina Weibo. A topic-focused web crawler was developed with Python and Selenium. It was used to collect Sina Weibo posts about emergencies in Nanchang City from January 2018 to December 2019 on the topics "fire" and "car accident". A data cleaning method based on a topic dictionary and text similarity was designed to filter and de-duplicate the posts, producing a preliminary Sina Weibo emergency corpus.

(2) Design of a classified-hierarchical spatio-temporal information annotation system (CHSIAS) and construction of an annotated Weibo emergency corpus. The design draws on city place-name coding rules and a common classification of time expressions. The resulting annotation system suits Weibo emergencies and is built on top of the BIESO boundary labeling scheme. Tagging the Sina Weibo data with this system yielded an annotated Weibo emergency corpus.

(3) A BiLSTM-CRF method for extracting the spatio-temporal information of emergencies from Sina Weibo. A BiLSTM-CRF extraction model was built that uses pre-trained word embeddings instead of randomly initialized ones. The model was trained on the annotated Weibo emergency corpus.

(4) Emergency spatialization based on semantic inference and geocoding. The spatio-temporal information extracted from Weibo texts is often non-standard in expression and incomplete. A method was therefore designed to infer and complete this information and to standardize time information and place names. Geocoding was then used to display the emergencies spatially.

(5) Experiments combining BiLSTM-CRF with CHSIAS to extract and spatialize the spatio-temporal information of emergencies from Weibo texts. CHSIAS was compared with the People's Daily corpus annotation system. CHSIAS achieved a higher F-measure and yielded multi-level, fine-grained spatio-temporal information when combined with each of CRF, BiLSTM-CRF (Random Word Embedding) and BiLSTM-CRF (Pre-trained Word Embedding). CHSIAS combined with BiLSTM-CRF (Pre-trained Word Embedding) achieved the highest F-measure, 91.65%. With CHSIAS, BiLSTM-CRF (Pre-trained Word Embedding) also recognized time information best. For position information, its F-measures for road, POI, building and relative position were higher than those of BiLSTM-CRF (Random Word Embedding) by 3.41%, 4.18%, 5.87% and 15.99%, respectively. They were also higher than those of CRF by 4.76%, 5.85%, 3.91% and 3.67%, respectively. On this basis, the extracted spatio-temporal information was standardized and the emergencies were displayed spatially with the spatialization method.

Compared with traditional emergency monitoring, these results enrich the theory and methods of social media data integration. They also provide technical support for residents' smart travel, for environmental and social governance, and for rapid government emergency response.
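Step (1) mentions a cleaning method based on a topic dictionary and text similarity. It does not say which similarity measure was used.

The following is a minimal sketch built on three assumptions:
- a hypothetical keyword dictionary;
- character-bigram Jaccard similarity, which needs no Chinese word segmentation;
- an arbitrary 0.8 threshold.

A post is kept only if it contains a dictionary term and is not a near-duplicate of a post already kept.

```python
def char_bigrams(text):
    """Set of character bigrams; works on Chinese text without segmentation."""
    text = "".join(text.split())  # drop whitespace
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def clean_posts(posts, topic_words, threshold=0.8):
    """Keep on-topic posts and drop near-duplicates of posts already kept."""
    kept, kept_grams = [], []
    for post in posts:
        if not any(word in post for word in topic_words):
            continue  # off-topic: no dictionary term present
        grams = char_bigrams(post)
        if any(jaccard(grams, g) >= threshold for g in kept_grams):
            continue  # near-duplicate, e.g. a repost with extra punctuation
        kept.append(post)
        kept_grams.append(grams)
    return kept


# Hypothetical topic dictionary for "fire" and "car accident" posts.
TOPIC_WORDS = {"火灾", "着火", "车祸", "追尾"}
posts = [
    "南昌市青山湖区一小区着火了",
    "南昌市青山湖区一小区着火了！",  # near-duplicate
    "今天天气真好",                  # off-topic
]
print(clean_posts(posts, TOPIC_WORDS))
```

Pairwise comparison grows quadratically with the number of posts. For a much larger crawl, a hashing scheme such as MinHash is the usual substitute.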
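The BIESO scheme in step (2) uses five tags:
- B, I and E mark the Begin, Inside and End characters of a multi-character entity;
- S marks a Single-character entity;
- O marks Outside (non-entity) characters.

The sketch below converts character-offset annotations into per-character BIESO tags. The hierarchical labels (`TIME_DATE`, `LOC_ROAD`, `LOC_REL`) are hypothetical stand-ins, because the abstract does not list the actual CHSIAS label set. They only illustrate how a class prefix and a sub-type can share one tag.

```python
def to_bieso(text, spans):
    """Tag each character of `text` with a BIESO label.

    spans: (start, end, label) tuples, `end` exclusive, non-overlapping.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = "S-" + label      # single-character entity
            continue
        tags[start] = "B-" + label          # first character
        for i in range(start + 1, end - 1):
            tags[i] = "I-" + label          # interior characters
        tags[end - 1] = "E-" + label        # last character
    return tags


# Hypothetical example: "yesterday" as TIME_DATE, a road name as LOC_ROAD.
text = "昨天八一大道发生车祸"
spans = [(0, 2, "TIME_DATE"), (2, 6, "LOC_ROAD")]
for char, tag in zip(text, to_bieso(text, spans)):
    print(char, tag)
```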
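The BiLSTM-CRF tagger in step (3) scores tags in two layers:
- the BiLSTM gives each character a score for every tag (emission scores);
- the CRF layer adds learned tag-to-tag transition scores, so each candidate sequence is scored as a whole.

The transition scores are what rule out invalid outputs, such as `O` followed directly by `E-LOC`. Decoding then picks the highest-scoring sequence with the Viterbi algorithm.

This pure-Python sketch covers decoding only. The three-tag set and all scores are hand-picked for illustration rather than learned. The start and end score vectors are a common CRF convention; the abstract does not specify them.

```python
def viterbi_decode(emissions, transitions, start_scores, end_scores):
    """Return (best_score, best_path) for a linear-chain CRF.

    emissions[t][j]   score of tag j at position t (e.g. BiLSTM output)
    transitions[i][j] score of moving from tag i to tag j
    start_scores[j]   score of starting the sequence with tag j
    end_scores[j]     score of ending the sequence with tag j
    """
    tags = range(len(start_scores))
    # Best score of any path ending in each tag at position 0.
    score = [start_scores[j] + emissions[0][j] for j in tags]
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in tags:
            # Best previous tag i from which to move into tag j.
            best_i = max(tags, key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    final = [score[j] + end_scores[j] for j in tags]
    best_last = max(tags, key=lambda j: final[j])
    # Follow the back-pointers from the last position to the first.
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return final[best_last], path


TAGS = ["O", "B-LOC", "E-LOC"]
NEG = -1e4  # a large penalty that effectively forbids a transition
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> E-LOC is invalid
    [NEG, NEG, 0.0],  # from B-LOC: must continue to E-LOC
    [0.0, 0.0, NEG],  # from E-LOC
]
start_scores = [0.0, 0.0, NEG]  # cannot start with E-LOC
end_scores = [0.0, NEG, 0.0]    # cannot end with B-LOC
emissions = [
    [1.0, 0.9, 0.0],
    [0.0, 0.5, 2.0],
    [1.0, 0.0, 0.0],
]
score, path = viterbi_decode(emissions, transitions, start_scores, end_scores)
print([TAGS[k] for k in path])
```

Picking each position's highest emission score independently would give `O, E-LOC, O`, which is not a valid BIESO sequence. With the transition scores included, `B-LOC, E-LOC, O` scores highest.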
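Step (4) standardizes time information. Weibo posts often state times relative to the moment of posting.

The sketch below makes these assumptions, none of which come from the abstract:
- the post's timestamp is the reference point;
- a small hypothetical vocabulary of relative-day words applies;
- vague parts of the day map to representative hours, e.g. 下午 ("afternoon") to 15:00;
- explicit month/day dates fall in the posting year.

```python
import re
from datetime import datetime, timedelta

# Hypothetical vocabularies; a real system would need far broader coverage.
RELATIVE_DAYS = {"前天": -2, "昨天": -1, "今天": 0, "今日": 0, "明天": 1}
DAY_PARTS = {"凌晨": 3, "早上": 8, "上午": 10, "中午": 12, "下午": 15, "晚上": 20}


def normalize_time(expr, posted_at):
    """Resolve a Chinese time expression against the post's timestamp."""
    day = posted_at.replace(hour=0, minute=0, second=0, microsecond=0)
    m = re.search(r"(\d{1,2})月(\d{1,2})日", expr)
    if m:  # explicit month/day, assumed to be in the posting year
        day = day.replace(month=int(m.group(1)), day=int(m.group(2)))
    else:
        for word, offset in RELATIVE_DAYS.items():
            if word in expr:
                day += timedelta(days=offset)
                break
    m = re.search(r"(\d{1,2})[点时](?:(\d{1,2})分)?", expr)
    if m:  # clock time such as "3点" or "8点30分"
        hour = int(m.group(1))
        if hour < 12 and any(p in expr for p in ("下午", "晚上")):
            hour += 12  # afternoon/evening: convert to 24-hour clock
        return day.replace(hour=hour, minute=int(m.group(2) or 0))
    for part, hour in DAY_PARTS.items():
        if part in expr:  # only a vague part of the day is given
            return day.replace(hour=hour)
    return day


posted = datetime(2019, 3, 15, 9, 30)
for expr in ["昨天下午3点", "今天凌晨", "3月10日晚上8点30分"]:
    print(expr, normalize_time(expr, posted))
```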
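Step (4) also completes incomplete place names before geocoding them. One simple form of this inference is sketched here under two assumptions:
- a missing city is filled with the study area, Nanchang, where the posts were collected;
- the relative-position word is left out of the geocoding query, because a geocoder resolves the named road or POI rather than words like 附近 ("nearby").

The `Place` fields mirror the location categories the abstract reports (road, POI, building, relative position). The city and district levels are hypothetical additions. The geocoding service itself is not shown.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Place:
    """Hypothetical hierarchical place record extracted from one post."""
    city: Optional[str] = None
    district: Optional[str] = None
    road: Optional[str] = None
    poi: Optional[str] = None
    building: Optional[str] = None
    relative: Optional[str] = None  # e.g. "附近" (nearby), "门口" (at the gate)


def complete_place(place, default_city="南昌市"):
    """Infer a missing city from the study area; other fields are unchanged."""
    return replace(place, city=place.city or default_city)


def geocoding_query(place):
    """Join the absolute levels from coarse to fine; relative words are dropped."""
    levels = [place.city, place.district, place.road, place.poi, place.building]
    return "".join(level for level in levels if level)


extracted = Place(road="八一大道", relative="附近")
print(geocoding_query(complete_place(extracted)))
```

`dataclasses.replace` returns a new record, so the originally extracted fields remain available for later inspection.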
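The F-measures in step (5) are the harmonic mean of precision P and recall R: F = 2PR/(P + R). For named entity recognition they are usually computed at entity level. A predicted entity counts as correct only if both its span and its type match the annotation.

The abstract does not state which matching rule was used, so the sketch below assumes this exact-match convention. It decodes BIESO tag sequences into spans, and it can score a single entity type, in the way the per-category results are reported. The tag names are hypothetical.

```python
def extract_entities(tags):
    """Return a set of (start, end_exclusive, label) spans from BIESO tags."""
    entities, start, label = set(), None, None
    for i, tag in enumerate(tags):
        prefix, _, lab = tag.partition("-")
        if prefix == "S":
            entities.add((i, i + 1, lab))
            start = None
        elif prefix == "B":
            start, label = i, lab
        elif prefix == "I":
            if start is None or lab != label:
                start = None  # malformed span: discard it
        elif prefix == "E":
            if start is not None and lab == label:
                entities.add((start, i + 1, lab))
            start = None
        else:  # "O"
            start = None
    return entities


def prf(gold_seqs, pred_seqs, label=None):
    """Entity-level precision, recall and F-measure, optionally for one label."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g = {e for e in extract_entities(gold) if label is None or e[2] == label}
        p = {e for e in extract_entities(pred) if label is None or e[2] == label}
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


gold = [["B-TIME", "E-TIME", "B-ROAD", "I-ROAD", "E-ROAD", "O"]]
pred = [["B-TIME", "E-TIME", "B-ROAD", "E-ROAD", "O", "O"]]  # road span too short
print(prf(gold, pred), prf(gold, pred, "TIME"), prf(gold, pred, "ROAD"))
```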
Keywords: Weibo, Named Entity Recognition, Text Information Extraction, BiLSTM-CRF, Emergencies