
Place Name Recognition And Linking Based On Multi-Source Gazetteer In English Social Media Text

Posted on: 2023-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: P F Wang
GTID: 2568306800952329
Subject: Electronic and communication engineering
Abstract/Summary:
As a venue for people to socialize in the Internet era, social media has become an important carrier for spreading trending events and obtaining information. Geospatial location is an important attribute of an event. Extracting place names from social media texts and obtaining their geospatial locations is therefore of great significance for analyzing and understanding how events evolve over space and time. However, social media text is informal and contains a large number of informal place names, including abbreviations, misspellings and aliases, which are difficult to detect. Meanwhile, the noisy context around place names makes them much harder to identify than in formal texts. This dissertation studies place name recognition and linking based on a multi-source gazetteer in English social media text. The main contributions are:

(1) An unsupervised alignment method for heterogeneous geographic entities is proposed. Firstly, three similarity algorithms, for name, space and type, are proposed according to the characteristics of geographical entities. Then the K-means algorithm is used to set automatically the weight of each similarity algorithm, as required by the similarity combination step. Finally, the matching results of heterogeneous geographical entity pairs are extracted iteratively with the naive descent extraction algorithm to help align heterogeneous geographical entity databases. In the method comparison experiment, the proposed method achieves an F1 value of 89.52%, which is 15.7% higher than the method based on voting aggregation.

(2) An unsupervised place name recognition method for English social media text is proposed. Firstly, a pre-extraction scheme for candidate toponyms in social media texts is proposed, including text preprocessing, subject label segmentation and the pre-extraction of candidate toponyms. Secondly, positive samples of place names are extracted from the multi-source gazetteer and negative samples are extracted from a social media text corpus, and rules for positive sample enhancement and negative sample weakening are designed according to the types of place names. Thirdly, according to the short-text features of toponyms, a C-LSTM + Attention model is proposed to judge whether a candidate toponym is a true toponym. Finally, according to the characteristics of candidate toponyms, a candidate toponym recognition algorithm is designed to make the final decision on each candidate. In the method comparison experiment, the proposed method achieves an F1 value of 78.14%, which is 35.8%, 29.5% and 9.3% higher than the three baseline methods, Stanford NER, a rules-based method and GazPNE, respectively.

(3) An English place name linking method based on the multi-source gazetteer and the Elasticsearch search engine is proposed. According to the characteristics of Elasticsearch and the text characteristics of toponyms, scoring and ranking rules for toponym search are designed, including toponym prefix enhancement and a double-field query strategy. Then, based on the spatial relationships between geographical entities, a candidate toponym entity selection algorithm is designed on top of the DBSCAN density clustering algorithm to help link geographical names to geographical entities. In the method comparison experiment, the proposed method achieves an F1 value of 73.36%, which is 18.9% higher than the Nominatim-based method.
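The similarity combination step in contribution (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the entity fields (`name`, `lat`, `lon`, `type`), the string-ratio name similarity, the exponential distance decay with `scale_km`, and the fixed weights are all assumptions made for illustration. In particular, the thesis derives the weights with K-means, whereas this sketch simply hard-codes them.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def spatial_similarity(e1: dict, e2: dict, scale_km: float = 1.0) -> float:
    """Distance mapped to (0, 1] by exponential decay (assumed form)."""
    d = haversine_km(e1["lat"], e1["lon"], e2["lat"], e2["lon"])
    return math.exp(-d / scale_km)


def type_similarity(e1: dict, e2: dict) -> float:
    """1.0 when the type labels match exactly, else 0.0 (assumed rule)."""
    return 1.0 if e1["type"].lower() == e2["type"].lower() else 0.0


def combined_similarity(e1: dict, e2: dict, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted sum of name, spatial and type similarity.

    The weights are hypothetical placeholders; the thesis sets them
    automatically with K-means rather than fixing them by hand.
    """
    w_name, w_space, w_type = weights
    return (
        w_name * name_similarity(e1["name"], e2["name"])
        + w_space * spatial_similarity(e1, e2)
        + w_type * type_similarity(e1, e2)
    )
```

Given two gazetteer records as dictionaries, `combined_similarity(a, b)` returns a score in [0, 1]; pairs with high scores would be the candidates passed on to the iterative match extraction.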
Keywords/Search Tags: Social media text, Geographical entity alignment, Place name recognition, Place name linking