
Research On Intelligent Geocoding In Non-canonical Chinese Address

Posted on: 2021-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: M J Shi
GTID: 2370330620478732
Subject: Cartography and Geographic Information System
Abstract/Summary:
With the rapid development of the economy, China's digital cities are gradually being established. A large part of urban information is tied to geographic location, and most daily life and social activity is tied to an address. Demand from the public and from municipal departments for place-name and address services is growing. At present, city addresses are mainly expressed and stored as text. Geocoding establishes the association between address text and the spatial data visible on a map, integrates non-spatial and spatial data, and makes address text usable in applications. However, because Chinese addresses are expressed and recorded inconsistently, many have incomplete structures, which makes address information ambiguous and hinders city-related research. It is therefore important to analyze non-canonical Chinese addresses, standardize address text, and establish the relationship between address text and spatial data. This thesis takes Suzhou as the study area. First, Suzhou address text is cleaned using natural language processing techniques, including traditional/simplified Chinese conversion, full-width/half-width conversion, and resolution of the meaning of special symbols in addresses. Then, the acquired Suzhou addresses are segmented with a conditional random field, and addresses are standardized to resolve the ambiguity caused by historical former names and missing address levels. Finally, a non-canonical Chinese geocoding system is built on an improved Levenshtein distance and a trie. In application, the system improves matching accuracy and efficiency for three common kinds of non-canonical Chinese address data (addresses with special symbols, with missing address levels, and with wrong characters) and provides a new approach to resolving non-canonical Chinese addresses.

The main contents of this thesis are as follows:

(1) The thesis analyzes the development and application fields of geocoding technology; sets out the research background, research status, and significance of geocoding; presents the research content and technical route; discusses three common approaches to Chinese address segmentation, namely dictionary-based (rule-based), semantics-based (understanding-based), and statistics-based; and describes the theory of the conditional random field model in detail.

(2) Suzhou address data is acquired and processed, including data preprocessing, data annotation, and address disambiguation. The address data is preprocessed by traditional/simplified conversion, special-symbol conversion, and similar steps. A conditional random field tagging system suited to Suzhou address data is constructed. After candidate feature templates are compared and an appropriate one is selected, addresses are tagged by a combination of manual and machine tagging. This method does not require an address element dictionary to be built in advance, and it also avoids address ambiguity to some extent. Based on address levels, the thesis proposes methods for completing non-canonical Chinese addresses and for using historical former names to correct address ambiguity.

(3) A non-canonical Chinese intelligent geocoding system is built. First, the address entered in a user's search is cleaned; then, based on the improved Levenshtein distance algorithm and a trie built from Suzhou address data, the address text is converted to geographic coordinates.
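The clean-then-match pipeline described in (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it uses the classic, unmodified Levenshtein distance (the abstract does not say how the thesis improves it), handles only full-width/half-width folding through Unicode NFKC normalization (traditional/simplified conversion and CRF segmentation are not covered), and falls back to a brute-force scan of the trie for fuzzy matches. The names `clean_address`, `Trie`, `geocode`, and `max_distance`, the sample addresses, and the coordinates are all hypothetical placeholders.

```python
import unicodedata


def clean_address(text: str) -> str:
    """Fold full-width characters to half-width (NFKC) and drop all whitespace."""
    return "".join(unicodedata.normalize("NFKC", text).split())


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


class Trie:
    """Character-level trie; a node holding a coordinate ends a stored address."""

    def __init__(self):
        self.children = {}
        self.coord = None

    def insert(self, address, coord):
        node = self
        for ch in address:
            node = node.children.setdefault(ch, Trie())
        node.coord = coord

    def lookup(self, address):
        node = self
        for ch in address:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.coord

    def items(self, prefix=""):
        """Yield every (address, coord) pair stored below this node."""
        if self.coord is not None:
            yield prefix, self.coord
        for ch, child in self.children.items():
            yield from child.items(prefix + ch)


def geocode(trie, query, max_distance=2):
    """Exact trie lookup first, then the nearest stored address within max_distance."""
    query = clean_address(query)
    coord = trie.lookup(query)
    if coord is not None:
        return query, coord
    scored = ((levenshtein(query, addr), addr, c) for addr, c in trie.items())
    best = min(scored, key=lambda t: t[0], default=None)
    if best is not None and best[0] <= max_distance:
        return best[1], best[2]
    return None


# Hypothetical sample data: the coordinates are placeholders, not surveyed values.
trie = Trie()
trie.insert("苏州市姑苏区人民路100号", (31.30, 120.60))
trie.insert("苏州市姑苏区观前街8号", (31.31, 120.62))

# Query with full-width digits and one wrong character (璐 instead of 路).
print(geocode(trie, "苏州市姑苏区人民璐１００号"))
```

The exact lookup costs time proportional to the query length, while the fuzzy fallback here compares the query against every stored address. A production system would more likely prune the search while walking the trie, which is one plausible reason for pairing the two structures.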
Keywords/Search Tags: non-canonical geocoding, address cleaning, conditional random field model, Levenshtein distance, trie tree