
Chinese Address Parsing And Matching Method Based On BERT Pre-Training Model

Posted on: 2022-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2480306722484054
Subject: Cartography and Geographic Information Engineering
Abstract/Summary:
Addresses are an indispensable geospatial data resource and a strategic basic information resource for social development. They have become an important part of the spatio-temporal framework of smart cities and a bridge for organizing, associating and sharing social big data. Address matching converts an address description into spatial coordinates: the description is parsed according to an address model or coding specification, and the parsing result is compared with existing records in an address database. With the wide application of location-based services, a large volume of address data has accumulated, containing increasingly complex and diverse address elements and combinations of them. The management and use of Chinese addresses is still at a relatively early stage, and the level of address services falls far short of what industries need. It is therefore necessary to study in depth the patterns by which Chinese addresses are described, and to achieve intelligent parsing and efficient matching of Chinese addresses, in order to cope with the challenges that rapid urban development brings in the era of big data. Based on the characteristics of Chinese addresses, this thesis constructs a classification system of address elements, analyzes the rules by which address elements combine, explores the application of pre-trained models to address parsing, and proposes an address matching method based on an address element index. The main research contents and results are as follows:

(1) Chinese address element classification and combination pattern analysis
Taking into account differences in culture and urban development across regions of China, the current state and characteristics of Chinese addresses are analyzed systematically. The classification system of Chinese address elements is improved on the basis of national and industry address standards. From an analysis of the structural characteristics of Chinese addresses, the spatial relations they express are summarized as topological, directional and distance relations. The ways these relations are expressed in addresses are analyzed, and the spatial relation words commonly used in addresses are collected. On this basis, the combination patterns of different address elements are constructed, which lays a foundation for the work on address parsing and address matching. The usage frequency of address elements and of their combination patterns is analyzed experimentally, which helps in understanding how element combinations vary and supports the standardized management and application of Chinese addresses.

(2) Chinese address parsing based on the BERT pre-trained model
To address the difficulty of parsing Chinese addresses into structured form, the BERT pre-trained model is used to produce the semantic representation of the address, a bidirectional LSTM is used for feature extraction, and a CRF model is used to integrate context features. The resulting BERT-BiLSTM-CRF address parsing model improves the accuracy of Chinese address parsing. A labeling specification for Chinese addresses is constructed from the classification and combination patterns of address elements. Because annotating address data takes considerable time and effort, this thesis also studies a data augmentation method for annotated Chinese address data, which reduces the cost of manual annotation.

(3) Address matching method based on a Chinese address element index
Based on the practical requirements of address matching services, the Chinese address matching process is defined, and matching strategies are formulated for the different types of address elements. Existing address matching methods suffer from poor flexibility, insufficient efficiency and easily confused elements. To address these problems, an index construction method based on address elements is proposed to improve the organization efficiency and matching performance of Chinese address data. An address similarity model is constructed by fusing a weighted address element similarity with an address semantic similarity, which improves the accuracy of address matching. An evaluation method for address matching is also constructed, and experiments show that the proposed method achieves high matching performance and matching accuracy.

The research shows that the multi-level address element classification system and the address element combination model built in the classification and statistical analysis of Chinese address elements can guide the standardized collection, management, construction and service of Chinese address data. In address parsing, combining a pre-trained model, a deep learning model and a statistical learning model improves parsing quality and gives good parsing performance across the various types of address elements. The extended BIOES annotation method improves the efficiency of address annotation and is easier to use for training deep learning models. In address matching, a similarity calculation that considers both the content of address elements and their semantic information gives better matching results. An index based on address element semantics gives better retrieval results and accuracy than a simple character index.
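
To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It shows (a) plain BIOES tagging of an address that has already been split into elements, and (b) a weighted element similarity fused with a semantic similarity score. The element labels (`PROVINCE`, `CITY`, ...), the weights, the fusion parameter `alpha`, and the use of `difflib.SequenceMatcher` as the per-element string similarity are all assumptions made for illustration; the abstract does not specify the extended BIOES scheme, the weights, or the similarity functions actually used.

```python
from difflib import SequenceMatcher


def bioes_tags(segments):
    """Convert (text, element_type) segments into per-character BIOES tags.

    element_type None means the span is outside any address element ("O").
    This is standard BIOES, not the extended scheme mentioned in the abstract.
    """
    tags = []
    for text, label in segments:
        n = len(text)
        if n == 0:
            continue  # skip empty spans
        if label is None:
            tags.extend(["O"] * n)
        elif n == 1:
            tags.append(f"S-{label}")  # single-character element
        else:
            tags.append(f"B-{label}")               # beginning
            tags.extend([f"I-{label}"] * (n - 2))   # inside
            tags.append(f"E-{label}")               # end
    return tags


def element_similarity(a, b, weights):
    """Weighted mean of per-element string similarity between two parsed addresses.

    a, b: dicts mapping element type to text; weights: hypothetical per-type weights.
    An element missing from both addresses counts as a perfect match here.
    """
    total = sum(weights.values())
    score = 0.0
    for name, w in weights.items():
        ratio = SequenceMatcher(None, a.get(name, ""), b.get(name, "")).ratio()
        score += w * ratio
    return score / total


def fused_similarity(elem_sim, sem_sim, alpha=0.5):
    """Linear fusion of element-level and semantic similarity (alpha is assumed)."""
    return alpha * elem_sim + (1.0 - alpha) * sem_sim


if __name__ == "__main__":
    # Hypothetical pre-segmented address with illustrative element labels.
    segments = [("浙江省", "PROVINCE"), ("杭州市", "CITY"), ("1", "NUMBER"), ("号", "SUFFIX")]
    print(bioes_tags(segments))

    weights = {"PROVINCE": 1.0, "CITY": 2.0, "ROAD": 3.0}
    q = {"PROVINCE": "浙江省", "CITY": "杭州市", "ROAD": "文一路"}
    r = {"PROVINCE": "浙江省", "CITY": "杭州市", "ROAD": "文二路"}
    elem = element_similarity(q, r, weights)
    # sem_sim would come from a semantic model; 0.9 is a placeholder value.
    print(fused_similarity(elem, 0.9, alpha=0.6))
```

In practice the semantic similarity would be produced by a trained model (for example, a comparison of sentence embeddings), and the weights would be tuned so that finer-grained elements such as road and house number count for more than province or city.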
Keywords/Search Tags:Chinese address, Address element, Address parsing, Address matching, BERT model