
Research on Chinese Address Normalization Technology by Fusion of Attention Mechanism and Sequence Generation Network

Posted on: 2024-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: B Yang
GTID: 2530306935482124
Subject: Surveying Science and Technology
Abstract:
Addresses are one of the most important carriers of geographic information and a key link connecting people, events, and objects. They affect daily life, business management, and even government decision-making and planning. With the widespread adoption of big-data collection, storage, and analysis methods, massive volumes of address data on the Internet can now be collected and organized comprehensively. However, these data exist in unstructured and non-standard forms that computers cannot analyze directly, which limits the use of addresses in specific application scenarios. Address normalization is an effective way to reconstruct the structure and content of addresses, and how to normalize addresses accurately and efficiently has become a focus of research in geographic information technology. Starting from the address normalization process, this thesis studies normalization methods built on address modeling and address parsing. The main work is as follows:

(1) Existing address models label POI (Point of Interest) names only at a coarse granularity, so some semantic information is lost. Building on a study of existing address models, this thesis refines the labeling of address elements such as POIs and complex roads within the spatial-relationship address model, improving the model's overall parsing performance.

(2) Existing address parsing models do not account for entity-level semantic information, so complex addresses often require a larger corpus, training is costly, and parsing performance is hard to improve. This thesis proposes an address parsing method based on a knowledge-enhanced model. During pre-training, large-scale text and knowledge-graph data are used together as input to learn general semantic information about the text. The ERNIE (Enhanced Representation through Knowledge Integration) model is then fine-tuned on a small-scale corpus, and finally a conditional random field (CRF) maps each character to a label. Experimental results show that, using only a small training sample, the proposed model reaches an F1 score of 0.979, the best among the parsing models compared. It can therefore supply high-quality input for address normalization.

(3) Text-matching approaches to address normalization involve a cumbersome pipeline and depend heavily on feature dictionaries. This thesis therefore proposes a Chinese address normalization method that fuses an attention mechanism with a sequence generation network. First, an encoder is designed that accounts for the spans of address elements, integrating external lexicon information into a bidirectional long short-term memory (BiLSTM) network to enrich the semantic representation of address text. Next, a decoder with an attention mechanism is built so that, at every decoding step, it takes into account the encoder's hidden-layer outputs at all time steps. Finally, the method is evaluated experimentally on address text. The proposed model achieves a BLEU score of 81% and a ROUGE-L score of 74%, the best results among comparable models. In addition, this thesis proposes an address normalization evaluation method based on text similarity and uses it in normalization experiments on different types of address data. These experiments reveal clear differences in how well the model handles different types of non-standard addresses; its performance is most strongly affected by the number of administrative divisions.
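In step (2), the CRF layer turns per-character scores into one label per character by choosing the highest-scoring label sequence rather than the best label at each position independently. The following is a minimal Viterbi sketch of that decoding step, not the thesis's implementation. It assumes the emission scores (which would come from the fine-tuned ERNIE encoder) and a label-transition score matrix are already available. The B/I/O label set and all numbers are hypothetical, and the start/end transition scores that CRF layers usually include are omitted for brevity.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][j]   score of label j for character t (assumes >= 1 character)
    transitions[i][j] score of moving from label i to label j
    """
    n_labels = len(labels)
    # best[j]: best score of any path that ends in label j at the current step
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_labels):
            # previous label i that maximizes the path score into label j
            prev = max(range(n_labels), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # start from the best final label and follow the back-pointers
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path]


# Hypothetical example: the transition O -> I is heavily penalized.
labels = ["B", "I", "O"]
transitions = [[0, 0, 0], [0, 0, 0], [0, -10, 0]]
emissions = [[2, 0, 1], [0, 1, 0.5], [0, 0.6, 1], [0, 1, 0.9]]
print(viterbi_decode(emissions, transitions, labels))  # ['B', 'I', 'O', 'O']
```

Picking the per-character maximum here would give `B I O I`, which contains the penalized `O -> I` step. The transition scores are what let the CRF rule out such invalid label sequences.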
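The decoder in step (3) weights the encoder's hidden-layer outputs at every time step when producing each output character. The abstract does not state which scoring function is used. The sketch below assumes plain dot-product (Luong-style) scoring, with short Python lists standing in for BiLSTM hidden states. It returns the softmax attention weights and the resulting context vector, and the function name `dot_attention` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot_attention(
    decoder_state: Sequence[float],
    encoder_states: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Attend over all encoder hidden states given the current decoder state.

    Returns (weights, context): one softmax-normalized weight per encoder
    time step, and the weighted sum of encoder states (the context vector).
    """
    # score each encoder time step by its dot product with the decoder state
    scores = [sum(d * h for d, h in zip(decoder_state, state)) for state in encoder_states]
    # numerically stable softmax over the scores
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # context vector: sum over t of weights[t] * encoder_states[t]
    dim = len(encoder_states[0])
    context = [sum(w * state[k] for w, state in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


weights, context = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print([round(w, 3) for w in weights])  # [0.422, 0.155, 0.422]
```

In a full sequence-generation model, the context vector is typically combined with the decoder state before the next output character is predicted. Switching to additive (Bahdanau-style) attention would change only how `scores` is computed.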
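The ROUGE-L figure reported for step (3) measures overlap between a generated standard address and its reference through their longest common subsequence (LCS). The sketch below assumes character-level tokens, which suit Chinese text, and the balanced F-measure (β = 1) used by many modern implementations; the original ROUGE package can weight recall more heavily. The abstract specifies neither the tokenization nor β, and the example addresses are invented.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # extend a match diagonally, otherwise carry the best neighbor
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Character-level ROUGE-L F-measure between a candidate and a reference."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


# Invented example: the output omits the province present in the reference.
print(round(rouge_l("杭州市西湖区文三路", "浙江省杭州市西湖区文三路"), 4))  # 0.8571
```

Here all 9 candidate characters occur in order in the 12-character reference. That gives precision 1 and recall 0.75, so F = 1.5 / 1.75 ≈ 0.857.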
Keywords: Address Parsing, Address Normalization, Pre-trained Model, Text Generation