Intelligent Address Correction And Completion Method Based On LEBERT And Knowledge Graph

Posted on:2023-01-23

Degree:Master

Type:Thesis

Country:China

Candidate:J D Lin

Full Text:PDF

GTID:2568307046493744

Subject:Computer Science Computer Technology (Professional Degree)

Abstract/Summary:

Chinese address resolution is an indispensable technology in express delivery, food delivery, and other industries, as well as in digital cities, user behavior analysis, and military applications. It mainly consists of Chinese address segmentation and address matching. Unlike general Chinese word segmentation, place names in Chinese addresses follow a clear hierarchy and certain naming rules, so research on word segmentation tailored to Chinese addresses can improve results in this field. In addition, many Chinese addresses are still entered manually, and users type addresses according to their own habits. This leads to various nonstandard forms, such as omitted address components and the use of old names or aliases, which make accurate address matching difficult. To address these problems, this thesis proposes an intelligent Chinese address resolution scheme based on lexicon-enhanced BERT (LEBERT) and an address knowledge graph. The specific work is as follows:

1. Publicly available standard hierarchical address data, together with aliases, old names, and other place-name information from the National Civil Affairs Bureau, are collected to provide data for the subsequent construction of the address knowledge graph. The existing address data are also labeled to provide training data for the model.

2. The primary task in Chinese address analysis is word segmentation. After reviewing the state of research on Chinese word segmentation and Chinese address segmentation in China and abroad, this thesis uses the LEBERT deep learning model, combined with a bidirectional long short-term memory (BiLSTM) network and a conditional random field (CRF), to segment Chinese addresses. The model makes better use of the word-level features in address text. Compared with the original BERT, its precision, recall, and F score increase by 0.66, 0.52, and 0.58 percentage points, respectively.

3. An address knowledge graph is built from the standard hierarchical address data obtained in the first step, and aliases, old names, and other information are added to enrich its semantics. Taking the addresses segmented in the previous step as input, a matching algorithm based on the address knowledge graph is designed around several kinds of errors that may occur in the data, and is used to match and correct the segmented addresses. Compared with an existing matching tool, matching accuracy at the province, city, and district levels improves by 2.12, 1.56, and 1.12 percentage points, respectively.
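
The abstract does not give the matching algorithm itself, so the following is only a minimal sketch of the general idea of knowledge-graph-based correction and completion, not the thesis's method. It assumes a toy in-memory graph: the `PARENT` and `ALIAS` tables, the three-level hierarchy, and the function names `correct_and_complete` and `standardize` are all hypothetical, and the place names are illustrative sample data. Alias tokens are mapped to their standard names, and omitted higher levels are filled in by walking up the parent links.

```python
from typing import Dict, List, Optional, Tuple

LEVELS = ["province", "city", "district"]

# Hypothetical toy graph: standard name -> (level, parent standard name).
PARENT: Dict[str, Tuple[str, Optional[str]]] = {
    "广东省": ("province", None),
    "深圳市": ("city", "广东省"),
    "南山区": ("district", "深圳市"),
}

# Hypothetical alias edges: nonstandard name -> standard name.
ALIAS: Dict[str, str] = {
    "广东": "广东省",
    "深圳": "深圳市",
    "鹏城": "深圳市",
}


def normalize(token: str) -> Optional[str]:
    """Return the standard name for a token, or None if it is not in the graph."""
    if token in PARENT:
        return token
    return ALIAS.get(token)


def correct_and_complete(segments: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Map segmented tokens to standard units and fill in omitted levels."""
    resolved: Dict[str, str] = {}
    rest: List[str] = []  # tokens the graph does not know (street, number, ...)
    for token in segments:
        name = normalize(token)
        if name is None:
            rest.append(token)
            continue
        # Walk up the hierarchy so that missing ancestors are completed.
        # The first value seen for a level wins; conflicts are not resolved here.
        while name is not None:
            level, parent = PARENT[name]
            resolved.setdefault(level, name)
            name = parent
    return resolved, rest


def standardize(segments: List[str]) -> str:
    """Rebuild the address in province-city-district order."""
    resolved, rest = correct_and_complete(segments)
    return "".join(resolved[lv] for lv in LEVELS if lv in resolved) + "".join(rest)


print(standardize(["鹏城", "南山区", "科技园路1号"]))
# prints: 广东省深圳市南山区科技园路1号
```

In this sketch the alias "鹏城" is corrected to "深圳市" and the omitted province is completed from the graph. A real system would also need to handle ambiguous names and conflicting levels, which this example ignores.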

Keywords/Search Tags:

Chinese address segmentation, Chinese address matching, LEBERT, Knowledge graph

Related items

1	Chinese Address Segmentation And Matching Based On ELMo
2	Design And Optimization Of Chinese Address Matching System
3	Design And Implementation Of Segmentation System For Chinese Address Based On Statistics And Rules
4	Research On Technology For Chinese Address Service
5	Research And Implementation Of Technology For POI Chinese Address Fuzzy Matching
6	Research On The Analysis Method Of Chinese Address Semantic For Internet
7	Research On Chinese Address Segmentation Method For A Small Amount Of Labeled Data
8	Research Of Chinese Address Standardization Based On ALBERT
9	Address Data Application Study In Matching And Consolidation
10	Research Of Geocoding For Security Business Information