
Research On Chinese Address Segmentation And Matching For Fusion Corpus

Posted on: 2022-08-07 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Peng
GTID: 2480306731454354 | Subject: Cartography and Geographic Information System
Abstract/Summary:
Address matching, a key step in geocoding, is an important part of smart city construction. In its complete sense, address matching consists of two stages, address segmentation and address matching proper, with segmentation supplying the segmented input that matching relies on. Because Chinese addresses are highly diverse and frequently nonstandard, existing methods face several problems: training corpora vary in quality, nonstandard addresses degrade segmentation, and address matching yields low matching rates and low accuracy. To address these problems in Chinese address segmentation and matching, this thesis carries out the following research:

(1) Research on a Chinese address segmentation method for fusion corpora. The method quantitatively scores the value of the address data in a 70,000-address corpus using several indicators and, according to these value levels, selects addresses for a weighting-mechanism (fusion) corpus. A BiLSTM+CRF model is trained on this corpus for word segmentation experiments. The results show that, for the same amount of training data, the corpus-fusion method effectively improves Chinese address segmentation: its F1 and AP values are higher than those of the other methods compared. In this study, segmentation performs best when the fusion corpus reaches 60,000 addresses and the data augmentation ratio is 0.2, giving an F1 value of 91.6% and an AP value of 89.3%.

(2) Research on a Chinese address matching method based on a multi-tree structure. Using the hierarchical constraints among address elements, the method builds a hierarchical multi-tree that stores the standard address database. The segmentation result of the address to be matched is then matched against this multi-tree; where a nonstandard address would otherwise fail to match, the method completes the match by combining hierarchical backtracking matching with text similarity matching. Experiments on real data show that the multi-tree method outperforms the other matching methods compared, achieving higher matching rates and higher accuracy in both sets of experiments. Its higher matching rate and accuracy values exceed 92%, and its lower values exceed 85%.
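BiLSTM+CRF segmenters are normally trained as character-level sequence labelers, so each segmented address in the training corpus has to be turned into one tag per character. The abstract does not state the tag set; the sketch below assumes the common BMES scheme (Begin, Middle, End, Single) and shows the conversion in both directions. The function names and the sample address are illustrative only.

```python
from typing import List, Tuple


def words_to_bmes(words: List[str]) -> Tuple[List[str], List[str]]:
    """Convert a segmented address (list of words) into characters and BMES tags."""
    chars: List[str] = []
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            chars.append(word)
            tags.append("S")  # single-character word
        else:
            chars.extend(word)
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return chars, tags


def bmes_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from characters and (possibly imperfect) predicted BMES tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # a new word starts: flush the unfinished one
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)  # word boundary reached
            current = ""
    if current:
        words.append(current)  # trailing word without an E tag
    return words
```

For example, `words_to_bmes(["湖北省", "武汉市", "洪山区", "珞喻路", "129号"])` tags "129号" as B, M, M, E, and decoding those tags with `bmes_to_words` recovers the original five words. The decoder tolerates malformed tag sequences (such as an M with no preceding B) so that raw model output can still be scored.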
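The abstract reports F1 for segmentation without defining how it is computed. A widely used convention, assumed here, is micro-averaged word-level F1: each word is identified by its character offsets, and a predicted word counts as correct only if both of its boundaries agree with the gold segmentation. The AP value is not defined in the abstract and is not reproduced. The helper names are hypothetical.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged word-level precision, recall and F1 over a test set."""
    correct = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = to_spans(g), to_spans(p)
        correct += len(g_spans & p_spans)  # words whose both boundaries match
        n_pred += len(p_spans)
        n_gold += len(g_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Comparing offsets rather than word strings matters: a prediction such as ["湖北省", "武汉", "市洪山区"] against the gold ["湖北省", "武汉市", "洪山区"] gets credit only for "湖北省", so precision, recall and F1 are all 1/3.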
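The multi-tree matching in (2) can be pictured with a small sketch. It is not the thesis implementation: the node layout, the candidate order at each level (exact name, then similar names, then skipping a level the input omitted), the use of Python's `difflib.SequenceMatcher` ratio as the text-similarity measure, and the 0.6 threshold are all assumptions chosen for illustration. "Backtracking" here means that when a later address element cannot be placed, the search returns to an earlier level and tries that level's next candidate.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_tree(standard_addresses: List[List[str]]) -> Node:
    """Insert each standard address (ordered from highest to lowest level) as a root-to-leaf path."""
    root = Node("ROOT")
    for elements in standard_addresses:
        node = root
        for name in elements:
            node = node.children.setdefault(name, Node(name))
    return root


def similarity(a: str, b: str) -> float:
    """Assumed text-similarity measure: difflib's ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def match(node: Node, elements: List[str], threshold: float = 0.6,
          max_skip: int = 1) -> Optional[List[str]]:
    """Return the standard path matching all input elements below `node`, or None."""
    if not elements:
        return []  # every input element has been placed
    head, rest = elements[0], elements[1:]
    candidates: List[Node] = []
    if head in node.children:
        candidates.append(node.children[head])  # exact match first
    fuzzy = [c for n, c in node.children.items()
             if n != head and similarity(head, n) >= threshold]
    fuzzy.sort(key=lambda c: similarity(head, c.name), reverse=True)
    candidates.extend(fuzzy)  # then similar names, best first
    for child in candidates:
        tail = match(child, rest, threshold, max_skip)
        if tail is not None:
            return [child.name] + tail
        # otherwise fall through: backtrack and try the next candidate
    if max_skip > 0:
        # the input may omit this level (e.g. no city): descend without consuming `head`
        for child in node.children.values():
            tail = match(child, elements, threshold, max_skip - 1)
            if tail is not None:
                return [child.name] + tail
    return None
```

With a tree built from standard addresses such as ["湖北省", "武汉市", "洪山区", "珞喻路", "129号"], an input that omits the city, or one that misspells "珞喻路" as "珞瑜路", is still mapped to the full standard path, while an input with no acceptable candidate at some level returns None, which a caller could count as a match failure. The `max_skip` budget is shared along a path so that skipping cannot run unchecked through the whole tree.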
Keywords/Search Tags: Fusion corpus, Chinese address segmentation, Multi-tree structure, Chinese address matching