In recent years, the rapid development of information technology has driven growth across many industries, and the volume of data has kept increasing; address information, in particular, has grown and been updated accordingly. Electronic maps make it possible to map an address described in text to geospatial coordinates, and the technique that implements this mapping is geocoding. Geocoding generally consists of address standardization, address segmentation, address matching, spatial positioning, and other steps. Address segmentation uses a Chinese word segmentation algorithm to split a Chinese address into minimal address units in preparation for the subsequent address matching, and it is the core and most crucial step of geocoding. This thesis studies Chinese address word segmentation. It analyzes the characteristics and composition rules of Chinese addresses, uses a conditional random field (CRF) model to recognize out-of-vocabulary words during Chinese address segmentation, and constructs a knowledge base based on a standard address model. Based on this knowledge base, we design a word segmentation algorithm suited to standardized Chinese address segmentation, develop a reliable Chinese address segmentation prototype system, and verify the system experimentally. The specific tasks of this thesis are as follows:

1. Constructing a knowledge base based on a standard address model. The address model describes how address elements are organized in the Chinese addresses studied in this thesis; it directly affects the design of the subsequent segmentation algorithm and the accuracy of the final segmentation result. By studying and summarizing the composition features of a large amount of Chinese address data, we establish that Chinese address segmentation needs, as its basis, a complete hierarchical library of national administrative divisions, a library of address feature words, and a database of geographic entity names.

2. Designing an effective Chinese address word segmentation algorithm. The design covers both the choice of the word segmentation algorithm and the design of an algorithm that processes the segmentation result according to address composition rules. Since general word segmentation algorithms are relatively mature, this thesis, in view of the characteristics of Chinese addresses, segments addresses by string matching against the constructed knowledge base using a double-array trie. To handle ambiguities, incorrect splits, and other errors in the initial segmentation result, the thesis also designs a post-processing algorithm grounded in address composition rules, which eliminates ambiguity and infers and verifies the result; this algorithm greatly improves the accuracy of the final segmentation.

3. Developing and implementing a Chinese address segmentation prototype system. Based on the address knowledge base and segmentation algorithm above, a prototype Chinese address word segmentation system is developed and implemented, and its performance and functionality are tested experimentally and compared with a rule-based segmentation method. The experimental results show that the accuracy of the system, which combines statistics and rules, reaches 92.37%, far higher than that of the rule-based method, demonstrating the reliability of the system.
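The dictionary-based segmentation step in task 2 matches the address string against the knowledge base. The sketch below illustrates this under stated assumptions. It uses a plain nested-dictionary trie as a stand-in for the double-array trie named in the thesis: both support the same longest-prefix lookup, and the double-array layout mainly makes the structure more compact and state transitions faster. Forward maximum matching is also assumed, since the abstract does not state the matching direction. `LEXICON` is a hypothetical miniature dictionary, not the thesis's knowledge base, and grouping digit runs into one token is an added assumption.

```python
from typing import Dict, List

# Hypothetical miniature knowledge base (administrative divisions, road names,
# address feature words). A real system would load the libraries described above.
LEXICON = ["浙江省", "杭州市", "西湖区", "文三路", "文三", "号", "西湖", "杭州", "浙江"]

END = ""  # empty-string key marks "a word ends here"; it can never collide with a real character


def build_trie(words: List[str]) -> Dict:
    """Build a nested-dict trie (simple stand-in for a double-array trie)."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def longest_match(trie: Dict, text: str, start: int) -> int:
    """Return the length of the longest dictionary word starting at text[start] (0 if none)."""
    node, best = trie, 0
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break
        if END in node:
            best = i - start + 1
    return best


def segment(trie: Dict, address: str) -> List[str]:
    """Forward maximum matching; digit runs form one token, unknown characters stand alone."""
    tokens: List[str] = []
    i = 0
    while i < len(address):
        if address[i].isdigit():
            j = i
            while j < len(address) and address[j].isdigit():
                j += 1
            tokens.append(address[i:j])
            i = j
            continue
        n = longest_match(trie, address, i) or 1  # fall back to a single character
        tokens.append(address[i:i + n])
        i += n
    return tokens
```

With the sample dictionary, `segment(build_trie(LEXICON), "浙江省杭州市西湖区文三路478号")` returns `['浙江省', '杭州市', '西湖区', '文三路', '478', '号']`; the longest match prefers `文三路` over `文三`. Single unmatched characters are exactly the out-of-vocabulary fragments that a statistical model would then need to handle.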
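The abstract does not detail the rule-based post-processing in task 2. As one possible illustration, the sketch below assumes a hypothetical table that maps address feature words (省, 市, 区, 路, 号, and so on) to hierarchy levels. It applies one merge rule, joining a number with a following 号 into a single house-number unit. It then flags tokens that appear after a finer-grained level: a well-formed Chinese address runs from province down to house number, so an out-of-order token usually signals a wrong split or an ambiguity to resolve. The level table, function names, and rules are all assumptions, not the thesis's algorithm.

```python
from typing import List, Optional, Tuple

# Hypothetical feature-word suffix -> hierarchy level (lower = coarser).
# The first matching suffix wins.
LEVEL_SUFFIXES: List[Tuple[str, int]] = [
    ("街道", 4), ("省", 1), ("市", 2), ("区", 3), ("县", 3),
    ("镇", 4), ("乡", 4), ("路", 5), ("街", 5), ("巷", 5), ("号", 6),
]


def level_of(token: str) -> Optional[int]:
    """Infer an address level from the token's feature-word suffix, if any."""
    for suffix, level in LEVEL_SUFFIXES:
        if token.endswith(suffix):
            return level
    return None


def merge_house_numbers(tokens: List[str]) -> List[str]:
    """Rule: a digit run followed by the feature word 号 forms one house-number unit."""
    out: List[str] = []
    for tok in tokens:
        if tok == "号" and out and out[-1].isdigit():
            out[-1] += tok
        else:
            out.append(tok)
    return out


def check_order(tokens: List[str]) -> List[str]:
    """Return tokens whose level is coarser than an earlier token's (suspect segmentation)."""
    suspects: List[str] = []
    current = 0
    for tok in tokens:
        level = level_of(tok)
        if level is None:
            continue  # tokens without a feature word are not checked
        if level < current:
            suspects.append(tok)
        else:
            current = level
    return suspects
```

For example, `check_order(["杭州市", "浙江省"])` returns `['浙江省']`, because a province should not follow a city. This sketch only flags violations; deciding how to re-split or reorder them is the part the thesis's own rules would supply.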
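The thesis uses a CRF model to recognize out-of-vocabulary words but does not state its tag set or feature template in the abstract. A common formulation treats segmentation as character tagging with B/M/E/S labels (begin, middle, end of a word, or single-character word). The sketch below, under that assumption, turns one correctly segmented address into per-character feature dictionaries and labels. The window features are a typical but hypothetical choice. Output in this shape can be fed to a CRF trainer; for instance, the third-party `sklearn-crfsuite` package's `CRF.fit(X, y)` takes a list of such feature-dict sequences and a list of label sequences.

```python
from typing import Dict, List, Tuple


def bmes_tags(words: List[str]) -> List[str]:
    """Label each character B/M/E (begin/middle/end of a multi-character word) or S (single)."""
    tags: List[str] = []
    for w in words:
        if not w:
            continue  # ignore empty tokens
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def char_features(chars: str, i: int) -> Dict[str, str]:
    """Hypothetical window features: current, previous, next character and bigrams."""
    prev = chars[i - 1] if i > 0 else "<BOS>"
    nxt = chars[i + 1] if i + 1 < len(chars) else "<EOS>"
    return {
        "c0": chars[i],
        "c-1": prev,
        "c+1": nxt,
        "c-1c0": prev + chars[i],
        "c0c+1": chars[i] + nxt,
        "is_digit": str(chars[i].isdigit()),
    }


def to_crf_sequence(words: List[str]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Turn one gold-segmented address into (per-character features, BMES labels)."""
    text = "".join(words)
    features = [char_features(text, i) for i in range(len(text))]
    return features, bmes_tags(words)
```

At prediction time, the trained model labels each character of an unseen address, and maximal B…E or S spans are read back as words; this is how such a model can propose address units that are absent from the dictionary.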