| With the rapid development of Internet technology, more and more geographic information appears in web pages, and the web has become an important channel for accessing and updating geographic information. Address information in web pages takes the form of text. Because Chinese text is written without separators between words, a computer cannot directly understand the location semantics that an address describes, so the address cannot be converted to spatial coordinates and placed on a map to give people accurate positioning. To let computers understand the location semantics of Chinese addresses and to establish a mapping between spatial and non-spatial information, a method for semantic analysis of Chinese addresses obtained from the Internet has important application value. This thesis takes Chinese addresses collected by a web crawler as its research object and parses the address text semantically. It first adopts a statistics-based Chinese address segmentation method that does not depend on a dictionary of geographical names. The method computes frequency statistics over a corpus of 250 thousand addresses obtained from the Internet, together with the mutual information between adjacent words and the information entropy. The address string is then fully segmented to obtain all candidate segmentation schemes, the scheme with the lowest arc cost is selected, and the final segmentation result is obtained through a confidence calculation. On this basis, the thesis labels Chinese addresses with a semantic annotation method for address elements based on a Bayesian model. It establishes a tagging table of Chinese address elements, calculates the probability of each address expression pattern in the tagged address corpus, constructs an address expression schema tree whose nodes are annotation states, and records the number of times each node is passed through. It then calculates, for each address element in an address, the cost of each annotation state and the most likely annotation state of the preceding address element, and finally obtains the semantic tagging sequence of the address by backtracking. The semantic parsing method was tested on different amounts of Chinese address data from the Internet, and the results for the different corpus sizes are analyzed in depth and compared with other methods. The experimental results show that the method segments Chinese addresses well without a dictionary of geographical names and can annotate the semantics of the address elements, so that Chinese address text can be used directly by computers in geographic information services. |
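
The two statistics named in the abstract, mutual information between adjacent units and information entropy, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes character-level units, pointwise mutual information, and right-branching (follower) entropy, and the function names `adjacent_pmi` and `right_entropy` and the three-address toy corpus are hypothetical.

```python
import math
from collections import Counter


def adjacent_pmi(corpus):
    """Pointwise mutual information (base 2) of each adjacent character pair.

    Assumption: units are single characters; the thesis may use other units.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for address in corpus:
        char_counts.update(address)
        pair_counts.update(zip(address, address[1:]))
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())
    pmi = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total_pairs
        p_a = char_counts[a] / total_chars
        p_b = char_counts[b] / total_chars
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


def right_entropy(corpus, prefix):
    """Entropy (in bits) of the characters that follow `prefix` in the corpus.

    A high value suggests a likely boundary after `prefix`.
    """
    followers = Counter()
    for address in corpus:
        start = address.find(prefix)
        while start != -1:
            end = start + len(prefix)
            if end < len(address):
                followers[address[end]] += 1
            start = address.find(prefix, start + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in followers.values())


# Hypothetical toy corpus, for illustration only.
corpus = ["北京市海淀区", "北京市朝阳区", "上海市浦东新区"]
pmi = adjacent_pmi(corpus)
```

In this toy corpus the pair (北, 京) scores a higher mutual information than (市, 海), and the follower entropy after 北京市 is higher than after 北京, which is the kind of signal a dictionary-free segmenter can use to place boundaries.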