People now generate large amounts of address information all the time. Chinese addresses are the most effective information for describing spatial location, and they are closely related to people's behavior. Address records can be used to locate a person and to infer their activities and trajectory over an entire day. Such records are widespread in online shopping, short-video apps, banking, telecommunications, and other fields. They provide ample material for data mining and analysis and have a positive impact on both personal development and the national economy. Research on Chinese addresses in China is still in its infancy. The difficulty lies in the particular nature of addresses themselves: flexible grammar and diverse structure. Unlike the situation abroad, address planning and the formulation of address standards in China began relatively late, so matching Chinese addresses with existing techniques causes many problems. To solve these problems, this thesis takes Chinese address matching as its research direction and studies it from several angles. The main research contents and results are as follows:

(1) Improving on the original address library model, a new scheme for constructing a standard address library is proposed, erroneous addresses are classified, and a unified address library system is built.

(2) Based on a survey of the state of word segmentation research in China and abroad, and in order to improve the accuracy and recall of Chinese address segmentation and to better resolve ambiguity in Chinese addresses, this thesis designs a Chinese address segmentation model that combines ELMo with a model trained on a long short-term memory (LSTM) network and a conditional random field (CRF). This reduces the manual feature engineering required by traditional segmentation methods, makes the algorithm more general, and recognizes alternative forms of address information well.

(3) The MapReduce computing engine is introduced, and segmented addresses are matched against the address tree model. A feasible Chinese address matching system is designed and implemented: addresses that fail to match and out-of-vocabulary words are added to the database, and the Flink real-time computing engine is used to analyze the address normalization results.

The main innovations of this thesis are:
1. A Chinese address segmentation algorithm based on ELMo-BiLSTM-CRF is proposed.
2. Address information is matched using the address tree model.
The improved algorithm proposed in this thesis, built on the BiLSTM-CRF model, performs slightly better than the original, with an improvement of 1.15%.
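The idea of matching segmented addresses against an address tree can be pictured with a minimal sketch. It assumes the standard address library has already been split into administrative levels (province, city, district, road) and that the segmenter has produced a token list. The class `AddressNode` and the functions `build_tree` and `match` are hypothetical names, and the greedy top-down walk is an illustrative assumption, not necessarily the algorithm used in the thesis.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AddressNode:
    """Hypothetical address tree node: one administrative unit name."""
    name: str
    children: dict[str, AddressNode] = field(default_factory=dict)

    def child(self, name: str) -> AddressNode:
        # Return the existing child with this name, creating it if absent.
        if name not in self.children:
            self.children[name] = AddressNode(name)
        return self.children[name]


def build_tree(standard_addresses: list[list[str]]) -> AddressNode:
    """Insert each standard address (already split into levels) as a root-to-leaf path."""
    root = AddressNode("<root>")
    for levels in standard_addresses:
        node = root
        for level in levels:
            node = node.child(level)
    return root


def match(root: AddressNode, tokens: list[str]) -> tuple[list[str], list[str]]:
    """Greedily walk down the tree, one segmented token at a time.

    A token that is a child of the current node extends the matched path;
    any other token (e.g. a house number or an unknown word) is recorded
    as unmatched and skipped. Returns (matched_path, unmatched_tokens).
    """
    node = root
    path: list[str] = []
    unmatched: list[str] = []
    for token in tokens:
        nxt = node.children.get(token)
        if nxt is None:
            unmatched.append(token)
            continue
        node = nxt
        path.append(token)
    return path, unmatched


if __name__ == "__main__":
    # Hypothetical standard addresses, for illustration only.
    root = build_tree([
        ["浙江省", "杭州市", "西湖区", "文三路"],
        ["浙江省", "杭州市", "上城区"],
    ])
    print(match(root, ["浙江省", "杭州市", "西湖区", "文三路", "100号"]))
    # prints (['浙江省', '杭州市', '西湖区', '文三路'], ['100号'])
```

In this sketch the `unmatched` list is where tokens that fail to match would be collected, which is one plausible way to gather candidates for adding unmatched addresses and out-of-vocabulary words to the library. A real system would also need to handle omitted intermediate levels and aliases, which this greedy walk does not attempt.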