
The Structuring & Matching Of Address In Chinese Address Recognition System

Posted on: 2013-01-09 | Degree: Master | Type: Thesis
Country: China | Candidate: X Y Yao | Full Text: PDF
GTID: 2218330374967279 | Subject: Computer application technology
Abstract/Summary:
With the development of computer technology, pattern recognition, and OCR technology in particular, has become increasingly important in scientific research, daily life, and the national economy. The postal industry was the first to adopt OCR technology. Letters can be sorted automatically by recognizing the address in an envelope image, comparing the result with the addresses in the postal address database, and identifying the post office to which each letter belongs.

In a traditional postal address database, addresses are stored as entries. An address written in the standard hierarchical form can be divided into parts such as road name and road number, and the matching entry is then looked up to obtain the final result, the post office. When the address is in a special form, for example "the crossing of XX Road and XX Road" or "XX Road (near XX Road)", no matching entry can be found. In practice, because users' writing habits differ, some areas have many letters whose addresses are in non-standard form, which impairs the efficiency of the mail sorting machine. Introducing a topological structure into the postal address database is a good solution: it captures the geographic relationships between addresses, makes those relationships explicit, and improves the database's adaptability to different address forms.

Exact matching is commonly used for address matching, but it has a deficiency: no result can be produced when the address on the envelope contains errors. According to statistics, most errors come from the misuse of homophones and similar-sounding characters. In this paper, an extended edit distance based on Pinyin is introduced to measure the similarity between two Chinese strings. The fuzzy matching result is obtained by taking both Chinese character similarity and Pinyin similarity into consideration.

Experiments show that this method effectively improves the data processing rate with only a slight drop in accuracy.
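The abstract does not give the formula for the extended edit distance, so the following is only a minimal sketch of one way such a measure could work. It assumes a weighted Levenshtein distance in which substituting a homophone is cheap, substituting a similar-sounding character costs more, and an unrelated character costs a full edit. The `PINYIN` table, the `HOMOPHONE_COST` value, and all function names are hypothetical and chosen for illustration; a real system would use a complete pinyin dictionary and tuned weights.

```python
from typing import Dict, Sequence

# Hypothetical toy pinyin table (tones omitted); not taken from the thesis.
PINYIN: Dict[str, str] = {
    "中": "zhong", "钟": "zhong", "山": "shan", "三": "san",
    "路": "lu", "陆": "lu", "北": "bei", "南": "nan", "京": "jing",
}

# Assumed cost of replacing a character with an exact homophone.
HOMOPHONE_COST = 0.2


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Plain edit distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def char_cost(x: str, y: str) -> float:
    """Substitution cost in [0, 1] combining character and pinyin similarity."""
    if x == y:
        return 0.0
    px, py = PINYIN.get(x), PINYIN.get(y)
    if px is None or py is None:
        return 1.0  # unknown pronunciation: treat as a full substitution
    # Identical pinyin gives HOMOPHONE_COST; the cost grows toward 1.0
    # as the pinyin spellings diverge.
    ratio = levenshtein(px, py) / max(len(px), len(py))
    return HOMOPHONE_COST + (1.0 - HOMOPHONE_COST) * ratio


def pinyin_edit_distance(a: str, b: str) -> float:
    """Weighted edit distance: unit insert/delete, pinyin-aware substitution."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [float(i)]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1.0,
                           cur[j - 1] + 1.0,
                           prev[j - 1] + char_cost(x, y)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to a similarity score in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - pinyin_edit_distance(a, b) / longest
```

Under these assumptions, a road name written with a homophone (such as 钟山路 for 中山路) stays much closer to the database entry than one with an unrelated character, so a fuzzy matcher could accept candidates whose `similarity` exceeds a chosen threshold instead of requiring an exact match.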
Keywords/Search Tags: topological structure, edit distance, fuzzy match, pinyin similarity