
Study Of Algorithms On Optical Character Recognition For Electronic Reading-Pen

Posted on: 2006-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: S C Yue
Full Text: PDF
GTID: 2168360152982216
Subject: Computer application technology
Abstract/Summary:
Offline machine-printed character recognition, an important branch of pattern recognition, is one of the most vital parts of an intelligent interface. In recent decades, with the progress of OCR technology, printed character recognition has gradually entered commercial use in products such as TH-OCR, Hanwang-OCR, and Shangshu-OCR. However, most commercial products either run on desktops, where images are obtained by a scanner, or are embedded in dedicated equipment such as card-reader systems and license plate recognition systems. The few that run on PDAs or cell phones are restricted to online OCR. We therefore developed an electronic reading-pen (ERpen) system built on inexpensive hardware. The author has designed and implemented several OCR algorithms specifically for ERpen, and they have been tested on ERpen.

We mainly study character segmentation, character recognition, and post-processing in the field of OCR. The significance, state of the art, and recent developments of character recognition research are comprehensively reviewed and summarized to provide a thorough grounding for the work. The fundamentals of OCR are introduced first, followed by a discussion of studies on classification and thresholding. The dissertation surveys the conventional feature extraction methods of OCR, namely global statistical features, local statistical features, and structural features. After that, the hardware and software frameworks of ERpen are designed and implemented.

Based on an analysis of conventional methods, the author proposes a new character segmentation algorithm based on connected components, which places less stringent requirements on segmentation than algorithms based on isolated character recognition. By recognizing each connected component as a whole, the proposed method avoids complex computation and takes less time. It segments connected components mainly by the middle expansion method and the peak-paddle function. As a result, recognition errors caused by segmentation errors are reduced. Experiments show that the proposed method is sound and well suited to ERpen.

We select two complementary features for classification. The coarse one is an improved coarse periphery feature and the fine one is average line density. Furthermore, overlapping partitions are used in coarse feature extraction to make recognition more stable. In our three-stage classification algorithm, the first stage performs fast and effective classification with the low-dimensional coarse feature. The second stage uses the fine feature and Euclidean distance to reduce the candidate set for further classification. Finally, the recognition result is obtained from the weighted sum of the similarities between the two pairs of feature vectors.

The author chooses dictionary-based spelling correction to detect recognition errors, an approach shown to be feasible by a character replacement experiment. Since edit distance correction has high computational complexity and cannot be applied to ERpen directly, we design three types of rules (replacement, insertion, and deletion) to handle erroneous cases. All rules can be carried out by a replacement operation. This simplifies the operation and increases processing speed, so as to satisfy the real-time requirement.
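The three-stage classification described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not give the feature dimensions, the stage-one distance measure, the candidate set sizes, the similarity function, or the weights, so the names `classify`, `similarity`, `k1`, `k2`, `w_coarse`, and `w_fine` and all default values below are assumptions chosen only for illustration. Feature extraction (coarse periphery feature, average line density) is assumed to have been done already.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Assumed similarity measure: maps a distance into (0, 1]."""
    return 1.0 / (1.0 + euclidean(a, b))


def classify(coarse, fine, templates, k1=3, k2=2, w_coarse=0.4, w_fine=0.6):
    """Hypothetical coarse-to-fine classifier.

    templates maps a class label to (coarse_vector, fine_vector).
    k1, k2 and the weights are illustrative values, not from the thesis.
    """
    # Stage 1: cheap pruning with the low-dimensional coarse feature
    # (Euclidean distance is an assumption for this stage).
    stage1 = sorted(
        templates, key=lambda c: euclidean(coarse, templates[c][0])
    )[:k1]

    # Stage 2: shrink the candidate set with the fine feature
    # and Euclidean distance.
    stage2 = sorted(
        stage1, key=lambda c: euclidean(fine, templates[c][1])
    )[:k2]

    # Stage 3: pick the candidate with the highest weighted sum
    # of coarse and fine similarities.
    def score(c):
        t_coarse, t_fine = templates[c]
        return (w_coarse * similarity(coarse, t_coarse)
                + w_fine * similarity(fine, t_fine))

    return max(stage2, key=score)
```

The point of the cascade is that the expensive comparison on the fine feature only runs on the few candidates that survive the coarse stage, which matters on low-cost hardware.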
Keywords/Search Tags:Optical Character Recognition, Threshold, Character Segmentation, Feature Extraction, Multi-Stage Classifier, Post Processing, Electronic Reading-Pen, DSP System