Off-line Handwritten Mathematic Symbols Recognition Using Topological Feature Construction

Posted on: 2010-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: J W Yang
Full Text: PDF
GTID: 2178360275962616
Subject: Computer software and theory
Abstract/Summary:
Written characters are a vital tool for human communication. With the rapid development of computer and information technology, processing and recognizing characters by machine has become an important research field. OCR (Optical Character Recognition) is an automation technology that developed gradually during the 20th century. Off-line handwritten character recognition is an important branch of pattern recognition that draws on artificial intelligence, image processing, information theory, digital signal processing, fuzzy mathematics, and computer science, among other fields. It is a comprehensive technique with practical value and theoretical significance in information processing, machine translation, office automation, artificial intelligence, and other high-tech areas.

Many kinds of documents need to be entered into computers, including raw data records, tax bills, accounting vouchers, financial notes, postal mail, and students' examination papers. Entering such data manually is laborious and inefficient. Although OCR for ordinary characters and digits is now highly mature, mathematical symbol recognition remains a major challenge, even though mathematical symbols play a vital role for scientific researchers, mathematics professionals, and the public.

To address this problem, this thesis carries out the necessary analysis. The author builds a sample database of handwritten digits, pre-processes off-line handwritten mathematical symbols and off-line handwritten Greek letters separately, and then performs feature extraction, classification, and recognition on them in experiments.

This thesis studies and discusses handwritten mathematical symbol recognition technology.
The thesis seeks a feature extraction method that achieves a very high recognition rate within a restricted scope: student numbers and dates written on examination papers, the Greek letters that occur most frequently in mathematics tests, and the MNIST handwritten digit library. Building on image pre-processing, it proposes a new method based on topological feature construction and then uses a classification tree to identify the symbol categories.

For feature extraction, the thesis proposes topological feature construction and applies it to mathematical symbols. Everyday experience of how people recognize characters suggests that topological structure plays a decisive role in character recognition, especially in identifying a single character. The original image contains only limited topological information, so it cannot reflect the bending direction, bending extent, or branch relationships of the object or its parts, although these structures are very important for distinguishing characters. The thesis therefore constructs topological features as follows: add a few pixels along one or more sides of the character image. The newly added pixels form a new topology together with the pixels of the original image. The number and location of the connected regions in the new topology are then computed to obtain the identifying features. The newly constructed topology contains connected regions (rings) that reflect the bending direction, bending extent, and branch relationships of the image or its parts, so these structures provide a valuable basis for classification and identification.

For pre-processing, the samples are the undergraduate student numbers, the dates, and the Greek letters most commonly used in the students' higher mathematics tests.
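The topological feature construction described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a tightly cropped binary image stored as a list of rows (1 = ink, 0 = background), adds a single line of ink pixels on one side at a time, and counts only how many new enclosed background regions (rings) appear. The location of the rings, which the abstract also uses, is omitted. The function names `add_side`, `count_rings`, and `ring_features` and the 3x3 test glyphs are hypothetical.

```python
from collections import deque

SIDES = ("left", "right", "top", "bottom")


def add_side(img, side):
    """Return a copy of img with one line of ink pixels (1s) added on `side`."""
    width = len(img[0])
    if side == "left":
        return [[1] + list(row) for row in img]
    if side == "right":
        return [list(row) + [1] for row in img]
    if side == "top":
        return [[1] * width] + [list(row) for row in img]
    if side == "bottom":
        return [list(row) for row in img] + [[1] * width]
    raise ValueError(f"unknown side: {side}")


def count_rings(img):
    """Count background regions (0s) that are fully enclosed by ink (1s)."""
    h, w = len(img), len(img[0])
    # Surround the image with background so the outside forms a single region.
    grid = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]
    seen = [[False] * (w + 2) for _ in range(h + 2)]
    regions = 0
    for sy in range(h + 2):
        for sx in range(w + 2):
            if grid[sy][sx] == 1 or seen[sy][sx]:
                continue
            regions += 1
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            while queue:  # 4-connected flood fill over background pixels
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h + 2 and 0 <= nx < w + 2
                            and grid[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions - 1  # discard the outer background region


def ring_features(img):
    """Map each side to the number of new rings created by adding pixels there."""
    base = count_rings(img)
    return {side: count_rings(add_side(img, side)) - base for side in SIDES}


# Hypothetical 3x3 glyphs: a "C"-like shape and its mirror image.
C_SHAPE = [[1, 1, 1],
           [1, 0, 0],
           [1, 1, 1]]
MIRRORED = [row[::-1] for row in C_SHAPE]

print(ring_features(C_SHAPE))
print(ring_features(MIRRORED))
```

Under these assumptions the C-like glyph gains a ring only when pixels are added on its right side, and its mirror image only when pixels are added on its left, so the per-side counts distinguish the two bending directions even though neither original glyph contains a ring.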
We adopt traditional pre-processing steps: grayscale conversion, single-character segmentation, binarization, character smoothing, noise removal, and character normalization. From the pre-processed character images we then build a handwritten mathematical symbol library modeled on the MNIST handwritten digit library, with the same background margin.

We build a classification tree to classify and recognize mathematical symbols. Experiments show that the method is fast and gives good classification results. The recognition system was tested on classification and identification using our own handwritten mathematical symbol library. For handwritten digits, the best result is a recognition rate of 93.5%, an error rate of 6.0%, and a rejection rate of 0.5%; for handwritten Greek letters, the best result is a recognition rate of 93.7%, an error rate of 5.4%, and a rejection rate of 0.9%. These results show that the method achieves a high recognition rate.
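Part of the pre-processing chain named in this abstract (binarization, normalization, and a uniform background margin) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's pipeline: the input is a single already-segmented grayscale character given as rows of 0-255 values, the threshold of 128 is arbitrary, resizing uses nearest-neighbor sampling, and the default 20x20 glyph inside a 28x28 frame is borrowed from the MNIST layout because the abstract does not state its own sizes. Smoothing, noise removal, and segmentation are left out, and all function names are hypothetical.

```python
def binarize(gray, threshold=128):
    """Dark pixels (below threshold) become ink (1), the rest background (0)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def crop_to_ink(img):
    """Crop to the bounding box of the ink pixels."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not rows:  # blank image: nothing to crop
        return [[0]]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def resize_nearest(img, size):
    """Scale to size x size using nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def add_margin(img, margin):
    """Surround the image with `margin` background pixels on every side."""
    width = len(img[0]) + 2 * margin
    blank_rows = [[0] * width for _ in range(margin)]
    body = [[0] * margin + list(row) + [0] * margin for row in img]
    return blank_rows + body + [[0] * width for _ in range(margin)]


def preprocess(gray, size=20, margin=4, threshold=128):
    """Binarize, crop, normalize to size x size, then add a uniform margin."""
    glyph = resize_nearest(crop_to_ink(binarize(gray, threshold)), size)
    return add_margin(glyph, margin)


# Hypothetical 4x4 grayscale patch with a dark 2x2 blob on a white background.
patch = [[255, 255, 255, 255],
         [255,  10,  20, 255],
         [255,  30,  40, 255],
         [255, 255, 255, 255]]
sample = preprocess(patch)
print(len(sample), len(sample[0]))
```

Giving every sample the same frame size and margin keeps the added side pixels of a later feature step at a predictable distance from the strokes, which is one plausible reason for imitating the MNIST layout.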
Keywords/Search Tags: handwritten mathematic symbols recognition, pre-processing, feature extraction, classification tree