
A Method For Determining Superscript/Subscript Of Printed Mathematical Formulas

Posted on: 2009-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Yu
GTID: 2120360242485096
Subject: Computational Mathematics
Abstract/Summary:
With the development of computers and the Internet, more and more people use computers to handle routine work and store information. Converting printed documents into retrievable, editable electronic documents through OCR (Optical Character Recognition) technology has therefore become an important means of storing and transmitting them. Mainstream OCR software can recognize ordinary text accurately and efficiently, but for mathematical formulas the recognition results lose part of the meaning of the original document. Mathematical formulas are important components of most scientific documents and form the core of some of them; without its formulas, such a document loses much of its value. As a result, researchers have begun to develop dedicated recognition systems for mathematical formulas. Compared with ordinary text, mathematical formulas have many distinctive features, such as radicals, fractions, superscripts/subscripts, limits, and matrices. The shape and position of formula symbols are not fixed, and some symbols carry different meanings in different contexts, so mathematical formulas form complex two-dimensional structures. Mathematical formula recognition must therefore include both symbol recognition and structure analysis.

This thesis is organized as follows. Chapter 1 briefly reviews pattern recognition and neural networks and illustrates the workflow of a mathematical formula recognition system. Chapter 2 introduces image pre-processing, formula extraction, and character segmentation. Chapter 3 introduces character feature extraction based on moment features and tests the classification performance of a BP (back-propagation) neural network. Finally, we combine a SOFM (self-organizing feature map) neural network with several BP neural networks into a multi-stage neural network recognizer and run a character recognition test on it. Chapter 4 is the focus of this thesis.
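The abstract does not say which moments Chapter 3 uses, so the following is only a minimal sketch, assuming geometric central moments and the first two Hu invariants as the character features. These are a common choice for producing a short, position-independent feature vector that a BP classifier could take as input. All function names here are hypothetical.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # binary glyph image: 1 = ink, row index = y, column index = x


def raw_moment(img: Image, p: int, q: int) -> float:
    """Geometric moment m_pq = sum over ink pixels of x^p * y^q."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def central_moment(img: Image, p: int, q: int) -> float:
    """Central moment mu_pq, taken about the centroid (translation invariant)."""
    m00 = raw_moment(img, 0, 0)
    xc, yc = raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00
    return sum(((x - xc) ** p) * ((y - yc) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def normalized_moment(img: Image, p: int, q: int) -> float:
    """eta_pq = mu_pq / mu_00^(1 + (p+q)/2); removes scale (approximately, on a pixel grid)."""
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** (1 + (p + q) / 2)


def hu_features(img: Image) -> List[float]:
    """First two Hu invariants; they are also unchanged by rotation."""
    n20 = normalized_moment(img, 2, 0)
    n02 = normalized_moment(img, 0, 2)
    n11 = normalized_moment(img, 1, 1)
    return [n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2]


glyph = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 1]]
shifted = [[0] * 6 for _ in range(6)]
for y, row in enumerate(glyph):                  # same glyph placed at offset (x=2, y=3)
    for x, v in enumerate(row):
        shifted[y + 3][x + 2] = v
rotated = [list(r) for r in zip(*glyph[::-1])]  # glyph turned 90 degrees clockwise

for other in (shifted, rotated):
    assert all(math.isclose(a, b) for a, b in zip(hu_features(glyph), hu_features(other)))
```

The check at the end confirms that moving or rotating the glyph leaves the feature vector unchanged. This is the property that makes moment features attractive for characters whose position inside a formula varies.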
After analyzing and comparing several methods for determining superscript/subscript relations, this thesis presents an improved method based on the projection method and the contour tracing algorithm. Experimental results show that the method adapts well to the features of formulas and achieves a good rate of correct labeling.
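The projection half of such a method can be illustrated with a simplified sketch. It assumes a clean binary image of a single-line formula whose symbols do not overlap horizontally. The function names, the `ratio` threshold and the rule that compares a symbol's vertical centre with the base symbol's top and bottom edges are all hypothetical. They are not the improved method of the thesis. Pure projection cannot separate symbols whose column ranges overlap (for example, a radical sign and its radicand), and handling such cases is one plausible role for the contour tracing step, which this sketch omits.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background; row 0 is the top


def column_segments(img: Image) -> List[Tuple[int, int]]:
    """Cut the image at empty columns of its vertical projection.

    Returns half-open column ranges [start, end) that contain ink."""
    width = len(img[0])
    profile = [sum(row[x] for row in img) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x                    # entering a run of inked columns
        elif not count and start is not None:
            segments.append((start, x))  # leaving the run
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def row_extent(img: Image, x0: int, x1: int) -> Tuple[int, int]:
    """Top and bottom inked rows inside columns [x0, x1) (horizontal projection)."""
    rows = [y for y, row in enumerate(img) if any(row[x0:x1])]
    return rows[0], rows[-1]


def label_scripts(img: Image, ratio: float = 0.25) -> List[str]:
    """Label each segment 'base', 'sup' or 'sub' relative to the last base symbol.

    With margin = ratio * base height, a symbol whose vertical centre lies above
    base_top + margin is a superscript, below base_bottom - margin a subscript.
    `ratio` is a hypothetical tuning parameter."""
    labels: List[str] = []
    base = None  # (top, bottom) of the most recent baseline symbol
    for x0, x1 in column_segments(img):
        top, bottom = row_extent(img, x0, x1)
        if base is None:
            labels.append("base")
            base = (top, bottom)
            continue
        b_top, b_bottom = base
        margin = ratio * (b_bottom - b_top + 1)
        center = (top + bottom) / 2
        if center < b_top + margin:
            labels.append("sup")
        elif center > b_bottom - margin:
            labels.append("sub")
        else:
            labels.append("base")
            base = (top, bottom)  # a new baseline symbol becomes the reference


    return labels


def draw(height: int, width: int, boxes: List[Tuple[int, int, int, int]]) -> Image:
    """Hypothetical test helper: fill rectangles (top, bottom, left, right), inclusive."""
    img = [[0] * width for _ in range(height)]
    for top, bottom, left, right in boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                img[y][x] = 1
    return img


# Rough layout of "x^2 y_i": base, raised symbol, base, lowered symbol
img = draw(13, 11, [(4, 9, 0, 2), (0, 3, 4, 5), (4, 9, 7, 8), (8, 12, 10, 10)])
print(label_scripts(img))  # ['base', 'sup', 'base', 'sub']
```

Each script is judged only against the most recent baseline symbol, so nested scripts (a superscript on a superscript) are not handled. A real system would need a recursive or tree-building pass for those.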
Keywords/Search Tags: Neural Network, Formula Recognition, Superscript/Subscript, Projection, Contour Tracing