Research On Tesseract-OCR Based Text Recognition System

Posted on:2021-04-22

Degree:Master

Type:Thesis

Country:China

Candidate:T T Zhang

Full Text:PDF

GTID:2428330614963693

Subject:Electronic and communication engineering

Abstract/Summary:


With the development of science and technology, text recognition has become one of the most frequently used technologies in daily life, especially in libraries and newspaper publishing. To reduce costs, large numbers of text documents such as books, newspapers, and magazines are stored as electronic documents. With constantly improving electronic devices and technologies, OCR (Optical Character Recognition) has become an important means of intelligent input for library books, periodicals, newspapers, magazines, and text saved as images, improving efficiency while reducing cost. To recognize text with OCR, the text carrier only needs to be saved as an image and fed into the text recognition system. In the information age, accurate and fast recognition of text in various languages has become an important topic in computer science.

Because OCR works on images of the text, the quality of the original image is an important factor in recognition accuracy. The ideal capture device is a scanner, which produces an image with no background and a front-facing view of the page. Scanners, however, are rarely at hand in daily life; in most cases text images are taken with a mobile phone camera, which is convenient but yields lower image quality and introduces unpredictable problems such as uneven illumination, distortion, and blur caused by inaccurate camera focus. To address these problems, this thesis applies a series of image preprocessing steps: binarization, sharpening and enhancement, denoising, and correction. Image preprocessing is an important factor in ensuring that text is recognized correctly, and it also helps ensure that common characters are fully covered when training the custom character library.

Next, the source code of the Tesseract engine is studied and used, giving a deeper understanding of the principles and workflow of text recognition. The thesis studies how to train a custom character library and how to optimize the training process, and implements a character recognition system based on the custom-trained library. Finally, building on the image-processing work, C++11, and the study of the Tesseract engine, the complete text recognition process is encapsulated, and a graphical interface tool is developed with the MFC application framework in the VS2015 environment to carry out the whole recognition process. The tool was subjected to rigorous black-box, performance, and other tests to verify its robustness and stability.
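The abstract lists binarization as a preprocessing step but does not name the algorithm. As an illustration only, the sketch below assumes Otsu's global-threshold method, a common choice for document images, applied to a hypothetical `GrayImage` struct (row-major, 8-bit pixels). It uses only the C++ standard library so that it compiles without OpenCV.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal grayscale image: row-major, one byte per pixel.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // expected size: width * height
};

// Otsu's method: choose the threshold t that maximizes the between-class
// variance of the classes {value <= t} and {value > t}.
int otsuThreshold(const GrayImage& img) {
    assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
    std::array<double, 256> hist{};
    for (std::uint8_t p : img.pixels) hist[p] += 1.0;

    const double total = static_cast<double>(img.pixels.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double weightLow = 0.0, sumLow = 0.0, bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += hist[t];  // number of pixels with value <= t
        sumLow += t * hist[t];
        const double weightHigh = total - weightLow;
        if (weightLow == 0.0) continue;  // low class still empty
        if (weightHigh == 0.0) break;    // high class now empty
        const double diff = sumLow / weightLow - (sumAll - sumLow) / weightHigh;
        const double betweenVar = weightLow * weightHigh * diff * diff;
        if (betweenVar > bestVar) {
            bestVar = betweenVar;
            bestT = t;
        }
    }
    return bestT;
}

// Dark text becomes 0 (black); light background becomes 255 (white).
GrayImage binarize(const GrayImage& img, int threshold) {
    GrayImage out = img;
    for (std::uint8_t& p : out.pixels) p = static_cast<std::uint8_t>(p > threshold ? 255 : 0);
    return out;
}
```

With OpenCV, the same global Otsu threshold is usually obtained by passing `cv::THRESH_BINARY | cv::THRESH_OTSU` to `cv::threshold`. A single global threshold can struggle with the uneven lighting of phone photos, which is one reason adaptive (local) thresholds are also used for camera-captured pages.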
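For the sharpening and enhancement step, the abstract again gives no formula. A minimal sketch, assuming the classic Laplacian sharpening kernel `[0 -1 0; -1 5 -1; 0 -1 0]` (the image minus its 4-neighbour Laplacian) and the same hypothetical `GrayImage` struct, is shown below. Border pixels are simply copied, which is a simplifying assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal grayscale image: row-major, one byte per pixel.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Sharpen with the kernel [0 -1 0; -1 5 -1; 0 -1 0]; results are clamped
// to [0, 255]. Pixels on the image border are copied unchanged.
GrayImage sharpen(const GrayImage& img) {
    assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
    GrayImage out = img;
    auto at = [&img](int x, int y) {
        return static_cast<int>(img.pixels[y * img.width + x]);
    };
    for (int y = 1; y + 1 < img.height; ++y) {
        for (int x = 1; x + 1 < img.width; ++x) {
            const int v = 5 * at(x, y) - at(x - 1, y) - at(x + 1, y)
                        - at(x, y - 1) - at(x, y + 1);
            out.pixels[y * img.width + x] =
                static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
        }
    }
    return out;
}
```

Flat regions pass through unchanged (5v - 4v = v), while a pixel that differs from its neighbours is pushed further away from them, which steepens stroke edges before binarization. In OpenCV the same kernel can be applied with `cv::filter2D`.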
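The denoising method is not specified in the abstract either. One common option for the isolated specks that phone photos and binarization tend to leave is a 3x3 median filter; the sketch below assumes that choice and the hypothetical `GrayImage` struct, and copies border pixels unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal grayscale image: row-major, one byte per pixel.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// 3x3 median filter: each interior pixel is replaced by the median of its
// 3x3 neighbourhood, which removes isolated outliers but keeps solid strokes.
GrayImage medianFilter3x3(const GrayImage& img) {
    assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
    GrayImage out = img;
    std::vector<std::uint8_t> window(9);
    for (int y = 1; y + 1 < img.height; ++y) {
        for (int x = 1; x + 1 < img.width; ++x) {
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    window[k++] = img.pixels[(y + dy) * img.width + (x + dx)];
            // Partial sort is enough: only the 5th smallest value is needed.
            std::nth_element(window.begin(), window.begin() + 4, window.end());
            out.pixels[y * img.width + x] = window[4];
        }
    }
    return out;
}
```

OpenCV provides the same operation as `cv::medianBlur`. A 3x3 window is a trade-off: larger windows remove more noise but can also erode thin character strokes.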
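For the correction step, a photographed page is often slightly rotated. The abstract does not say how correction was done; the sketch below assumes the projection-profile method for estimating the skew angle of a binarized page (text = 0, background = 255). For each candidate angle, the foreground pixels are projected onto rotated rows, and the angle whose row histogram is most sharply peaked (largest sum of squared counts) is taken as the skew. The search range of ±10 degrees and 0.5-degree step are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal grayscale image: row-major, one byte per pixel.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Projection-profile skew estimate in degrees. A positive result means text
// lines descend to the right (image y axis points down).
double estimateSkewDegrees(const GrayImage& img, double maxDeg = 10.0,
                           double stepDeg = 0.5) {
    assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
    const double kPi = 3.14159265358979323846;
    const int offset = img.width + img.height;  // |projected row| < width + height
    const int steps = static_cast<int>(std::lround(2.0 * maxDeg / stepDeg));
    double bestAngle = 0.0, bestScore = -1.0;
    for (int i = 0; i <= steps; ++i) {
        const double deg = -maxDeg + i * stepDeg;
        const double rad = deg * kPi / 180.0;
        const double c = std::cos(rad), s = std::sin(rad);
        std::vector<long> rows(2 * offset + 1, 0);
        for (int y = 0; y < img.height; ++y)
            for (int x = 0; x < img.width; ++x)
                if (img.pixels[y * img.width + x] == 0) {
                    // Row coordinate of the point after rotating it by -deg.
                    const long r = std::lround(y * c - x * s);
                    ++rows[r + offset];
                }
        // Aligned text lines concentrate pixels in few rows -> large score.
        double score = 0.0;
        for (long n : rows) score += static_cast<double>(n) * n;
        if (score > bestScore) {
            bestScore = score;
            bestAngle = deg;
        }
    }
    return bestAngle;
}
```

Once the angle is known, the page can be rotated back by it, for example with OpenCV's `cv::getRotationMatrix2D` and `cv::warpAffine`. The brute-force search costs one pass over the image per candidate angle, so a coarse-to-fine search is a natural refinement for large pages.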
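The abstract mentions training a custom character library but not the exact workflow. Classic Tesseract training starts from box files, in which each line describes one character as `<symbol> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. Assuming that format, the sketch below parses one line and converts it to a top-left-origin rectangle; `BoxEntry`, `Rect`, `parseBoxLine`, and `toTopLeft` are hypothetical names, not part of the Tesseract API.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One line of a classic Tesseract box file (bottom-left coordinate origin).
struct BoxEntry {
    std::string symbol;  // UTF-8 bytes, so CJK characters are read as-is
    int left, bottom, right, top, page;
};

// Top-left-origin rectangle, as used by most image libraries.
struct Rect {
    int x, y, w, h;
};

// Returns false if the six fields are missing or the box is inverted.
// Symbols containing spaces are not handled by this simple parser.
bool parseBoxLine(const std::string& line, BoxEntry& out) {
    std::istringstream in(line);
    BoxEntry e;
    if (!(in >> e.symbol >> e.left >> e.bottom >> e.right >> e.top >> e.page))
        return false;
    if (e.right < e.left || e.top < e.bottom) return false;
    out = e;
    return true;
}

// Flip the y axis: a box whose top edge is at `top` (from the bottom)
// starts at row imageHeight - top when counted from the top.
Rect toTopLeft(const BoxEntry& e, int imageHeight) {
    assert(e.top <= imageHeight);
    return Rect{e.left, imageHeight - e.top, e.right - e.left, e.top - e.bottom};
}
```

A helper like this is mainly useful for drawing the boxes over the training image to check and correct them before training, since misaligned boxes directly degrade the trained character library.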
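The abstract states that the whole recognition process is encapsulated with C++11 before being exposed through an MFC interface, but it does not describe the design. One possible C++11 structure, sketched here under that assumption, stores named preprocessing stages as `std::function` objects and applies them in order; `PreprocessPipeline` and `GrayImage` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal grayscale image: row-major, one byte per pixel.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Named image-processing stages applied in the order they were added.
class PreprocessPipeline {
public:
    using Stage = std::function<GrayImage(const GrayImage&)>;

    PreprocessPipeline& add(std::string name, Stage stage) {
        assert(stage && "stage must be callable");
        stages_.emplace_back(std::move(name), std::move(stage));
        return *this;  // allows chained add(...) calls
    }

    GrayImage run(GrayImage img) const {
        for (const auto& s : stages_) img = s.second(img);
        return img;
    }

    std::vector<std::string> stageNames() const {
        std::vector<std::string> names;
        for (const auto& s : stages_) names.push_back(s.first);
        return names;
    }

private:
    std::vector<std::pair<std::string, Stage>> stages_;
};
```

In a GUI such as the MFC tool described above, a button handler could build the pipeline once (for example binarize, sharpen, denoise, deskew) and call `run` on each loaded image before handing the result to the OCR engine; keeping stages as named objects also makes it easy to show or toggle individual steps in the interface.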

Keywords/Search Tags:

Tesseract Engine, MFC, OCR, C++11, OpenCV, Digital Image Processing, Text Recognition System, Character Training


Related items

1	Research And Implementation Of Character Recognition System Based On Tesseract
2	Digital Business Card Recognition System Based On IOS
3	Text Image Recognition Based On Improved Binarization Algorithm And Tesseract-OCR Engine
4	Research On Character Recognition Based On Tesseract
5	Design And Implementation Of Dynamic Text Recognition System
6	Research And Implementation Of Biz Card Recognition System Based On Tesseract-OCR Engine
7	The Research Of Fluctuating Target Image Recognition Based On OpenCV
8	Character Recognition Based On Embedded Linux
9	Research And Implementation Of Searching Test System Based On Image Recognition
10	Design And Implementation Of OCR Application Based On Android