Research On Character Recognition Based On Tesseract

Posted on:2022-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:A Zhang

Full Text:PDF

GTID:2518306557970609

Subject:Electronics and Communications Engineering

Abstract/Summary:

The advent of the 5G era has brought production and daily life into a state where all industries are interconnected, and artificial intelligence technology has become widespread. Intelligence presupposes the ability to recognize input information. Common intelligent recognition can be roughly divided into two areas: image recognition and voice recognition. The objects of image recognition include text, faces, fingerprints and so on, while voice recognition is used mainly for human-computer interaction, implementing artificial intelligence at the machine level. As a text recognition engine that is independent of language and script, Tesseract uses feature selection methods that differ from earlier approaches and performs well in both recognition accuracy and robustness. Because it is open source, it is widely used on multiple platforms. It is therefore of practical significance to study the operating mechanism and functional framework of Tesseract, and to develop and optimize on top of it.

Starting from the theory of text recognition, this paper studies its basic principles and introduces its basic process. On this basis, working from the Tesseract source code, the paper studies Tesseract's framework structure, recognition principle and training method in order to understand its operating mechanism in depth and to explore how it works from the bottom up. Building on this study of principles, the paper implements a method of character direction detection in the classification stage to optimize character recognition. In addition, training based on Tesseract is studied, and training for Chinese characters, digits and English characters is completed. The characteristics of Tesseract are exploited during the training process to improve both the efficiency of training and the accuracy of recognition.

Finally, the Tesseract engine and the trained character library are used to recognize optical characters. Combining the characteristics of the C++ programming language with a modular design method, the MFC framework is used for interface design, and the open source libraries CxImage and Tesseract serve as the underlying foundation for a complete text recognition system. Through the training of the character library, the packaging of Tesseract and the application of the MFC framework, the system achieves efficient interaction between the program and the user, which gives Tesseract further development momentum in C++.
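
The abstract does not say how the character direction detection works. As a rough illustration only, one common approach is to try a character image at each of the four right-angle rotations and keep the rotation that a classifier scores highest. The sketch below assumes a binary glyph bitmap and a caller-supplied scoring callback. The names `Glyph`, `Scorer`, `rotate90`, `detectRotation` and `matchScore` are hypothetical, and the template-matching scorer is a toy stand-in for a real classifier. None of this is taken from the thesis or from the Tesseract API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A glyph is a small binary bitmap stored row by row (1 = ink, 0 = background).
using Glyph = std::vector<std::vector<int>>;

// Hypothetical scoring callback: higher means "looks more like an upright character".
using Scorer = std::function<double(const Glyph&)>;

// Rotate a rectangular glyph by 90 degrees clockwise.
Glyph rotate90(const Glyph& g) {
    assert(!g.empty() && !g[0].empty());
    const std::size_t rows = g.size();
    const std::size_t cols = g[0].size();
    Glyph out(cols, std::vector<int>(rows, 0));
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[c][rows - 1 - r] = g[r][c];  // pixel (r, c) moves to (c, rows-1-r)
        }
    }
    return out;
}

// Return the clockwise rotation in degrees (0, 90, 180 or 270) that, applied to
// the input glyph, gives the highest score. Ties keep the smaller rotation.
int detectRotation(const Glyph& g, const Scorer& score) {
    Glyph current = g;
    int bestDegrees = 0;
    double bestScore = score(current);
    for (int degrees = 90; degrees < 360; degrees += 90) {
        current = rotate90(current);
        const double s = score(current);
        if (s > bestScore) {
            bestScore = s;
            bestDegrees = degrees;
        }
    }
    return bestDegrees;
}

// Toy scorer (an assumption, not a real classifier): fraction of pixels that
// agree with an upright template; 0 if the dimensions differ.
double matchScore(const Glyph& g, const Glyph& tmpl) {
    if (g.empty() || tmpl.empty() || g.size() != tmpl.size() ||
        g[0].size() != tmpl[0].size()) {
        return 0.0;
    }
    std::size_t same = 0;
    std::size_t total = 0;
    for (std::size_t r = 0; r < g.size(); ++r) {
        for (std::size_t c = 0; c < g[r].size(); ++c) {
            ++total;
            if (g[r][c] == tmpl[r][c]) ++same;
        }
    }
    return static_cast<double>(same) / static_cast<double>(total);
}
```

In a real system the scorer would be the recognizer's own confidence for the rotated character, and the detected rotation would be undone before the character is passed on to classification.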

Keywords/Search Tags:

Text Recognition, Tesseract, MFC, OCR, Text Recognition System

Related items

1	Research On Character Recognition Based On Tesseract
2	Research And Implementation Of Character Recognition System Based On Tesseract
3	Text Image Recognition Based On Improved Binarization Algorithm And Tesseract-OCR Engine
4	Research On Multilingual Text Recognition In Complex Scenes And System Design
5	Research On Deep Learning Based Text Detection And Recognition
6	Research On Text Recognition Techniques For Distorted Text
7	Research On Text Detection And Recognition In Complex Natural Scene Image
8	Research On Deep-Learning-Based Scene Text Detection And End-to-End Recognition
9	Text Location And Recognition In Natural Scene Image
10	Image/Video Text Extraction And Its Application