
Research On Character Recognition Based On Tesseract

Posted on: 2022-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: A Zhang
Full Text: PDF
GTID: 2518306557970609
Subject: Electronics and Communications Engineering
Abstract/Summary:
The arrival of the 5G era has interconnected industries across production and daily life, and artificial intelligence technology has become widespread. Intelligence presupposes recognizing the input information. Common intelligent recognition falls roughly into two areas: image recognition and voice recognition. Image recognition targets features such as text, faces, and fingerprints, while voice recognition is used mainly for human-computer interaction, realizing artificial intelligence at the machine level. Tesseract is a text recognition engine that is independent of language and script; using a feature selection approach that differs from earlier methods, it performs well in both recognition accuracy and robustness. Because it is open source, it is widely used on multiple platforms. It is therefore of practical value to study Tesseract's operating mechanism and functional framework, and to develop and optimize on top of it.

Starting from the theory of text recognition, this thesis studies its basic principles and introduces its basic process. It then works from the Tesseract source code, examining Tesseract's framework structure, recognition principle, and training method in order to understand its operating mechanism from the bottom up. Building on this, the thesis implements a character direction detection method in the classification stage to optimize character recognition. It also studies training based on Tesseract and completes training for Chinese characters, digits, and English characters, exploiting Tesseract's characteristics during training to improve training efficiency and recognition accuracy.

Finally, the Tesseract engine and the trained character library are used to recognize optical characters. Combining the features of the C++ programming language with a modular design method, the text recognition system uses the MFC framework for interface design and the open source libraries CxImage and Tesseract as its foundation. Through training of the character library, wrapping of Tesseract, and use of the MFC framework, the program interacts efficiently with the user, which extends Tesseract's usefulness in C++ development.
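
The abstract does not describe how the character direction detection works, so the following is only a minimal sketch of one plausible approach, not the thesis's method: rotate a candidate glyph in 90-degree steps, score each orientation, and keep the best one. The names `Glyph`, `rotate90`, `matchScore`, and `detectRotation` are hypothetical, and the pixel-overlap scorer is an assumed stand-in for the confidence that a real classifier such as Tesseract's would return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A glyph is a small binary bitmap: '#' is ink, '.' is background.
// (Hypothetical representation chosen for illustration only.)
using Glyph = std::vector<std::string>;

// Rotate a rectangular glyph 90 degrees clockwise.
Glyph rotate90(const Glyph& g) {
    if (g.empty()) return g;
    const std::size_t rows = g.size();
    const std::size_t cols = g[0].size();
    Glyph out(cols, std::string(rows, '.'));
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[c][rows - 1 - r] = g[r][c];
    return out;
}

// Assumed stand-in for a classifier confidence: the fraction of pixels
// that agree with an upright reference template (0.0 if sizes differ).
double matchScore(const Glyph& g, const Glyph& tmpl) {
    if (g.empty() || g.size() != tmpl.size()) return 0.0;
    std::size_t same = 0, total = 0;
    for (std::size_t r = 0; r < g.size(); ++r) {
        if (g[r].size() != tmpl[r].size()) return 0.0;
        for (std::size_t c = 0; c < g[r].size(); ++c) {
            if (g[r][c] == tmpl[r][c]) ++same;
            ++total;
        }
    }
    return total ? static_cast<double>(same) / static_cast<double>(total) : 0.0;
}

// Return the clockwise rotation (0, 90, 180, or 270 degrees) that, applied
// to the input, gives the highest score against the upright template.
int detectRotation(const Glyph& input, const Glyph& tmpl) {
    Glyph current = input;
    int bestAngle = 0;
    double bestScore = -1.0;
    for (int k = 0; k < 4; ++k) {
        const double s = matchScore(current, tmpl);
        if (s > bestScore) {
            bestScore = s;
            bestAngle = 90 * k;
        }
        current = rotate90(current);
    }
    return bestAngle;
}
```

For an asymmetric glyph such as an "L" shape, `detectRotation` reports how far the input must be turned clockwise to match the upright template. In a real system the scorer would be replaced by the recognition engine's own confidence for each orientation.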
Keywords/Search Tags: Text Recognition, Tesseract, MFC, OCR, Text Recognition System