
Research On Multilingual Text Recognition In Complex Scenes And System Design

Posted on: 2024-01-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Wu
GTID: 1528306932462614
Subject: Electronic information
Abstract/Summary:
Driven by economic globalization and consumption upgrading, international trade and outbound tourism have raised the demand for multilingual translation. As the most intuitive carrier of information, images are present in every aspect of daily life. How to accurately recognize the text in images across different languages, understand its semantic content, and translate it automatically is a key problem for text recognition. Recognizing road signs and product labels during outbound travel, translating foreign-language documents in international exchanges, and reviewing illegal image content in intelligent auditing all rely on text recognition technology, and multilingual text recognition is crucial to these applications.

In recent years, the continuous development of deep learning has greatly improved the overall performance of text recognition and has promoted its wide application in many fields. However, in complex scenes, multilingual text recognition still faces several challenges: the scarcity of multilingual data resources, the different typesetting conventions and character structures of different languages, poor recognition of low-frequency characters, poor recognition of vertical text, and the difficulty of determining text blocks with complete semantics. To address these problems, this paper proposes a series of new solutions and deploys them in a system, significantly improving multilingual text recognition in complex scenes. In addition, this work supports research in related national projects. The contributions are summarized as follows:

1. We propose a multilingual image synthesis method to overcome the scarcity of multilingual data. We first collect a small number of samples from the new scene and the new language, and use a style extractor to model the scene priors and font styles that need to be transferred to the new scene. A text encoder then extracts text information in different languages, and a generative adversarial network (GAN) synthesizes the data. This achieves adaptive data synthesis for new languages and new scenes and greatly improves the performance of the text recognizer.

2. We propose a text recognition model based on glyph subword modeling for certain special languages. Some languages suffer from "same shape, different code" (visually identical glyphs with different encodings) or "same code, different shape" (one encoding rendered as different glyphs). This paper builds an adaptive modeling strategy for the glyph structure of each language, so that each word corresponds to a unique glyph, which eliminates glyph ambiguity. This modeling strategy improves recognition accuracy.

3. We propose a text recognition model based on byte modeling for unified multilingual modeling. Unified modeling makes effective use of data in different languages and alleviates data scarcity in some of them. However, the traditional unified approach merges all language dictionaries into one large dictionary, which becomes very large and makes model training difficult. We propose a byte-based method for multilingual text recognition that reduces the dictionary size to only 256, effectively solving the problem of an oversized unified dictionary.

4. We propose a text recognition model fused with a single-word model to improve recognition accuracy for low-frequency characters. Current text recognition models are trained on text lines. Because low-frequency characters appear very rarely in text lines with normal semantics, they are insufficiently trained and therefore poorly recognized. This paper proposes an attention-based text line modeling scheme fused with the single-word model, so that the single-word model is effectively integrated into text line modeling. This solves the above problem and greatly improves recognition accuracy for low-frequency characters.

5. We propose a text recognition model based on shared rotational convolution to improve recognition accuracy for vertical text. Some languages are typeset vertically. The usual practice is to rotate vertical text into horizontal text before recognizing it, but this also rotates the individual characters, and ordinary convolution is not rotation invariant, which leads to poor recognition. We propose a shared rotational convolution that is rotation invariant and greatly improves recognition accuracy for vertical text.

6. We propose a text block segmentation model based on multimodal fusion features to output semantically complete text blocks. When the content of an image is translated line by line, each text line lacks context, which seriously degrades translation quality. To solve this problem, this paper proposes a novel multimodal encoder-decoder model for text block segmentation that makes full use of visual, positional, and semantic information and aggregates text lines into paragraphs. With this model, entire paragraphs can be extracted and translated as a whole, which greatly improves overall translation quality.
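The byte modeling in contribution 3 relies on the fact that any Unicode string can be written as a sequence of bytes, so 256 output classes cover every language. The abstract does not say which byte encoding is used; the Python sketch below assumes UTF-8, and the helper names (`encode_label`, `decode_prediction`) are hypothetical rather than taken from the thesis. It only shows how text labels could become byte targets and how a predicted byte sequence could be turned back into text.

```python
from typing import List

# Assumption: labels are encoded as UTF-8 bytes (the abstract does not name the encoding).
BYTE_VOCAB_SIZE = 256  # every possible byte value 0..255 is one output class


def encode_label(text: str) -> List[int]:
    """Turn a ground-truth string into a sequence of byte-class ids."""
    return list(text.encode("utf-8"))


def decode_prediction(byte_ids: List[int]) -> str:
    """Turn predicted byte ids back into text.

    A recognizer may emit byte sequences that are not valid UTF-8, so
    malformed runs are replaced with U+FFFD instead of raising an error.
    """
    return bytes(byte_ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for label in ["Hello", "café", "中文", "한국어", "ไทย"]:
        ids = encode_label(label)
        # All languages share the same 256-class output space.
        assert all(0 <= i < BYTE_VOCAB_SIZE for i in ids)
        print(f"{label!r}: {len(label)} characters -> {len(ids)} byte classes {ids}")
        assert decode_prediction(ids) == label
```

The trade-off is longer target sequences: in UTF-8 a Chinese, Korean, or Thai character occupies three bytes, so the recognizer must emit multi-byte runs correctly, and a tolerant decoder such as `errors="replace"` keeps a single wrong byte from invalidating a whole line.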
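Contribution 5 names a shared rotational convolution but the abstract does not define it. One common way to obtain rotation invariance through weight sharing, shown here as an assumption and not as the thesis's design, is to apply a single kernel at its four 90-degree rotations, keep the element-wise maximum over orientations, and pool globally. The pure-Python sketch below (all helper names hypothetical) uses valid cross-correlation on small integer grids so the property can be checked exactly; it covers only rotations by multiples of 90 degrees, which is what happens to characters when vertical text is turned sideways.

```python
from typing import List

Grid = List[List[int]]


def rot90(m: Grid) -> Grid:
    """Rotate a 2-D grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*m)][::-1]


def correlate_valid(img: Grid, kernel: Grid) -> Grid:
    """Plain 'valid' 2-D cross-correlation (what deep-learning conv layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def shared_rotation_conv(img: Grid, kernel: Grid) -> Grid:
    """Hypothetical shared rotational convolution: one square kernel, four orientations.

    The same weights are reused at 0, 90, 180 and 270 degrees and the strongest
    response per position is kept, so rotating the input by 90 degrees only
    rotates this feature map instead of changing its values.
    """
    assert len(kernel) == len(kernel[0]), "square kernel keeps output shapes equal"
    kernels = [kernel]
    for _ in range(3):
        kernels.append(rot90(kernels[-1]))
    maps = [correlate_valid(img, k) for k in kernels]
    return [
        [max(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
        for i in range(len(maps[0]))
    ]


def invariant_score(img: Grid, kernel: Grid) -> int:
    """Global max pooling turns the rotation-equivariant map into an invariant score."""
    return max(max(row) for row in shared_rotation_conv(img, kernel))


if __name__ == "__main__":
    glyph = [
        [0, 1, 1, 0, 0],
        [0, 1, 0, 0, 2],
        [0, 1, 1, 1, 0],
        [3, 0, 0, 1, 0],
    ]
    edge = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]  # asymmetric, so orientation matters
    sideways = rot90(glyph)
    print(invariant_score(glyph, edge), invariant_score(sideways, edge))
    assert shared_rotation_conv(sideways, edge) == rot90(shared_rotation_conv(glyph, edge))
```

Because the four orientations share one set of weights, this adds no parameters, only extra computation. A trained recognizer might instead keep the orientation responses as separate channels or pool them differently, so the sketch illustrates the invariance property rather than reconstructing the dissertation's model.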
Keywords/Search Tags: multilingual text recognition, multilingual text synthesis, multilingual text block segmentation, glyph subword modeling, byte modeling, text recognition method fused with single-word model, shared rotational convolution, multimodal fusion feature