Unsupervised Text Image Generation Based On Style Transfer

Posted on: 2024-06-09    Degree: Master    Type: Thesis
Country: China    Candidate: Y C Xie    Full Text: PDF
GTID: 2568307067493764    Subject: Signal and Information Processing
Abstract/Summary:
Text image generation is a generative task in computer vision that aims to use artificial intelligence to enable computers to automatically generate text images that satisfy given conditions. Font generation and scene text generation are two representative tasks in text image generation, focusing respectively on generating complex text without background and simpler scene text with background. These tasks have a wide range of applications in fields such as font library creation, handwriting imitation, and data augmentation for recognition models. In recent years, style-transfer-based text image generation has received widespread attention because of its high-quality results and controllable content and style. However, existing style-transfer-based text image generation algorithms rely on paired images as supervision, which limits their range of application. This thesis studies unsupervised text image generation based on style transfer, which does not rely on real paired images as supervised samples, and explores the application of deformable networks, self-supervised signals, fused attention mechanisms, and other techniques to improve the quality of generated images and broaden the scope of text image generation. The specific research work of this thesis is as follows:

(1) We propose a deformable font generation method that decouples the content and style features of images through adversarial training and uses style transfer to achieve text image generation; it can generate high-quality text images in an unsupervised setting. We design a deformable encoder and a deformable feature transfer module based on deformable convolution to learn a low-dimensional feature mapping between texts, which addresses the problems of insufficient text deformation and easy loss of text content in unsupervised text image generation. The deformable feature transfer module applies deformable convolution to the low-dimensional content feature by predicting offset vectors, and then passes the deformed feature to the decoder to help generate more complete images. Experimental results show that this method can effectively generate high-quality font images in unsupervised settings for both intra-language and cross-language tasks.

(2) We propose a general scene text generation method that uses a fused attention mechanism to learn the global deformation and local stroke mapping between content images and generated images, using only recognition labels as supervision; it can generate large numbers of scene text images in any language. This method models global attention based on deformable convolution and local attention based on the similarity of adjacent point features, and it learns the connection between content features and generated features by fusing the two attention mechanisms. By introducing high-resource languages, the method can generate large numbers of low-resource scene text images with rich backgrounds and text styles through cross-language generation, providing effective training samples for OCR recognition methods. Experimental results show that this method can effectively generate low-resource scene text images and improve OCR recognition performance.
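
The abstract describes two concrete mechanisms: a deformable feature transfer module that predicts offset vectors and applies deformable convolution to the low-dimensional content feature, and a fused attention that combines deformable-convolution-based global attention with local attention computed from the similarity of adjacent point features. The Python/PyTorch sketch below shows how such components could be assembled; it is a minimal illustrative assumption, not the thesis's actual implementation, and the names DeformableFeatureTransfer and local_neighbor_attention are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class DeformableFeatureTransfer(nn.Module):
    """Predict sampling offsets from the content feature, then apply a
    deformable convolution so the transferred feature can follow the
    geometric deformation between source and target glyphs."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # One (dy, dx) offset pair per kernel location -> 2 * k * k channels.
        self.offset_conv = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(channels, channels, kernel_size,
                                        padding=pad)

    def forward(self, content_feat: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(content_feat)        # predicted offset vectors
        return self.deform_conv(content_feat, offsets)  # deformed content feature

def local_neighbor_attention(x: torch.Tensor, k: int = 3) -> torch.Tensor:
    # Each position attends over its k x k neighbourhood, weighting neighbours
    # by dot-product similarity with the centre feature (local attention).
    n, c, h, w = x.shape
    pad = k // 2
    neigh = F.unfold(x, k, padding=pad).view(n, c, k * k, h * w)  # (N, C, k*k, H*W)
    center = x.view(n, c, 1, h * w)
    sim = (neigh * center).sum(dim=1) / c ** 0.5                  # (N, k*k, H*W)
    attn = sim.softmax(dim=1).unsqueeze(1)                        # (N, 1, k*k, H*W)
    out = (attn * neigh).sum(dim=2)                               # (N, C, H*W)
    return out.view(n, c, h, w)

# Usage: fuse the globally deformed feature with the locally attended feature
# (a simple sum here; a learned 1x1 projection would be another natural choice).
feat = torch.randn(1, 64, 16, 16)
transfer = DeformableFeatureTransfer(channels=64)
fused = transfer(feat) + local_neighbor_attention(feat)   # shape (1, 64, 16, 16)

In a full generator, a fused feature of this kind would typically be passed to the decoder as a skip connection so that the decoded image keeps complete strokes while still following the target deformation.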
Keywords/Search Tags: Text image generation, Style transfer, Unsupervised, Font generation, Scene text generation