| Text is the main carrier of stored information. It appears not only in paper documents but also in multimedia such as images and videos. In recent years, deep learning has performed outstandingly in text recognition tasks, and deep learning-based text recognition has become one of the most active fields in computer vision. On the one hand, to reach a recognition accuracy high enough for practical use, deep learning-based text recognition methods usually need a large amount of labeled data to train the model. However, annotating training samples is time-consuming and expensive. Given the wide range of text recognition applications, it is unrealistic to obtain enough labeled training data for every application. On the other hand, because text carriers and application scenarios are diverse, the differences between text fonts cannot be ignored, for example between handwriting in scanned documents and artistic characters on billboards. A recognition model trained on one type of text dataset often performs unsatisfactorily on another type, which poses a further challenge for text recognition. Unsupervised domain adaptation (UDA) transfers knowledge from labeled source domain data to unlabeled target domain data. Recently, UDA has been introduced into text recognition to address the style differences between application scenarios, namely domain shift. Although some UDA-based text recognition methods have achieved good performance, they still have shortcomings. On the one hand, text is a variable-length sequence of characters, yet some existing methods treat the text image as a whole for domain adaptation. This causes the model to neglect the adaptation of fine-grained characteristics such as character strokes and shapes, so knowledge is transferred insufficiently. On the other hand, under the condition of unsupervised
domain adaptation, the target domain has no label information to supervise the adaptation, especially at the character level. Therefore, these methods cannot ensure that the adapted global and local features are sufficiently discriminative. How to use the knowledge learned from source domain data to supervise recognition in the target domain has become the key problem for UDA-based text recognition methods. Focusing on domain shift in text recognition, this thesis studies in more depth both the level at which unsupervised domain adaptation is applied and local-level feature classification methods. The main contributions of this thesis are summarized as follows: (1) The author proposes a text recognition method via dual adaptation and clustering (DOC). For domain adaptation, global-level and local-level domain discriminators are constructed, and domain-invariant features are extracted through adversarial learning to ensure that knowledge is fully transferred from the source domain to the target domain. For local feature classification, the adaptive feature clustering (AFC) module in the DOC model filters and clusters local features from the source and target domains to further supervise the model's recognition in the target domain. (2) Current UDA-based text recognition methods focus only on aligning word-level features and ignore the categories of local-level features, which limits recognition accuracy. The author proposes a text recognition method via dual adaptation and character classification (DACC), which builds an adaptive classification and clustering (ACC) module on top of the AFC module. Building on feature clustering, this module introduces local feature classification in the source domain, in order to further explore the role of feature classification in UDA-based text recognition. (3) Extensive comparative experiments and visualizations are carried out on
several widely used scene text and handwritten text datasets. Different feature classification methods are compared to demonstrate, from multiple perspectives, the effectiveness of the text recognition framework proposed in this thesis. |
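
The idea of filtering and clustering local features so that source domain knowledge can supervise the unlabeled target domain can be illustrated with a small sketch. This is not the AFC module of the thesis, whose details the abstract does not give. It is a generic nearest-centroid pseudo-labeling scheme, swapped in for illustration only. The function names `class_centroids` and `pseudo_label`, the `max_distance` threshold, and the toy two-dimensional features are all assumptions.

```python
from math import dist

def class_centroids(features, labels):
    """Average the labeled source-domain features of each character class."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def pseudo_label(target_features, centroids, max_distance):
    """Assign each target feature to its nearest source centroid.

    Features farther than max_distance (a hypothetical threshold) from
    every centroid are filtered out instead of receiving a label.
    """
    kept = []
    for idx, vec in enumerate(target_features):
        lab, d = min(
            ((l, dist(vec, c)) for l, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        if d <= max_distance:
            kept.append((idx, lab))
    return kept

# Toy data (assumed): 2-D "character features" for two classes.
source = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = ["a", "a", "b", "b"]
target = [[0.1, 0.1], [0.9, 1.0], [0.5, 3.0]]

centroids = class_centroids(source, labels)
print(pseudo_label(target, centroids, max_distance=0.5))
```

In this toy run the third target feature lies far from both centroids, so it is filtered out rather than given an unreliable pseudo-label. Only the confident assignments would then act as supervision in the target domain.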