| At present, research on end-to-end text detection and recognition models has made good progress. Work in this field falls mainly into two directions. The first combines text line detection with RNN decoding; this approach cannot detect the bounding box of an individual character. The second combines the Faster R-CNN detection algorithm with ROI pooling to build a two-stage model; this approach can detect each character, but it is computationally expensive. In this work, we propose a new end-to-end single-stage model that directly predicts the bounding boxes of individual characters and their corresponding character categories, overcoming the limitations of methods based on RNN decoding and ROI pooling. The model fuses feature maps of different scales from the backbone network, which significantly improves detection and recognition performance. To improve the detection of small characters, a random copy strategy is used to increase the number of small characters and enrich the positional diversity of characters. For noise that deviates significantly from the text area, this paper proposes a new post-processing method that effectively filters it out. Because very few public datasets exist for character-level handwritten text detection and recognition, we have developed an automatic annotation system for handwritten text. The system uses knowledge transfer: models are trained on synthetic handwritten image data and then applied to character detection and recognition on real text images. Experiments show that the system's detection mAP on real images reaches 87% and its recognition accuracy reaches 70%, and that using the system saves more than 70% of manual labeling time. In the automatic annotation system, we use a text line network model and a document network model to generate labels automatically. The text line network is based on the text line character detection and recognition model. This study improves the character center positioning branch of the text line model: an unbalanced loss function increases the weight of characters that are prone to localization errors, thereby improving overall character detection and recognition performance. The document network model uses the multi-scale fusion single-stage model proposed in this paper. The annotation quality of both methods on real images is sufficient for practical use. |
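
The random copy strategy for small characters is described above only at a high level. The following is a minimal sketch of one way such an augmentation could work, not the authors' implementation. The function name `random_copy_small_chars`, the area threshold `max_area`, the `copies` and `max_tries` parameters, and the rule that pasted patches must not overlap existing boxes are all assumptions made for illustration. The image is a plain list of pixel rows so that the example needs only the standard library.

```python
import random


def box_area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1) with exclusive x1, y1."""
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


def overlaps(a, b):
    """True if two axis-aligned boxes intersect with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def random_copy_small_chars(image, annotations, max_area=64, copies=1,
                            max_tries=20, rng=None):
    """Paste copies of small character patches at random free positions.

    image:       list of pixel rows (list of lists); the input is not modified
    annotations: list of (box, label) pairs, box = (x0, y0, x1, y1)
    max_area:    hypothetical threshold; boxes at or below it count as "small"
    Returns a new image and an extended annotation list.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out_image = [row[:] for row in image]
    out_annotations = list(annotations)

    for box, label in annotations:
        if box_area(box) > max_area:
            continue  # only small characters are duplicated
        x0, y0, x1, y1 = box
        w, h = x1 - x0, y1 - y0
        # Patches are cut from the original image, so earlier pastes cannot leak in.
        patch = [row[x0:x1] for row in image[y0:y1]]

        for _copy in range(copies):
            for _attempt in range(max_tries):
                nx = rng.randint(0, width - w)
                ny = rng.randint(0, height - h)
                new_box = (nx, ny, nx + w, ny + h)
                # Assumption: reject positions that would cover any existing character.
                if any(overlaps(new_box, other) for other, _ in out_annotations):
                    continue
                for dy in range(h):
                    out_image[ny + dy][nx:nx + w] = patch[dy]
                out_annotations.append((new_box, label))
                break  # this copy is placed; move on to the next one

    return out_image, out_annotations
```

Each pasted patch adds a new labeled box for the same character class at a different location, which is how the sketch both increases the count of small characters and varies their positions. A real pipeline would likely also blend patch borders with the background, which is omitted here.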