Text carries information and is used constantly in human society. However, much of this text is presented in images. Text detection and text recognition are the key technologies for understanding text in images. Today they are widely applied in medical treatment, education, and document digitization, and have become an active research topic in pattern recognition. Educational documents are a special case of handwritten documents, with difficulties that include character erasure, text-line insertion, character/phrase switching, noisy backgrounds, non-uniform word size, diverse text length, and complicated layout. This paper studies deep-learning-based text detection and recognition in the educational document scenario. The main contributions can be summarized as follows:

1. For text detection: we improve both a one-stage and a two-stage text detection model. The improvements include multi-scale feature fusion, anchor box clustering, an aspect ratio prediction branch, and a corner prediction branch. Multi-scale feature fusion improves the quality of the features extracted by the backbone. Anchor box clustering and the aspect ratio prediction branch reduce the difficulty of regression, and the corner prediction branch alleviates the problem of an insufficient receptive field for extremely long text.

2. For text recognition: we construct an educational examination paper dataset that covers common text recognition difficulties, and we design a multi-scale fully convolutional residual recurrent network to address them. Thanks to the multi-scale receptive field branch, the receptive field of the features extracted by the network better covers Chinese characters, numbers, and punctuation marks, which yields better recognition performance. Experiments also show that multi-scale receptive field features work better when fused by multiplication. In addition, we propose a partition-based data preprocessing method, which greatly reduces the training time of the model without degrading system performance.

3. End-to-end: to date, the most common approach to document processing is to apply detection and text recognition as two separate tasks. However, this may lead to sub-optimal performance. We therefore propose a fast end-to-end system, called the adversarial feature enhancing network (AFEN), for offline handwritten paragraph recognition. AFEN comprises five components: a shared feature extractor for robust feature learning, a text detection branch for text box proposal, RoIRotate for oriented feature region extraction, an adversarial feature learning network for joint feature learning of the text detection and recognition branches, and a text recognition branch for text transcription. Experiments show that the system achieves excellent results in both performance and speed.
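
To illustrate the anchor box clustering mentioned in the text detection contribution, the following is a minimal sketch of one common way to derive anchor sizes: k-means over ground-truth box widths and heights, using IoU as the similarity measure. The abstract does not specify the clustering procedure, so the distance measure, the function names `iou_wh` and `cluster_anchors`, and the sample box sizes are all assumptions made for illustration, not the method used in this paper.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes described only by (width, height), aligned at one corner."""
    w, h = box
    cw, ch = centroid
    inter = min(w, cw) * min(h, ch)
    union = w * h + cw * ch - inter
    return inter / union


def cluster_anchors(boxes, k, iters=100, seed=0):
    """Group (width, height) pairs into k anchor sizes with IoU-based k-means.

    Hypothetical helper: the paper's actual clustering setup is not given.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # start from k distinct ground-truth boxes
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid it overlaps most.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            groups[best].append(box)
        new_centroids = []
        for i, group in enumerate(groups):
            if group:
                mean_w = sum(w for w, _ in group) / len(group)
                mean_h = sum(h for _, h in group) / len(group)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    # Sort by area so the output order does not depend on initialization.
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Made-up text-line sizes in pixels: three short lines and three long lines.
sample_boxes = [(100, 20), (110, 22), (90, 18), (600, 20), (620, 22), (580, 18)]
anchors = cluster_anchors(sample_boxes, k=2)
```

Anchors obtained this way start close to the typical text-line shapes in the training data, so the regression branch only has to predict small offsets, which is the sense in which clustering can reduce the difficulty of regression.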