As a pre-processing step in document analysis systems, the binary segmentation of text and background plays a key role in the accuracy and visual quality of downstream tasks such as character recognition. Most binarization algorithms are built on low-level features in an unsupervised manner, so domain knowledge about the input cannot be fully exploited, which greatly limits their ability to distinguish foreground text from background noise. With the wide application of deep learning across computer vision, researchers have begun to apply deep learning models to the binarization problem and have achieved good segmentation results. Accordingly, this paper focuses on deep-learning-based binarization algorithms for low-quality document images. The main work and innovations are as follows:

(1) Twelve binarization algorithms are reviewed, comprising six classic traditional algorithms and six recent deep-learning-based algorithms. Each algorithm is briefly summarized, and its advantages and disadvantages are analyzed through experimental results.

(2) The first proposed algorithm addresses the limited size of neural-network training sets: a text enhancement network (TANet) is proposed to expand the dataset, making full use of existing document images, and an improved D-LinkNet (MD-LinkNet) then serves as the binary segmentation network. The network contains two improvements. First, a residual multi-kernel pooling (RMP) module and a cascaded atrous convolution (CAC) module are inserted between the encoder and decoder to extract rich document stroke features. Second, the pooled low-resolution feature maps are upsampled with DUpsample instead of traditional bilinear interpolation, which incorporates pixel-neighborhood information of the document image (a sketch of these modules is given below). Using the datasets and evaluation metrics provided by the International Document Image Binarization Contest (DIBCO), the algorithm is compared with the twelve binarization algorithms above. Experimental results show that its F-measure improves by 3.5% over the second-best method, U-Net.

(3) The second proposed algorithm targets the uneven distribution of text in historical document images, which causes a single neural network to produce noisy binary segmentations. A cascaded convolutional neural network is proposed to address the core problem of multi-scale information fusion in the binarization task. The algorithm first uses a U-Net as the base segmentation network to retain complete stroke information; the segmentation results at different scales are then fused and fed into the proposed MD-LinkNet for training and testing; finally, a convolutional conditional random field (ConvCRF) is applied as post-processing to remove isolated noise points (see the pipeline sketch below). Experimental results show that, while retaining complete strokes, the algorithm better suppresses noise in document images with small text.
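To make the MD-LinkNet improvements in (2) concrete, the following is a minimal PyTorch sketch of the three ideas: a cascaded atrous convolution (CAC) block, a residual multi-kernel pooling (RMP) block, and a DUpsample head realized as a 1x1 convolution followed by pixel shuffle. The channel widths, dilation rates, and pooling sizes here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CACBlock(nn.Module):
    """Cascaded atrous (dilated) convolutions; partial results are
    accumulated residually, mixing several receptive-field sizes."""
    def __init__(self, channels, dilations=(1, 2, 4, 8)):  # assumed rates
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, x):
        out, feat = x, x
        for conv in self.convs:
            feat = F.relu(conv(feat))  # each stage sees the previous one
            out = out + feat           # residual sum over all scales
        return out

class RMPBlock(nn.Module):
    """Residual multi-kernel pooling: pool at several window sizes,
    compress each branch to one channel, upsample back, and
    concatenate with the input before a 1x1 fusion convolution."""
    def __init__(self, channels, pool_sizes=(2, 3, 5, 6)):  # assumed sizes
        super().__init__()
        self.pools = nn.ModuleList([nn.MaxPool2d(p, stride=p) for p in pool_sizes])
        self.convs = nn.ModuleList([nn.Conv2d(channels, 1, 1) for _ in pool_sizes])
        self.fuse = nn.Conv2d(channels + len(pool_sizes), channels, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        branches = [
            F.interpolate(conv(pool(x)), size=(h, w), mode="bilinear",
                          align_corners=False)
            for pool, conv in zip(self.pools, self.convs)
        ]
        return self.fuse(torch.cat([x] + branches, dim=1))

class DUpsampleHead(nn.Module):
    """Data-dependent upsampling: a learned 1x1 conv predicts r*r
    sub-pixel logits per location, rearranged to full resolution by
    pixel shuffle, in place of bilinear interpolation."""
    def __init__(self, channels, num_classes=2, ratio=4):
        super().__init__()
        self.proj = nn.Conv2d(channels, num_classes * ratio * ratio, 1)
        self.shuffle = nn.PixelShuffle(ratio)

    def forward(self, x):
        return self.shuffle(self.proj(x))

# Quick shape check on a dummy encoder feature map.
feat = torch.randn(1, 64, 32, 32)
feat = RMPBlock(64)(CACBlock(64)(feat))
logits = DUpsampleHead(64)(feat)  # -> (1, 2, 128, 128)
print(logits.shape)
```

The intuition is that CAC enlarges the receptive field without losing resolution, RMP summarizes context at several window sizes, and DUpsample lets the network learn how to reconstruct fine stroke boundaries rather than interpolating them.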
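Similarly, the cascaded pipeline in (3) can be summarized as the sketch below, assuming hypothetical model objects `unet` and `md_linknet` and a `crf_refine` callable standing in for ConvCRF post-processing; the fusion rule (averaging probability maps predicted at several input scales) is an illustrative assumption rather than the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def multiscale_probs(model, image, scales=(0.75, 1.0, 1.25)):
    """Run the base segmenter at several input scales and average the
    resulting probability maps at the original resolution."""
    h, w = image.shape[2:]
    probs = []
    for s in scales:
        resized = F.interpolate(image, scale_factor=s, mode="bilinear",
                                align_corners=False)
        p = torch.sigmoid(model(resized))
        probs.append(F.interpolate(p, size=(h, w), mode="bilinear",
                                   align_corners=False))
    return torch.stack(probs).mean(dim=0)

def binarize(image, unet, md_linknet, crf_refine):
    # Stage 1: U-Net keeps complete strokes; fuse its multi-scale outputs.
    coarse = multiscale_probs(unet, image)
    # Stage 2: MD-LinkNet refines, here (an assumption) conditioned on
    # the image concatenated with the coarse probability map.
    refined = torch.sigmoid(md_linknet(torch.cat([image, coarse], dim=1)))
    # Stage 3: ConvCRF-style post-processing removes isolated noise points.
    return crf_refine(image, refined) > 0.5
```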