Research On Document Layout Analysis Method Based On Deep Learning

Posted on: 2023-05-15  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Sun
GTID: 2568306791454484  Subject: Optical engineering
Abstract/Summary:
As a preliminary task built on OCR, layout analysis and recognition has great research value in the field of text recognition, and document layout analysis and recognition based on deep learning is maturing steadily. The attention mechanism in deep learning gives neural networks a distinctive advantage in filtering out redundant or useless data features, so that useful information can be obtained and exploited more effectively. In object detection and recognition for computer vision, these properties complement the region-based, variable-scale way in which convolutional neural networks are applied. This paper explores a layout analysis and recognition method for deep convolutional neural networks that uses the attention mechanism. First, the ResNet-101 pre-trained model is fine-tuned and optimized so that prior knowledge about images is exploited as fully as possible. Second, we combine bounding boxes with the attention mechanism. This reduces the amount of computation as well as the iteration time and difficulty, and it also makes the higher-level knowledge learned by the attention mechanism interpretable (the degree to which each bounding box influences the final result can be described), so problems can be located faster during parameter tuning. Finally, Grad-CAM++ is used to characterize how strongly each layer of the model influences the final result, which further explains the model's shallow representations and further describes the effect of the attention mechanism on the convolutional neural network. Through these three methods, this paper attempts to build a new object detector on the PubLayNet dataset that combines the attention mechanism with a deep convolutional network. Compared with the baseline, it significantly improves recall, accuracy, and other metrics on PubLayNet, and it also offers certain advantages in training and inference efficiency.
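The abstract does not specify how bounding boxes and attention are combined. As a minimal, hypothetical illustration of why attention weights can describe each box's influence, the sketch below applies generic scaled dot-product attention with a single query vector over one feature vector per candidate box. The function name `box_attention`, the single-query design, and the per-box feature vectors are assumptions made for illustration; this is not the thesis's architecture.

```python
import math
from typing import List, Tuple

Vector = List[float]


def box_attention(box_features: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention of one query over per-box features.

    Hypothetical helper: box_features[b] is an assumed feature vector for
    bounding box b, and query is an assumed (e.g. learned) query vector.
    Returns (weights, pooled): weights[b] is box b's share of the pooled
    representation, and the weights sum to 1.
    """
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    # One relevance score per box: scaled dot product with the query.
    scores = [scale * sum(q * f for q, f in zip(query, feat)) for feat in box_features]
    # Subtract the maximum score for a numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Pooled feature: attention-weighted sum of the box features.
    pooled = [sum(w * feat[d] for w, feat in zip(weights, box_features)) for d in range(dim)]
    return weights, pooled
```

Because the softmax weights are non-negative and sum to 1, each `weights[b]` can be read directly as box b's share of the pooled feature, which is the kind of per-box influence score the abstract describes.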
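Grad-CAM++ is a published, well-defined technique. A minimal single-layer sketch follows. It assumes that the layer's activation maps A^k and the gradients g = dS/dA^k_ij of the target class score S have already been obtained, for example from a framework's automatic differentiation. It uses the closed-form Grad-CAM++ coefficients alpha_ij = g^2 / (2 g^2 + sum(A^k) g^3), the channel weights w_k = sum_ij alpha_ij * ReLU(g), and the map ReLU(sum_k w_k A^k). The helper name `grad_cam_plus_plus` and the nested-list input format are hypothetical choices that keep the example dependency-free.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam_plus_plus(activations: List[Matrix], gradients: List[Matrix],
                       eps: float = 1e-7) -> Matrix:
    """Grad-CAM++ heat map for one convolutional layer (hypothetical helper).

    activations[k][i][j] is A^k_ij, feature map k of the chosen layer
    (assumed non-negative, e.g. taken after a ReLU); gradients[k][i][j] is
    dS/dA^k_ij for the target class score S.
    """
    height, width = len(activations[0]), len(activations[0][0])
    weights = []
    for A, G in zip(activations, gradients):
        sum_a = sum(sum(row) for row in A)  # sum of A^k over all positions
        w = 0.0
        for i in range(height):
            for j in range(width):
                g = G[i][j]
                if g <= 0.0:
                    continue  # ReLU(g) = 0, so this position adds nothing to w_k
                alpha = g * g / (2 * g * g + sum_a * g ** 3 + eps)
                w += alpha * g  # alpha_ij * ReLU(g), since g > 0 here
        weights.append(w)
    # Weighted sum of the feature maps, followed by a ReLU.
    cam = [[max(sum(wk * Ak[i][j] for wk, Ak in zip(weights, activations)), 0.0)
            for j in range(width)] for i in range(height)]
    # Scale to [0, 1] for display unless the map is entirely zero.
    peak = max(max(row) for row in cam)
    if peak > 0.0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

Running the function once per layer of interest gives a per-layer heat map, which is one way to compare how much each layer's features drive the final prediction.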
Keywords/Search Tags: attention mechanism, deep learning, ResNet, Grad-CAM++