Complex Document Layout Analysis With Deep Learning

Posted on: 2024-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: E J Zhou
GTID: 2568307052495694
Subject: Electronic information

Abstract/Summary:
Document layout analysis identifies and classifies the elements of a document image. In deep learning it is commonly treated as a semantic segmentation task, and it has a wide range of applications in natural language processing and computer vision. As document styles continue to evolve, the layouts to be processed become increasingly complex and the demands on layout analysis grow accordingly, so researchers have begun to focus on more complex non-Manhattan layouts. Non-Manhattan layouts pose two challenges, scarce data and complex text arrangement, which lead to discontinuities in layout semantics. This thesis proposes deep-learning-based methods for complex document layout analysis that address this semantic discontinuity in non-Manhattan layout tasks. The contributions are as follows.

First, because pixel-wise classification methods ignore regional continuity, which is difficult to predict, this thesis designs a layout object constraint analysis method based on position encoding. The method is built on Mask R-CNN and introduces a position information processing module. Using a Transformer-based position encoding module, it encodes the position information of the document into the input feature map, producing a feature map fused with the position encoding. To make better use of this feature map, the thesis also proposes a border constraint module based on object detection, which improves the semantic segmentation results and their positional consistency. The method is evaluated on PubLayNet and DSSE-200, and the experimental results show that it is effective.

Second, to address the semantic discontinuity caused by the adhesion of segmentation results, this thesis designs a mask-constraint-based document layout analysis method built on the Atrous Spatial Pyramid Pooling (ASPP) structure. By fusing the semantic segmentation masks with the feature map of the original image, the method constructs more robust input features. It then extracts multi-scale information with atrous (dilated) convolutions at different rates to compensate for the semantic discontinuity caused by adhesion. To make further use of both the original image features and the segmentation results, the method also includes a clustering step based on RGB distance that identifies similar pixels and aggregates them, further exploiting the multi-scale semantic features. The method is evaluated on the FPD and DSSE-200 datasets, and the experimental results show that it is effective.

Third, to address overfitting caused by the lack of data, this thesis designs a LayoutGAN-based method for generating document images with non-Manhattan layouts. The method produces non-Manhattan layouts by computing an overlap score over the layout information. Because LayoutGAN can only generate the positions of layout boxes, the thesis proposes a binary search strategy for filling text and image material into the boxes. The method is used to generate a batch of data that alleviates the shortage of non-Manhattan layout data.
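
The abstract does not say how the overlap score or the binary search (dichotomous) fill strategy is defined, so the following Python sketch is only one plausible reading, not the thesis's implementation. It assumes the overlap score is the intersection-over-union of two axis-aligned boxes, and that filling text means finding the largest prefix of a word list that fits inside a box under greedy word wrapping. The box format, the fixed `char_width` and `line_height` values, and all function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]


def overlap_score(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (an assumed definition)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def wrapped_line_count(words: List[str], box_width: float,
                       char_width: float) -> int:
    """Number of lines needed when words are wrapped greedily.

    Assumes a fixed-width font. A word wider than the box still
    occupies one line of its own in this simplified model.
    """
    lines = 0
    used = 0.0  # width already used on the current line
    for word in words:
        need = len(word) * char_width
        if lines == 0:
            lines, used = 1, need
        elif used + char_width + need <= box_width:
            used += char_width + need  # one space plus the word
        else:
            lines += 1
            used = need
    return lines


def max_words_that_fit(words: List[str], box: Box,
                       char_width: float = 6.0,
                       line_height: float = 12.0) -> int:
    """Binary search for the largest prefix of `words` that fits in `box`.

    The search is valid because the line count of a greedily wrapped
    prefix never decreases as the prefix grows.
    """
    width = box[2] - box[0]
    height = box[3] - box[1]
    lo, hi = 0, len(words)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # upper middle so the loop terminates
        lines = wrapped_line_count(words[:mid], width, char_width)
        if lines * line_height <= height:
            lo = mid       # the first `mid` words fit; try more
        else:
            hi = mid - 1   # too many words; try fewer
    return lo
```

In this reading, a generator could use `overlap_score` to decide how strongly candidate layout boxes overlap, then call `max_words_that_fit` once per text box to choose how much material to place. The same search would apply to other monotone quantities, such as font size or image scale.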
Keywords/Search Tags: document layout analysis, deep learning, semantic segmentation, layout generation