Document Image Classification Based On Multimodal Feature Fusion

Posted on: 2024-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: F J Wu
Full Text: PDF
GTID: 2568307100989469
Subject: Software engineering
Abstract/Summary:
Document image classification plays an important role in document image research and applications. Document images carry three modalities (image, text, and structure), and each of them contributes to characterizing the document category. Traditional methods classify document images using a single modality and therefore do not exploit all of the available features, so this thesis fuses the three modalities to improve classification performance. First, the visual features of the document image are mined in two steps: the global features of the whole image are extracted and classified with an EfficientNets network, and then the features of each image region are extracted with an EfficientNets network and fused along the channel dimension. Second, the text features of the document image are extracted, and a classification method based on the text content is proposed. Third, the structural features of the document image are extracted, and a classification method based on these structural features is proposed. Finally, the features of the three modalities are fused in two ways: pairwise fusion, which sums the features of two modalities channel-wise, and three-model fusion, which combines the three best-performing models above with a linear fusion method. All methods are trained, validated, and tested on the RVL-CDIP and Tobacco-3482 datasets and compared with existing methods. The proposed approach achieves 93.1% classification accuracy on RVL-CDIP and 94.87% on Tobacco-3482.
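The two fusion strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the feature dimensions, the fusion weights, or whether linear fusion is applied to logits or to probabilities, so everything below is an assumption made for illustration: the function names `sum_fuse`, `linear_fuse`, and `predict` are hypothetical, the linear fusion is applied to per-model class probabilities, and the weights are arbitrary example values rather than those used in the thesis.

```python
from typing import List, Sequence


def sum_fuse(feat_a: Sequence[float], feat_b: Sequence[float]) -> List[float]:
    """Pairwise fusion: element-wise (channel-wise) sum of two feature vectors.

    Assumes both modalities were already projected to the same number of channels.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same number of channels")
    return [a + b for a, b in zip(feat_a, feat_b)]


def linear_fuse(
    probs_per_model: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Three-model fusion: weighted linear combination of class probabilities.

    The weights are hypothetical; in practice they would be tuned on validation data.
    """
    if len(probs_per_model) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_classes = len(probs_per_model[0])
    fused = [0.0] * n_classes
    for probs, weight in zip(probs_per_model, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same set of classes")
        for i, p in enumerate(probs):
            fused[i] += weight * p
    return fused


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up outputs of a visual, a text, and a structure model over three classes.
visual = [0.7, 0.2, 0.1]
text = [0.2, 0.6, 0.2]
structure = [0.3, 0.5, 0.2]

fused = linear_fuse([visual, text, structure], weights=[0.5, 0.3, 0.2])
label = predict(fused)
```

Summation keeps the fused feature the same size as each input, which is why it requires matching channel counts, whereas the linear combination works at the decision level and only requires that the three models score the same classes.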
Keywords/Search Tags: Document image classification, Multimodal fusion, EfficientNets, Convolutional neural network, Transfer learning