Document Image Classification Based On Multimodal Feature Fusion

Posted on:2024-04-10

Degree:Master

Type:Thesis

Country:China

Candidate:F J Wu

GTID:2568307100989469

Subject:Software engineering

Abstract/Summary:

Document image classification plays an important role in document image research and applications. Document images carry three modalities (image, text, and structure), and each of them contributes to characterizing the category of a document. Traditional methods classify document images using a single modality and therefore do not exploit all of the available features, so this thesis fuses the three modalities to improve classification performance. First, the visual features of the document image are mined in two steps: the holistic features of the whole image are extracted and classified with an EfficientNet network, and then the features of each region of the image are extracted with EfficientNet networks and fused along the channel dimension. Second, the text features of the document image are extracted, and a classification method based on the text content is proposed. Third, the structural features of the document image are extracted, and a classification method based on these structural features is proposed. Finally, the features of the three modalities are fused in two ways: pairwise fusion, which sums the features of two modalities channel-wise, and three-modality fusion, which uses linear fusion to combine the three best-performing models described above. All methods are trained, validated, and tested on the RVL-CDIP and Tobacco-3482 datasets. In extensive experiments and comparisons with existing methods, the proposed approach achieves 93.1% classification accuracy on RVL-CDIP and 94.87% classification accuracy on Tobacco-3482.
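
The two fusion schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion weights, the feature dimensions, or whether linear fusion is applied to features or to class probabilities, so everything below is an assumption for illustration: `sum_fusion`, `linear_fusion`, and `predict` are hypothetical helper names, linear fusion is assumed to be a weighted sum of per-model class probabilities, and the numbers are made up rather than taken from the thesis.

```python
from typing import List, Sequence

Vector = List[float]


def sum_fusion(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Pairwise fusion: element-wise (channel-wise) sum of two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same number of channels")
    return [x + y for x, y in zip(a, b)]


def linear_fusion(probs: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> Vector:
    """Assumed form of linear fusion: weighted sum of per-model class
    probabilities, with the weights normalised so that they sum to 1."""
    if len(probs) != len(weights):
        raise ValueError("exactly one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n_classes = len(probs[0])
    if any(len(p) != n_classes for p in probs):
        raise ValueError("all models must score the same set of classes")
    fused = [0.0] * n_classes
    for p, w in zip(probs, weights):
        for i, value in enumerate(p):
            fused[i] += (w / total) * value
    return fused


def predict(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up class probabilities from three single-modality models (3 classes).
visual = [0.6, 0.3, 0.1]
text = [0.2, 0.7, 0.1]
structure = [0.3, 0.5, 0.2]

# Hypothetical weights; in practice they would be tuned on a validation set.
fused = linear_fusion([visual, text, structure], weights=[0.5, 0.3, 0.2])
label = predict(fused)
```

In this toy input the visual model alone favours class 0, while the weighted combination favours class 1, which is the kind of correction a multimodal ensemble is meant to provide.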

Keywords/Search Tags:

Document image classification, Multimodal fusion, EfficientNets, Convolutional neural network, Transfer learning

Related items

1	Breast Ultrasound Image Classification On Deep Feature Based Transfer Learning And Feature Fusion
2	Research On Image Classification Based On Transfer Learning And Deep Convolution Networks
3	SAR Target Recognition Based On Convolution Neural Network And Migration Learning
4	Research On Fine-grained Image Classification Algorithm Based On Multi Convolution Neural Network Fusion
5	The Research Of Image Classification Methods Based On Convolution Neural Network
6	Research On Key Problems Of Transfer Learning In Deep Neural Networks
7	Research On Image Classification Algorithms Based On Convolutional Neural Network
8	Research On The Classification Algorithm Of Solar Radio Spectrum Based On Convolution Neural Network
9	Research On Image Automatic Annotation Model Based On Improved Convolution Neural Network
10	Medical Image Classification Based On Inception Module