| With the development of the art curation field in China, the artwork resources held in curatorial repositories are steadily increasing. An art collection system needs to update its curatorial library frequently, yet traditional classification and archiving are done manually, which is time-consuming. Automatic classification of artworks with artificial intelligence can save the time and effort of manual screening and archiving and keep the curatorial library up to date. The data modalities for artwork classification are mainly text and image. To exploit the advantages of the features of both modalities, this paper proposes an artwork classification method based on multimodal fusion. We propose a text classification method for artworks based on the self-attention mechanism, construct a visual Transformer-based artwork image classification model, and finally use decision-level fusion to combine the classification models of the different modalities into a multimodal artwork classifier. We validate the proposed method experimentally by comparing the unimodal classification methods before and after improvement and by comparing against multiple fusion methods that use predefined decision rules; the results show that the method achieves more accurate artwork classification. The main work of the paper is as follows. (1) For text-modality data, a modelling method combining local convolution and global attention is proposed to improve the classification of artwork text. The interpretive text of artwork samples is long and information-dense, and the text convolutional model (TextCNN) lacks attention to crucial information during modelling, resulting in poor classification accuracy. To address this problem, this paper adds an attention computation after the word embedding layer. It performs self-attention analysis for each group of feature maps separately to
give greater weights to crucial features and reduce the interference of non-key features on the classification results, thereby enhancing the semantic information and improving the classification accuracy of artworks. (2) For image-modality data, a Transformer-based artwork image classification method that incorporates texture information is proposed to improve classification accuracy on image data. The image serialization process of the visual Transformer model (ViT) loses the texture information between adjacent image blocks. To address this problem, when the original image is segmented into image blocks during serialization, this paper sets the sliding step size to half of the image block size so that neighbouring blocks overlap and the texture information at their boundaries is preserved. Experimental results show that this method speeds up model convergence and achieves better image classification results. (3) For multimodal data, learnable weights are proposed to fuse the pre-classification results of the two modal classifiers and improve artwork classification accuracy. The information contained in text and image data is mutually reinforcing for artwork classification. We design a decision-level fusion network that assigns learnable weights to the text and image classifiers to improve the accuracy of artwork category prediction. By comparing the unimodal classification methods before and after improvement, and by comparing against various fusion methods that use predefined decision rules, experiments on the publicly available artwork dataset SemArt show that the decision-level multimodal fusion method proposed in this paper is effective for artwork classification. |
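
The idea in contribution (1), applying self-attention to the word embeddings before the convolutional filters, can be sketched roughly as follows. This is a minimal illustration in plain Python and not the paper's model: it assumes identity query/key/value projections, a single hypothetical convolution filter `kernel`, and made-up two-dimensional embeddings.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings: Matrix) -> Matrix:
    """Scaled dot-product self-attention (assumption: identity Q/K/V projections)."""
    dim = len(embeddings[0])
    attended = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        weights = softmax(scores)
        # Each output token is a weighted average of all token embeddings.
        attended.append([sum(w * value[d] for w, value in zip(weights, embeddings))
                         for d in range(dim)])
    return attended


def conv_max_pool(sequence: Matrix, kernel: Matrix) -> float:
    """One TextCNN-style filter: slide over token windows, then max-over-time pooling."""
    width = len(kernel)
    responses = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        responses.append(sum(w * x
                             for k_row, x_row in zip(kernel, window)
                             for w, x in zip(k_row, x_row)))
    return max(responses)


# Made-up embeddings for a four-token text and a hypothetical width-2 filter.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
kernel = [[1.0, -1.0], [0.5, 0.5]]

attended = self_attention(embeddings)      # attention re-weights the embeddings
feature = conv_max_pool(attended, kernel)  # local convolution on the attended sequence
print(len(attended), len(attended[0]), feature)
```

In a trainable model the projections and filters would be learned, and several filter widths would be used in parallel as in TextCNN.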
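
The overlapping serialization in contribution (2) can be illustrated with a small sketch. It assumes the sliding step is half the image block size, and it uses a toy 8x8 grayscale image and the hypothetical helper `extract_patches`; a real ViT would additionally project each flattened block to an embedding.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def extract_patches(image: Image, patch: int, stride: int) -> List[List[float]]:
    """Slide a patch x patch window over the image and flatten each window."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            window = [image[top + r][left + c]
                      for r in range(patch) for c in range(patch)]
            tokens.append(window)
    return tokens


# Toy 8x8 image whose pixel value encodes its position.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]

plain = extract_patches(image, patch=4, stride=4)    # non-overlapping blocks
overlap = extract_patches(image, patch=4, stride=2)  # stride = half the block size

print(len(plain), len(overlap))
```

With the smaller stride, every boundary between two non-overlapping blocks falls inside some overlapping block, at the cost of a longer token sequence.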
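
The decision-level fusion in contribution (3) might look roughly like the sketch below. It is one possible reading, not the paper's network: it assumes one learnable scalar weight per classifier, normalized with a softmax and trained by gradient descent on the cross-entropy of the fused prediction. The class probabilities in `samples` are made-up toy values.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(alpha: List[float], text_probs: List[float],
         image_probs: List[float]) -> List[float]:
    """Weighted sum of the two classifiers' class probabilities."""
    return [alpha[0] * t + alpha[1] * i for t, i in zip(text_probs, image_probs)]


def predict(alpha: List[float], text_probs: List[float],
            image_probs: List[float]) -> int:
    fused = fuse(alpha, text_probs, image_probs)
    return max(range(len(fused)), key=lambda c: fused[c])


# Toy data (made up): (text classifier output, image classifier output, true label).
samples = [
    ([0.6, 0.3, 0.1], [0.8, 0.1, 0.1], 0),
    ([0.4, 0.4, 0.2], [0.1, 0.8, 0.1], 1),
    ([0.5, 0.2, 0.3], [0.2, 0.1, 0.7], 2),
    ([0.3, 0.5, 0.2], [0.7, 0.2, 0.1], 0),
]

weights = [0.0, 0.0]  # learnable pre-softmax weights: [text, image]
learning_rate = 0.5

for _ in range(200):
    alpha = softmax(weights)
    grads = [0.0, 0.0]
    for text_probs, image_probs, label in samples:
        fused = fuse(alpha, text_probs, image_probs)
        per_model = (text_probs[label], image_probs[label])
        for j in range(2):
            # Gradient of -log(fused[label]) with respect to pre-softmax weight j.
            grads[j] += alpha[j] * (1.0 - per_model[j] / fused[label])
    weights = [w - learning_rate * g / len(samples)
               for w, g in zip(weights, grads)]

alpha = softmax(weights)
predictions = [predict(alpha, t, i) for t, i, _ in samples]
print(alpha, predictions)
```

Unlike a predefined rule such as averaging or taking the maximum, the weights here are fitted to data, so the classifier that is more reliable on the training samples receives the larger share.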