| Images are the main medium of information transfer on social media, and users convey emotions through them. Automatically predicting users’ sentiment with image sentiment classification methods is needed in tasks such as robot emotional interaction and multimedia analysis. Emotion is abstract, subjective semantic information, and the relationship between image features and image emotion is complex and nonlinear, so image emotion classification is a challenging task. The sentiment embedded in an image is a comprehensive reflection of the image's global features. Image sentiment also has polarity, and different fine-grained sentiments can share the same polarity, but existing work fails to exploit these properties effectively. To analyze the sentiment of an image more accurately, this paper first investigates image polarity sentiment classification and then extends it to fine-grained image sentiment classification. The specific research work is as follows: (1) Polarity-aware Attention Network for Image Sentiment Analysis: Psychological findings indicate that emotional content is usually concentrated in a few informative regions of an image, and that these regions convey different emotional polarities and intensities. The emotion conveyed by the whole image can be regarded as the combined effect of its positive-polarity and negative-polarity emotional regions. Motivated by this psychological prior, we propose a new polarity-aware attention network for end-to-end image sentiment analysis. The network is composed of a sentiment feature extraction backbone, a polarity-aware attention module, and a fused classification module. The backbone extracts global contextual features. The polarity-aware attention module attends to positive and negative emotion regions by predicting polarity-aware attention maps and also estimates their polarity intensities. The fused classification module integrates the outputs of the first two modules for the final image sentiment prediction. A compound loss function guides network learning in a weakly supervised manner together with a distributed label smoothing method. We validate the method on multiple benchmarks, and the experimental results show that it outperforms several state-of-the-art methods. (2) Combined Polarity Detection and Transformer Network for Image Sentiment Classification: As noted above, existing work does not effectively use the global nature of image sentiment, its polarity, or the fact that different fine-grained sentiments share the same polarity. We propose a network that incorporates Transformer-based fine-grained sentiment classification and polarity detection into a single model for end-to-end learning and uses the correlation between the two tasks for image sentiment classification. The network consists of a feature extraction backbone, a spatial self-attention module, and a polarity detection branch. The backbone extracts local contextual features, and the spatial self-attention module learns correlations between image blocks to capture the non-local correlations of image features for image sentiment prediction. The polarity detection branch detects the coarse-grained sentiment of the current image and guides fine-grained sentiment classification. Experiments on several benchmark datasets show that the network outperforms current mainstream methods, and ablation experiments validate the effectiveness of its modules. |
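
The abstract states that the polarity detection branch guides fine-grained classification but does not say how. The following is a minimal sketch of one possible form of that guidance, not the authors' method: each fine-grained class probability is reweighted by the predicted probability of the polarity that class belongs to, then renormalized. The class names, the `POLARITY_OF` mapping, and the function `polarity_guided_probs` are all hypothetical and introduced only for illustration.

```python
import math

# Hypothetical fine-grained classes and their polarity groups (an assumption;
# the abstract does not name the classes it uses).
POLARITY_OF = {
    "amusement": "positive",
    "contentment": "positive",
    "anger": "negative",
    "sadness": "negative",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def polarity_guided_probs(fine_logits, polarity_logits):
    """Reweight fine-grained probabilities by the predicted polarity.

    fine_logits: dict of class name -> logit from a fine-grained head.
    polarity_logits: dict with "positive" and "negative" logits from a
        coarse polarity branch.
    Returns a dict of class name -> probability that sums to 1.
    """
    names = list(fine_logits)
    fine_p = dict(zip(names, softmax([fine_logits[n] for n in names])))

    pol_names = list(polarity_logits)
    pol_p = dict(zip(pol_names, softmax([polarity_logits[n] for n in pol_names])))

    # Scale each class by the probability of its polarity group (assumed rule).
    weighted = {n: fine_p[n] * pol_p[POLARITY_OF[n]] for n in names}
    total = sum(weighted.values())
    return {n: w / total for n, w in weighted.items()}


if __name__ == "__main__":
    # The fine-grained head cannot separate "amusement" from "anger";
    # a confident negative polarity prediction breaks the tie.
    fine = {"amusement": 1.0, "contentment": 0.0, "anger": 1.0, "sadness": 0.0}
    polarity = {"positive": -2.0, "negative": 2.0}
    for name, p in polarity_guided_probs(fine, polarity).items():
        print(f"{name}: {p:.3f}")
```

In this toy setting the coarse prediction only rescales the fine-grained distribution at inference time. In an end-to-end network such as the one described, the two heads would instead typically share a backbone and be trained jointly, which this sketch does not attempt to model.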