With the prevalence of social networks and the rapid development of multimedia technology, images have become an important medium for people to express themselves and to understand others in daily life. However, most computer vision algorithms teach networks how to "see" like a human while rarely teaching them how to "feel" like a human, even though learning human emotions is considered a crucial step towards strong artificial intelligence. Aiming to understand how people feel emotionally about different visual information, image emotion analysis has emerged in recent years. As an important topic in the computer vision field, progress in image emotion analysis will benefit related tasks (e.g., image aesthetic assessment and stylized image captioning) and will have a great impact on other applications (e.g., public monitoring, opinion mining, and the diagnosis and treatment of psychological diseases).

In recent years, benefiting from the powerful representation ability of deep learning models, the performance of image emotion analysis methods has improved significantly. However, most existing methods focus on designing a general network to predict emotions while neglecting unique psychological prior knowledge. Emotion is a high-level cognitive process of human beings, so research on image emotion analysis is challenging and requires interdisciplinary knowledge. In view of this, combining psychological studies with deep learning models, this thesis designs multiple networks to simulate the human emotion cognition process, studies image emotion analysis in depth with respect to three major challenges, namely abstractness, ambiguity, and subjectivity, and puts forward corresponding solutions. The main research results of this thesis are as follows:

1. A stimuli-aware image emotion classification method is proposed. Existing deep learning-based image emotion classification methods have already achieved strong performance. However, most of them directly extract general features to predict emotions, without considering the emotion evocation process in human emotion recognition. Motivated by the Stimuli-Organism-Response model, this thesis proposes a stimuli-aware network that mimics the emotion evocation process, aiming to address the challenge of abstractness in image emotion analysis. Specifically, color, object, and face are selected as typical emotional stimuli according to psychological studies, and three specific networks are designed to extract emotional features from the different stimuli. In addition, this thesis proposes a novel hierarchical cross-entropy loss to distinguish hard false examples from easy ones in an emotion-specific manner. Experiments demonstrate that the proposed method consistently outperforms state-of-the-art approaches on public image emotion datasets, and visualization results confirm the importance of stimuli in the emotion evocation process.
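The exact form of the hierarchical cross-entropy loss is not spelled out in this abstract; the following PyTorch sketch illustrates one plausible reading, in which a coarse polarity term (positive vs. negative) is added to the fine emotion-category term so that predictions falling into the wrong polarity (hard false examples) are penalized more heavily than confusions between emotions of the same polarity. The eight-category Mikels indexing and the weighting factor are assumptions for illustration, not the thesis' exact formulation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a hierarchical (coarse-to-fine) cross-entropy loss.
# Assumption: the 8 Mikels emotions are indexed 0-3 (positive: amusement,
# awe, contentment, excitement) and 4-7 (negative: anger, disgust, fear, sadness).
POSITIVE = torch.tensor([0, 1, 2, 3])

def hierarchical_ce_loss(logits, labels, polarity_weight=0.5):
    """logits: (B, 8) emotion scores; labels: (B,) ground-truth emotion indices."""
    # Fine-grained term: standard cross-entropy over the 8 emotion categories.
    fine_loss = F.cross_entropy(logits, labels)

    # Coarse term: collapse the 8 categories into 2 polarities by summing
    # the predicted probabilities within each polarity.
    probs = logits.softmax(dim=-1)                       # (B, 8)
    pos_prob = probs[:, POSITIVE].sum(dim=-1)            # (B,)
    polarity_probs = torch.stack([pos_prob, 1.0 - pos_prob], dim=-1)
    polarity_labels = (labels >= 4).long()               # 0 = positive, 1 = negative
    coarse_loss = F.nll_loss(torch.log(polarity_probs + 1e-8), polarity_labels)

    # Wrong-polarity (hard) mistakes are penalized by both terms, while
    # same-polarity confusions are only penalized by the fine term.
    return fine_loss + polarity_weight * coarse_loss
```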
2. A scene-object interrelated reasoning network is proposed for image emotion classification. Recent methods have gradually turned from global features to local ones, hoping to capture more emotional clues from finer local regions. However, whether global or local, directly mapping a single feature to an emotion underestimates the abstractness of emotion. Inspired by psychological studies, this thesis proposes a scene-object interrelated visual emotion reasoning network (SOLVER), which mines visual emotions from the interactions between objects as well as between scenes and objects, again targeting the challenge of abstractness in image emotion analysis. Specifically, an Emotion Graph is first constructed based on semantic concepts and visual features, and a Graph Convolutional Network (GCN) is applied to perform reasoning on the graph, yielding emotion-enhanced object features. To interrelate scenes and objects, this thesis further proposes a scene-based attention mechanism that exploits scene features to guide the object fusion process (a minimal sketch of this fusion step is given after the third contribution below). Extensive experiments and comparisons on public image emotion datasets demonstrate that the proposed SOLVER consistently outperforms state-of-the-art methods. Visualization results on emotional object concepts and emotional object regions further verify the effectiveness and interpretability of the proposed method. In addition, further discussions extend the experiments to potential datasets in related fields.

3. A circular-structured representation is proposed to learn image emotion distributions. Current work on image emotion analysis mainly focuses on single-label classification, yet it is often more reasonable to depict image emotion as a label distribution, and researchers have therefore introduced Label Distribution Learning (LDL) to learn image emotion distributions. However, unlike other LDL tasks, there exist intrinsic relationships between distinct emotion labels. Based on Mikels' emotion wheel from psychology, this thesis proposes a well-grounded circular-structured representation that can depict any emotion distribution, aiming to resolve the ambiguity in image emotion analysis. The thesis first constructs an emotion circle that unifies any emotional state within it, where each emotion distribution is mapped into an emotion vector with three attributes (i.e., emotion polarity, emotion type, and emotion intensity) and two properties (i.e., similarity and additivity). In addition, a novel progressive circular (PC) loss is designed to constrain the emotion vector in a coarse-to-fine manner. The proposed method consistently outperforms state-of-the-art methods on public image emotion distribution learning datasets, and ablation studies on the PC loss further validate its effectiveness.
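For the second contribution, the scene-guided fusion step can be made concrete with the following sketch: GCN-enhanced object features are weighted by their compatibility with a global scene feature and then aggregated into a single emotion representation. The feature dimensions, projection layers, and the final concatenation with the scene feature are illustrative assumptions; the thesis' exact architecture may differ.

```python
import torch
import torch.nn as nn

class SceneGuidedFusion(nn.Module):
    """Hypothetical sketch of scene-based attention over object features."""

    def __init__(self, obj_dim=512, scene_dim=512, hidden_dim=256):
        super().__init__()
        self.query = nn.Linear(scene_dim, hidden_dim)        # scene -> query
        self.key = nn.Linear(obj_dim, hidden_dim)            # objects -> keys
        self.classifier = nn.Linear(obj_dim + scene_dim, 8)  # 8 Mikels emotions

    def forward(self, obj_feats, scene_feat):
        # obj_feats: (B, N, obj_dim) emotion-enhanced object features from the GCN
        # scene_feat: (B, scene_dim) global scene feature
        q = self.query(scene_feat).unsqueeze(1)               # (B, 1, H)
        k = self.key(obj_feats)                               # (B, N, H)
        attn = torch.softmax((q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)  # (B, N)
        fused_obj = (attn.unsqueeze(-1) * obj_feats).sum(dim=1)            # (B, obj_dim)
        # Combine the scene context with the attended object representation.
        return self.classifier(torch.cat([fused_obj, scene_feat], dim=-1))
```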
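For the third contribution, the mapping from a discrete distribution over the eight Mikels emotions onto the emotion circle can be illustrated as follows. This is a minimal sketch under the assumption that each emotion occupies a fixed angle on the circle and that a distribution is mapped to the probability-weighted sum of the corresponding unit vectors, from which polarity, type (angle), and intensity (vector length) are read off; the angular layout and the exact attribute definitions are assumptions for illustration.

```python
import numpy as np

# Hypothetical angular layout of the 8 Mikels emotions on an emotion circle:
# positive emotions on the upper half, negative emotions on the lower half.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]
ANGLES = np.deg2rad(np.array([22.5, 67.5, 112.5, 157.5,      # positive half
                              202.5, 247.5, 292.5, 337.5]))  # negative half

def to_emotion_vector(distribution):
    """Map an 8-dim emotion distribution to (polarity, type_angle, intensity)."""
    p = np.asarray(distribution, dtype=float)
    p = p / p.sum()                                   # normalize to a distribution
    # Probability-weighted sum of unit vectors on the circle.
    vec = np.array([np.sum(p * np.cos(ANGLES)), np.sum(p * np.sin(ANGLES))])
    intensity = np.linalg.norm(vec)                   # how concentrated the emotion is
    type_angle = np.arctan2(vec[1], vec[0]) % (2 * np.pi)  # dominant emotion direction
    polarity = 1 if np.sin(type_angle) >= 0 else -1   # upper half: positive
    return polarity, type_angle, intensity

# Example: an ambiguous image voted mostly as "awe" with some "fear".
print(to_emotion_vector([0.05, 0.6, 0.05, 0.05, 0.0, 0.0, 0.25, 0.0]))
```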
4. An image emotion distribution learning method seeking subjectivity is proposed. Existing methods usually predict image emotion distributions with a single unified network from a group perspective. However, emotions are subjective, and different people may feel different emotions towards the same image. In practice, when such datasets are built, each individual is asked to vote independently for one emotion towards a given affective image, and the votes are then aggregated to form the final distribution. Motivated by the Object-Appraisal-Emotion model, a subjectivity appraise-and-match network (SAMNet) is proposed to simulate this crowd emotion voting process, aiming to deal with the subjectivity in image emotion analysis. The proposed network consists of two stages: subjectivity appraising and subjectivity matching. In subjectivity appraising, to preserve the unique emotional experience of each individual, the thesis constructs affective memories with an attention mechanism, and a subjectivity loss is further proposed to guarantee the diversity between different individuals. Subjectivity matching is designed with a matching loss that assigns unordered emotion labels to ordered individual predictions; it is modeled as a bipartite matching problem and optimized with the Hungarian algorithm (a minimal matching sketch is given at the end of this abstract). Experiments demonstrate that the proposed method consistently outperforms state-of-the-art methods on several evaluation metrics, and visualization results on the affective memories of different individuals further show the importance of investigating subjectivity in image emotion distribution learning.

To sum up, combining psychological studies with deep learning models, this thesis proposes four models to address the problem of image emotion analysis. The first model predicts image emotions from multiple emotional stimuli. Building on it, the second model reasons over the interrelations between scenes and objects, interconnecting previously isolated features. These first two models study image emotion classification and address the challenge of abstractness. The third model builds a circular-structured representation to better depict emotion distributions, aiming to resolve ambiguity. The fourth model mimics the crowd voting process to learn image emotion distributions in a subjective manner, mainly addressing the challenge of subjectivity; these latter two models address the task of image emotion distribution learning. In summary, the four models move from isolation to association, from classification to distribution, and from commonness to individuality.
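Finally, the label-assignment step of the fourth contribution's subjectivity matching can be illustrated with a standard bipartite matching routine. The sketch below assumes each of N simulated individuals outputs a probability distribution over the eight emotions and that the N unordered crowd votes are assigned to individuals by minimizing the negative log-probability of the assigned vote, solved with the Hungarian algorithm via SciPy's linear_sum_assignment; the cost definition and the per-individual loss are illustrative assumptions rather than the thesis' exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_votes_to_individuals(pred_probs, votes):
    """pred_probs: (N, 8) per-individual emotion probabilities.
    votes: length-N list of unordered emotion indices (one vote per annotator).
    Returns the vote assigned to each individual and the matching loss."""
    n = len(votes)
    # Cost of assigning vote j to individual i: negative log-probability
    # that individual i predicts the voted emotion.
    cost = np.zeros((n, n))
    for i in range(n):
        for j, v in enumerate(votes):
            cost[i, j] = -np.log(pred_probs[i, v] + 1e-8)
    row_ind, col_ind = linear_sum_assignment(cost)    # Hungarian algorithm
    matching_loss = cost[row_ind, col_ind].mean()
    assigned = [votes[j] for j in col_ind]            # vote assigned to each individual
    return assigned, matching_loss

# Example with 3 simulated individuals and 3 unordered crowd votes.
probs = np.array([[0.70, 0.10, 0.05, 0.05, 0.03, 0.03, 0.02, 0.02],
                  [0.10, 0.10, 0.10, 0.10, 0.50, 0.05, 0.03, 0.02],
                  [0.05, 0.60, 0.10, 0.10, 0.05, 0.05, 0.03, 0.02]])
print(match_votes_to_individuals(probs, votes=[4, 0, 1]))
```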