| Multi-label image classification is a fundamental task in computer vision. Because it fits real visual application scenarios and matches people’s basic cognitive habits, it has been widely used in practice and has developed rapidly in research. The rich semantic information and complex label dependencies of multi-label images provide strong support for classification, but they also bring challenges. First, the images to be processed usually contain multiple objects of different sizes, shapes, perspectives, and layouts, so they carry more complex semantic features than single-label images. Second, since objects often co-occur in images, modeling label dependencies can effectively characterize the relationships between objects and thereby improve classification performance. Finally, making the model robust to noisy data has become an urgent problem in multi-label image classification research. To address label relationship mining and label noise removal in multi-label image classification, our work combines the advantages of graph representation learning and the visual-semantic embedding mechanism, integrates and enriches external prior knowledge, and mines deep dependency relations between labels from different perspectives and dimensions. The main research contributions are as follows: (1) Deep Multi-label Image Classification Algorithm based on Visual Semantic Graph Embedding. The core challenge of multi-label image classification lies in capturing the spatial or temporal dependencies between labels. To address this, this work focuses on combining graph representation with the visual-semantic embedding mechanism, capturing the co-occurrence relationships between labels through a graph convolutional network so as to learn more appropriate label representations. At the same
time, unlike previous methods that learn visual representations of images, the proposed model treats images as a guide for label ranking and uses them to generate directional guidance for finding the correlation between images and labels. At the optimization level, to improve the efficiency of label ranking, a new adaptive weighted label ranking loss is proposed, which reduces computation by avoiding the label-pair sampling process. Meanwhile, this adaptive weighted label ranking strategy pays more attention to the relative ordering of labels between classes to preserve classification accuracy. Experiments and comparisons on different multi-label image classification datasets demonstrate the effectiveness of the proposed algorithm. (2) Deep Multi-Label Image Classification based on Heterogeneous Prior Knowledge. To capture label dependencies, word embeddings are usually chosen as the initial feature representation of labels. However, traditional word embeddings are learned from a large amount of textual prior knowledge in the semantic space, and applying them directly to images ignores the differences in semantic information across image data. At the same time, a large amount of redundant semantic information from the semantic space is introduced into the model, which hurts classification accuracy. To solve this problem, this work focuses on using the heterogeneity of different prior knowledge to alleviate the negative impact of redundant information. Specifically, by exploiting the global and local label dependencies that exist in different spaces, the model first constructs a visual prototype and a semantic prototype for each label as prior representations. Two graph convolutional networks are then used to model the global and local label dependencies. In this way, visual and semantic classifiers are built seamlessly. In addition, to alleviate the
negative impact of external redundant information, consistency constraints between the visual space and the semantic space are added, exploiting data heterogeneity to improve classification performance. Extensive experiments and ablation studies on three public multi-label image datasets validate the effectiveness of the proposed method. |
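
As a rough illustration of how a graph convolutional network can propagate information along label co-occurrence edges, the following minimal NumPy sketch builds a normalized co-occurrence graph from a binary label matrix and applies one graph-convolution layer to label embeddings. The abstract does not specify the graph construction, the normalization, or the layer design, so the binarized adjacency with self-loops, the symmetric normalization, and the function names `cooccurrence_adjacency`, `gcn_layer`, and `label_scores` are all assumptions made for illustration and not the authors' implementation.

```python
import numpy as np

def cooccurrence_adjacency(Y):
    """Build a normalized label graph from a binary label matrix.

    Y has shape (n_samples, n_labels). Binarizing the co-occurrence counts
    and adding self-loops is an assumed, commonly used construction.
    """
    Y = np.asarray(Y, dtype=float)
    counts = Y.T @ Y                      # counts[i, j] = images containing both labels
    np.fill_diagonal(counts, 0.0)         # drop per-label frequencies on the diagonal
    A = (counts > 0).astype(float) + np.eye(Y.shape[1])  # edges plus self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # Symmetric normalization D^{-1/2} A D^{-1/2}
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)

def label_scores(image_feat, label_repr):
    """Score each label as the dot product of its representation with an image feature."""
    return label_repr @ image_feat
```

In this sketch, labels that share the same neighborhood in the co-occurrence graph receive identical representations after one layer, which is the sense in which the graph injects label dependencies into the per-label classifiers.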