Research On Multi-Label Image Classification Algorithm Based On Graph Convolution Network

Posted on: 2023-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: P P Kang
Full Text: PDF
GTID: 2558307073991179
Subject: Computer technology
Abstract/Summary:
Multi-label image classification is used in many areas of computer vision to predict the presence of one or more object categories in an image. Traditional multi-label image classification models struggle to generate high-level image features that are close to the relevant labels, and they do not exploit the visual correlation between labels, which limits recognition accuracy. To address this problem, this thesis examines three aspects: how a GCN (Graph Convolutional Network) acquires co-occurrence relations, the use of attention mechanisms, and the working mode of the classification module. Two algorithms are constructed: a multi-label image classification algorithm based on spatial attention and graph convolution, and a multi-label image classification algorithm based on FPN (Feature Pyramid Network) feature extraction. The work includes the following.

First, a multi-label image classification algorithm based on spatial attention and graph convolution is proposed, in which EfficientNet extracts the basic semantic information and a graph convolutional network captures label co-occurrence. The algorithm uses a GCN to learn the features of the label adjacency graph and the GloVe algorithm to obtain label embeddings. An improved spatial attention network is applied to the high-level semantic information to recalculate the semantic features of specific categories, which suppresses background and distracting information. The high-level semantic information and the label co-occurrence features extracted by the GCN are then integrated in a classifier based on co-occurrence feature fusion, and the final prediction of the model is completed in a one-to-one channel.

Second, a multi-label image classification algorithm based on FPN feature extraction is proposed to address the shortcomings of the graph convolutional network algorithm in multi-scale feature extraction. Starting from the ResNet50 network structure, pyramid convolution is introduced into each convolution block to construct an FPN as the basic feature extraction network, and attention is constructed from both the spatial and the channel perspective. The dual attention module, which fuses spatial and channel attention, realizes multi-dimensional attention feature extraction. Asymmetric loss is adopted to offset the negative impact that the uneven attribute distribution in large-scale datasets has on the training of multi-label image recognition networks.

Comparison and ablation experiments were performed on the public datasets COCO and VOC-2007, and the experimental results and network features were visualized. The results show that the proposed algorithm is superior to traditional multi-label image classification algorithms in both algorithm complexity and average accuracy. The pedestrian attribute dataset RAP was selected to verify the effectiveness of the graph convolutional network for extracting attribute co-occurrence features from a task dataset, and the feasibility of the algorithm on pedestrian attribute data. The experimental results show that a suitable attention mechanism and loss function can effectively improve the average accuracy of multi-label classification, and that a well-chosen basic feature extraction network can greatly reduce the number of model parameters and the training cost.

Finally, a simple visualization system based on Python and the Flask framework is designed and implemented. It takes real-time camera input and local video files and shows the detection results of the algorithm proposed in this thesis.
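The abstract describes using a GCN over a label adjacency graph and combining the resulting label features with image features. The sketch below illustrates that general idea in plain Python under several assumptions that are not taken from the thesis: the adjacency is estimated as the conditional probability P(label j | label i) from training annotations, it is row-normalized, a single graph convolution layer is used, and the final score is a sigmoid of a dot product. All function names, the toy label set, and the numeric values are hypothetical, and the 2-dimensional vectors merely stand in for GloVe label embeddings and CNN image features.

```python
import math


def cooccurrence_adjacency(label_rows, num_labels):
    """Estimate adj[i][j] = P(label j present | label i present)."""
    counts = [[0] * num_labels for _ in range(num_labels)]
    for row in label_rows:
        present = [k for k, v in enumerate(row) if v]
        for i in present:
            for j in present:
                counts[i][j] += 1
    adj = []
    for i in range(num_labels):
        n_i = counts[i][i]  # number of samples that contain label i
        adj.append([counts[i][j] / n_i if n_i else 0.0
                    for j in range(num_labels)])
    return adj


def row_normalize(adj):
    """Scale each row to sum to 1 (assumed normalization, not from the thesis)."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(adj_norm @ features @ weight)."""
    z = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in z]


def predict(image_feature, label_classifiers):
    """Score each label as sigmoid(dot(image_feature, label_classifier))."""
    scores = []
    for clf in label_classifiers:
        logit = sum(f * c for f, c in zip(image_feature, clf))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


# Toy annotations for three hypothetical labels: person, dog, car.
label_rows = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]]
adj = cooccurrence_adjacency(label_rows, 3)
adj_norm = row_normalize(adj)

# Hypothetical 2-d stand-ins for label embeddings and a layer weight matrix.
label_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[0.5, -0.2], [0.1, 0.3]]
classifiers = gcn_layer(adj_norm, label_embeddings, weight)

# Hypothetical 2-d image feature; a real model would take this from the CNN.
probs = predict([0.8, -0.4], classifiers)
```

In this toy data every image containing a dog also contains a person, so the conditional entry from dog to person is 1 while the reverse entry is lower. That asymmetry is the kind of co-occurrence structure the graph convolution propagates into the per-label classifiers.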
Keywords/Search Tags: Graph Convolutional Networks, Multiscale Features, Attention Mechanism, Feature Fusion, Multi-label Image Classification