
Fine-grained Visual Classification Via Weakly Supervised Information

Posted on: 2021-03-02    Degree: Master    Type: Thesis
Country: China    Candidate: Y Liu    Full Text: PDF
GTID: 2518306476950169    Subject: Signal and Information Processing
Abstract/Summary:
The recognition of sub-ordinate categories within the same basic-level class, known as Fine-Grained Visual Classification (FGVC), is a challenging computer vision problem. Most convolutional neural networks (CNNs) cannot explicitly distinguish the categories in FGVC datasets, since the inter-category differences are far more subtle than the conspicuous intra-category variations. Strongly-supervised fine-grained recognition has made impressive progress by localizing discriminative regions in target images and extracting the corresponding convolutional features for classification. However, acquiring human-defined annotations is extremely expensive, which makes strongly-supervised methods impractical for large-scale fine-grained recognition tasks. Alternatively, some methods build classification models from weakly-supervised information and require only the image-level labels in the datasets rather than extra bounding boxes or part annotations. These models have attracted considerable attention from both industry and academia owing to their satisfactory accuracy. We therefore focus our research on weakly-supervised methods. Given the inherent shortcomings of existing approaches, three novel models are proposed in this thesis. More concretely, the contributions are summarized as follows:

1. Although Bilinear CNN captures the slight discrepancies in local regions and markedly improves accuracy on FGVC tasks, its further improvement is held back by three essential limitations. First, Bilinear CNN extracts only the top-scale activations from two parallel streams for prediction, which discards local detail. Second, the under-utilization of first-order statistics in bilinear pooling may lead to inadequate feature representations. Third, the usefulness of features drawn from two structurally similar networks remains questionable. Hence, we propose a novel architecture, referred to as the Hybrid-order and Multi-stream Convolutional Neural Network (HM-CNN), to address these problems. First, exploiting the bottom-up pyramidal structure of deep CNNs, we reuse multi-scale features from different layers of one backbone network and integrate them by addition during forward propagation. Then we apply hybrid-order pooling, merging first-order statistics, rather than the original feature matrices, with the bilinear vectors. Finally, a cross multi-stream framework built on three backbone networks fully exploits the diversity of the feature extractors; the complementary and mutually reinforcing information from structurally different networks greatly boosts the accuracy and robustness of the classifier. Experimental results demonstrate that HM-CNN significantly improves accuracy on the CUB-200-2011, FGVC Aircraft and Stanford Cars datasets and achieves new state-of-the-art performance.
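To make the hybrid-order pooling step above concrete, here is a minimal PyTorch sketch of the general idea: the second-order bilinear vector of two feature streams is concatenated with their first-order (average-pooled) statistics before normalization. The function name, tensor shapes and the signed square-root step are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn.functional as F

def hybrid_order_pooling(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Combine first-order (averaged) statistics with the second-order
    bilinear vector of two feature maps (names and shapes are illustrative)."""
    n, c, h, w = feat_a.shape
    fa = feat_a.reshape(n, c, h * w)                         # (N, C, HW)
    fb = feat_b.reshape(n, c, h * w)                         # (N, C, HW)

    # Second-order statistics: bilinear pooling over spatial locations.
    bilinear = torch.bmm(fa, fb.transpose(1, 2)) / (h * w)   # (N, C, C)
    bilinear = bilinear.reshape(n, c * c)

    # First-order statistics: global average pooling of each stream.
    first_a = fa.mean(dim=2)                                 # (N, C)
    first_b = fb.mean(dim=2)                                 # (N, C)

    # Merge the two orders, then apply the usual signed sqrt + L2 normalization.
    fused = torch.cat([bilinear, first_a, first_b], dim=1)
    fused = torch.sign(fused) * torch.sqrt(fused.abs() + 1e-12)
    return F.normalize(fused, dim=1)
```

In HM-CNN the fused vector would then be fed to the classifier; the sketch omits the multi-scale feature reuse and the three-stream crossing described above.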
2. Unlike global average pooling or a fully connected layer, Bilinear CNN pools second-order statistics across spatial locations in a translation-invariant fashion. This procedure yields a richer description of the target image, but the dimensionality of the resulting feature surges sharply. We therefore propose a lightweight convolutional neural network with cross-layer feature interaction (LW-CNN) for model compression and acceleration. In this network, a new residual module is built from group convolution and hierarchical layer aggregation. By constructing hierarchical residual-like connections within a single residual block, it approaches the representational power of dense conventional convolution at considerably lower computational cost, and it can be plugged into a deep residual network directly. We then design an efficient low-rank polynomial kernel pooling scheme, derived from tensor decomposition, which obtains a compact feature description with the same discriminative power as the full bilinear representation while using only a few thousand dimensions. Additionally, a cross-layer feature interaction framework integrates part-feature correlations across multiple layers, which yields better performance than approaches based on a single convolutional layer. Experimental results show that LW-CNN strikes an elegant compromise between accuracy, theoretical complexity and model size.

3. Existing weakly-supervised methods largely neglect the relation between discriminative region localization and fine-grained feature learning, which severely restricts further improvement. Motivated by this problem, we introduce a recurrent convolutional neural network based on a self-attention mechanism (SA-RCNN) for fine-grained recognition, consisting of a student model, a teacher model and a classification model. SA-RCNN combines a region proposal network with a teacher-student feedback mechanism to automatically capture region attention and learn feature representations using only image-level labels. Furthermore, we employ hard parameter sharing during multi-task joint learning to avoid overfitting, and a dynamic weight average (DWA) mechanism adaptively adjusts the weight of each task according to the relative descending rate of its loss, making the assignment more reasonable. Experimental results demonstrate that SA-RCNN learns discriminative region attention and fine-grained feature representations in a mutually reinforced manner, and achieves prominent classification accuracy on the CUB-200-2011, FGVC Aircraft and Stanford Cars datasets. Moreover, the entire network is trained end-to-end, with the parameters of the three models learned simultaneously, which greatly lightens the burden on designers.
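As an illustration of the dynamic weight average mechanism mentioned in contribution 3, the sketch below computes task weights from the relative descending rate of each loss over the previous two epochs, following the standard DWA rule from the multi-task learning literature; the temperature value, function name and example losses are assumptions for illustration only.

```python
import math
from typing import List

def dwa_weights(prev_losses: List[float], prev_prev_losses: List[float],
                temperature: float = 2.0) -> List[float]:
    """Dynamic Weight Average: weight each task by the relative descending
    rate of its loss over the last two epochs.
    prev_losses[k]      = loss of task k at epoch t-1
    prev_prev_losses[k] = loss of task k at epoch t-2
    Returns K weights that sum to K (the number of tasks)."""
    K = len(prev_losses)
    # Relative descending rate: a slowly decreasing loss gets a larger ratio.
    rates = [l1 / max(l2, 1e-12) for l1, l2 in zip(prev_losses, prev_prev_losses)]
    exp_terms = [math.exp(r / temperature) for r in rates]
    denom = sum(exp_terms)
    return [K * e / denom for e in exp_terms]

# Example with three tasks (region localization, feature learning, classification).
weights = dwa_weights([0.8, 1.2, 0.5], [1.0, 1.3, 0.9])
```

A loss that decreases slowly yields a larger ratio and hence a larger weight, so lagging tasks receive more of the optimization effort during joint training.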
Keywords/Search Tags: fine-grained visual classification, weakly-supervised information, cross multi-stream framework, lightweight convolutional neural network, self-attention mechanism