| Fine-grained image classification subdivides traditional image classification into subcategories, i.e., it effectively separates the sub-categories that belong to a common supercategory. This allows a computer to classify target objects at an "expert" level. Fine-grained image classification has high research value in professional application fields such as automatic biodiversity monitoring and intelligent agriculture. It differs from traditional image classification in that it is characterised by low intra-class similarity and high inter-class similarity. Images within the same class may have low similarity when taken from different perspectives, which requires identifying local common regions; meanwhile, images from different categories taken from the same perspective may have high similarity, which requires identifying local difference regions. This dissertation focuses on exploring the commonality of intra-class images and the differences between inter-class images to improve classification performance. The main research work of this thesis is as follows: (1) The Channel Interaction Attention Networks (CIA-Net) is proposed for mining critical regions of commonality in intra-class images. Most existing models take a single image as input and learn discriminative regions through fine-grained feature learning. However, this approach overlooks the potentially complementary information that other images from the same subclass can provide to assist classification. Therefore, CIA-Net takes image pairs of the same class as input and uses a channel interaction structure to extract channel correlations between the two images, allowing the model to learn the discriminative regions they share. Then, attention enhancement and suppression modules are proposed, where attention enhancement is used to enhance the classification ability of interactive feature 
information; meanwhile, attention suppression is used to eliminate the common discriminative regions to obtain suppressed images. Finally, the suppressed images are re-input into the model for training, leading the model to extract contextual information outside the common region. Experiments on four datasets show that CIA-Net can effectively mine the commonality of intra-class images and improve fine-grained image classification accuracy. (2) The Multiple Attention Pyramid Networks (MAP-Net) is proposed for mining the differences in similar regions between inter-class images. Most existing algorithms primarily extract information from high-level features, neglecting low-level detailed features. As a result, the localized regions share the same semantics but lack detailed differences, which ultimately harms classification performance. To address this limitation, MAP-Net proposes bidirectional feature path modules that strengthen the bidirectional association between the detailed information of low-level features and the semantic information of high-level features. Then, a mixed attention mechanism is introduced after the top-down feature path to generate an attentional feature pyramid that captures the important features at each scale. Finally, the model performs classification prediction by fusing multi-level attention features. Experiments show that MAP-Net is a high-performance fine-grained image classification algorithm. |
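
The channel interaction and attention suppression steps described for CIA-Net can be illustrated with a small, dependency-free sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the function names (`channel_interaction`, `attention_map`, `suppress`), the use of a softmax over products of pooled channel descriptors as the "channel correlation", and the threshold `ratio` are all hypothetical and are not taken from the thesis.

```python
import math

def global_avg_pool(feat):
    """feat: list of C channels, each an H x W nested list. Returns C channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def channel_interaction(feat_a, feat_b):
    """Assumed correlation: softmax over per-channel products of the two
    images' pooled descriptors, so channels active in BOTH images get high weight."""
    da, db = global_avg_pool(feat_a), global_avg_pool(feat_b)
    return softmax([x * y for x, y in zip(da, db)])

def attention_map(feat, weights):
    """Channel-weighted sum of a feature map, giving an H x W attention map."""
    h, w = len(feat[0]), len(feat[0][0])
    return [[sum(wt * ch[i][j] for wt, ch in zip(weights, feat))
             for j in range(w)] for i in range(h)]

def suppress(feat, attn, ratio=0.5):
    """Assumed suppression rule: zero every spatial position whose attention
    exceeds ratio * max(attention), hiding the shared discriminative region."""
    thresh = ratio * max(max(row) for row in attn)
    return [[[0.0 if attn[i][j] > thresh else v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in feat]

# Toy pair of same-class "feature maps": 2 channels, 2 x 2 spatial grid.
feat_a = [[[4.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
feat_b = [[[4.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
weights = channel_interaction(feat_a, feat_b)
suppressed_a = suppress(feat_a, attention_map(feat_a, weights))
```

In this toy pair, the strongly shared response in the first channel dominates the attention map and is zeroed out, while the weaker response in the second channel survives. This mirrors, in a very simplified way, the idea of re-inputting suppressed images so that the model looks for context outside the common region.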