| Fine-grained image recognition is a popular research direction in the field of image recognition. Unlike general image recognition, fine-grained image recognition aims to identify the sub-categories within the same category, which differ only in subtle details, so the model must be able to distinguish the fine differences among sub-categories. Therefore, this thesis first identifies discriminative regions, then computes features from these regions, and finally classifies images based on these features. The main work is as follows. (1) Based on the characteristics of fine-grained images, this thesis proposes an attention-based multiple part localization and generation module for finding the discriminative regions of fine-grained images. The multiple parts include attention parts and object parts. The core idea for generating attention parts is to use the attention mechanism to locate a single part and then crop it out as a part. The parts already found are then masked in the original image, and the search for further part locations is repeated in the same way until a fixed number of parts has been cropped. In this process, the attention mechanism ensures that the parts are located accurately, and masking each cropped part in the original image before locating the next one ensures diversity among the generated parts. The attention parts and the object parts capture the detailed and the global information of the image, respectively.
(2) This thesis proposes a multiple-part supervised contrastive learning module, which trains the encoder that computes features on local regions. The module modifies three components of the supervised contrastive learning framework: data augmentation, the division of sample pairs, and the supervised contrastive loss function. For data augmentation, the original image and the parts are augmented separately, and the random cropping operation is no longer used. For sample pair division, each part serves as an anchor sample; the original images with the same class label as the anchor, together with the parts that have the same class label and come from the same part detector, serve as positive samples, and the remaining samples in the minibatch serve as negative samples. Because the sample division changes, the loss function changes accordingly: it computes the distance between the part feature and the features of its positive samples, and the distance between the part feature and the features of the other samples, and the final loss is obtained by summing the losses computed for the different parts of the same original image. Minimizing this loss pulls the anchor sample and its positive samples closer together in feature space while pushing the other samples farther away, which encourages the encoder network to extract more discriminative information.
(3) Images are classified by a classification module, which combines the encoder with a classification layer to form a classification network. In this network, the multiple parts are input into the encoder to compute the feature of each part, and these features are concatenated. The concatenated feature is then input into the classification layer to obtain the final classification result. Experiments are conducted on three widely used fine-grained image datasets: CUB-200-2011, FGVC-Aircraft, and Stanford Cars. The results show that the proposed fine-grained image recognition method outperforms previous traditional supervised methods, self-supervised contrastive learning methods, and supervised contrastive learning methods on these datasets, which confirms its effectiveness. The experiments also demonstrate the effectiveness of the attention-based multiple part localization and generation module and the multiple-part supervised contrastive learning module for fine-grained image recognition. |
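The iterative locate, crop and mask procedure for attention parts described in (1) can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows, a caller-supplied `attention_fn` that returns a non-negative attention map of the same size, square parts of a fixed `part_size`, and masking by setting pixels to zero. All function names are hypothetical, and the sketch covers only the attention parts, not the object parts or the attention network used in the thesis.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right are exclusive


def locate_peak(attn: Image) -> Tuple[int, int]:
    """Return the (row, col) position with the highest attention value."""
    return max(
        ((r, c) for r in range(len(attn)) for c in range(len(attn[0]))),
        key=lambda rc: attn[rc[0]][rc[1]],
    )


def box_around(center: Tuple[int, int], size: int, height: int, width: int) -> Box:
    """Square box of side `size` around `center`, shifted to stay inside the image."""
    r, c = center
    top = min(max(r - size // 2, 0), max(height - size, 0))
    left = min(max(c - size // 2, 0), max(width - size, 0))
    return top, left, min(top + size, height), min(left + size, width)


def crop(image: Image, box: Box) -> Image:
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def mask(image: Image, box: Box, fill: float = 0.0) -> Image:
    """Copy of `image` with the region inside `box` overwritten by `fill`."""
    top, left, bottom, right = box
    out = [row[:] for row in image]
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = fill
    return out


def generate_attention_parts(
    image: Image,
    attention_fn: Callable[[Image], Image],  # hypothetical: assumed non-negative map
    num_parts: int,
    part_size: int,
) -> Tuple[List[Image], List[Box]]:
    parts, boxes = [], []
    current = image
    for _ in range(num_parts):
        attn = attention_fn(current)              # attention on the (partially masked) image
        box = box_around(locate_peak(attn), part_size, len(image), len(image[0]))
        parts.append(crop(image, box))            # crop from the unmasked original
        boxes.append(box)
        current = mask(current, box)              # hide this part before the next search
    return parts, boxes


# Toy usage: identity "attention" on a 6x6 image with two bright spots.
img = [[0.0] * 6 for _ in range(6)]
img[1][1], img[4][4] = 9.0, 5.0
parts, boxes = generate_attention_parts(img, lambda x: x, num_parts=2, part_size=2)
```

Because the first peak is masked before the second search, the second part is drawn from the other bright region rather than from the neighbourhood of the first, which is the diversity effect the masking step is intended to provide. Cropping from the unmasked original is a design choice made here so that overlapping boxes still contain real pixels.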
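The sample pair division and part-level loss in (2) can be sketched as follows. This is an illustration under stated assumptions rather than the exact formulation in the thesis: the loss for each part anchor is assumed to take the standard supervised contrastive (SupCon) form, a temperature-scaled softmax over dot products of L2-normalised embeddings averaged over the anchor's positives, and the per-image sums are assumed to be averaged over the images in the minibatch. The names `Sample`, `part_supcon_loss`, `make_batch` and the `temperature` value are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    embedding: List[float]          # assumed to be L2-normalised already
    label: int
    image_id: int                   # original image this sample belongs to
    detector: Optional[int] = None  # part-detector index; None for an original image


def is_positive(anchor: Sample, other: Sample) -> bool:
    """Positives: same-label originals, and same-label parts from the same detector."""
    if other.label != anchor.label:
        return False
    return other.detector is None or other.detector == anchor.detector


def part_supcon_loss(batch: List[Sample], temperature: float = 0.1) -> float:
    per_image = {}  # image_id -> summed loss over that image's parts
    for i, anchor in enumerate(batch):
        if anchor.detector is None:
            continue  # only parts act as anchors
        others = [s for j, s in enumerate(batch) if j != i]  # every sample except the anchor
        logits = [sum(a * b for a, b in zip(anchor.embedding, s.embedding)) / temperature
                  for s in others]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        pos = [z for z, s in zip(logits, others) if is_positive(anchor, s)]
        if not pos:
            continue  # an anchor without positives contributes nothing
        loss = -sum(z - log_denom for z in pos) / len(pos)
        per_image[anchor.image_id] = per_image.get(anchor.image_id, 0.0) + loss
    # Assumption: average the per-image sums over the images that contributed anchors.
    return sum(per_image.values()) / max(len(per_image), 1)


def make_batch(emb0: List[float], emb1: List[float]) -> List[Sample]:
    """Toy minibatch: two class-0 images and one class-1 image, each with two parts."""
    batch = []
    for image_id, (label, emb) in enumerate([(0, emb0), (0, emb0), (1, emb1)]):
        batch.append(Sample(emb, label, image_id))              # original image
        for det in range(2):
            batch.append(Sample(emb, label, image_id, det))     # parts from detectors 0 and 1
    return batch


separated = part_supcon_loss(make_batch([1.0, 0.0], [0.0, 1.0]))
collapsed = part_supcon_loss(make_batch([1.0, 0.0], [1.0, 0.0]))
```

In the toy batch the loss is lower when the two classes occupy different directions (`separated`) than when all embeddings coincide (`collapsed`), which is the behaviour the loss is meant to reward. Note that, following the division described above, a same-label part from a different detector is treated as a negative.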
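The classification network in (3), an encoder followed by feature concatenation and a classification layer, reduces to the following sketch. Parts are represented as flat lists of numbers, `toy_encoder` is a hypothetical stand-in for the trained encoder, and the classification layer is assumed to be a single linear layer; the weights are arbitrary illustrative values, not trained parameters.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def classify_parts(
    parts: Sequence[Vector],
    encoder: Callable[[Vector], Vector],
    weights: List[Vector],  # one row per class (assumed single linear layer)
    bias: Vector,
) -> int:
    """Encode each part, concatenate the features, apply a linear layer, return argmax."""
    features = [x for part in parts for x in encoder(part)]  # concatenation in part order
    if any(len(row) != len(features) for row in weights):
        raise ValueError("weight rows must match the concatenated feature length")
    logits = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


def toy_encoder(part: Vector) -> Vector:
    """Hypothetical stand-in encoder: mean and max of the part's values."""
    return [sum(part) / len(part), max(part)]


pred = classify_parts(
    parts=[[1.0, 3.0], [0.0, 2.0]],
    encoder=toy_encoder,
    weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    bias=[0.0, 0.0],
)
```

Because the features are concatenated in a fixed part order, each weight column is tied to a specific part and feature dimension, so the number and order of parts must be the same at training and test time.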