| Fine-grained image classification is an important research direction in computer vision. Its goal is to accurately classify objects that look alike but belong to different categories by mining and analyzing the detailed features of the objects in an image. The task has broad practical applications, such as medical image classification, face recognition, animal recognition, and plant classification. However, because fine-grained image classification faces difficulties such as small datasets, high inter-class similarity, and hard-to-extract features, its accuracy and practicality still need to be improved. To this end, this paper addresses four problems common in fine-grained classification: low recognition accuracy, model limitations, scarce labeled samples, and disordered semantic features. The main contributions are as follows: (1) To address the weak generalization ability and limited recognition accuracy of fully supervised fine-grained image classification models, this paper designs a generalized attention framework. First, a generalized single model exploits the complementarity between the features of the original image and those of the transformed image to strengthen attention on regions that differ significantly. Second, a dual-model generalization structure further extracts easily overlooked discriminative regions through comparison and complementarity between the two models, thereby improving classification accuracy. Finally, an adaptive weighting method balances the cross-entropy loss and the generalized attention loss to improve classification performance. (2) To address the scarcity of labeled samples and the disordered semantic features in semi-supervised fine-grained image classification, this paper designs a semi-supervised fine-grained image
classification model based on semantic representation learning and interactive learning. First, to address the shortage of labels, a selective semantic enhancement method based on semantic representation learning is proposed; it uses the feature matrix to select semantic regions and thereby generates new samples with which the model is retrained. Second, a matrix-level interactive learning method is introduced to make full use of unlabeled samples: a loss function constrains the features of labeled samples and the transformed features of unlabeled samples so that the predictions of the models tend to be consistent. Finally, to resolve the disorder of fine-grained semantic features, a semantic feature rearrangement method based on semantic representation learning is proposed. Through end-to-end learning of semantic features, the small semantics in fine-grained images are combined into ordered semantic features; specifically, feature channels with similar attributes are arranged together to represent the small semantics of the input image. Semantic feature rearrangement improves the model’s ability to identify different semantic regions and thereby improves the accuracy of fine-grained image classification. (3) The proposed methods are evaluated on mainstream fine-grained datasets. The dual-model structure based on the generalized attention mechanism reaches accuracies of 88.4%, 90.5%, and 93.4%, improvements of 0.7, 2.2, and 0.4 percentage points, respectively, over the mainstream API method. The semi-supervised fine-grained classification model based on semantic representation and interactive learning achieves a classification accuracy of 77.63% on the CUB-200-2011 dataset with only 2,000 labeled samples. These experiments demonstrate that the proposed methods perform well on fine-grained image classification tasks. |
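
The interactive learning step in contribution (2) constrains the predictions of the models on unlabeled samples to agree. The abstract does not specify the loss, so the following is only a minimal sketch under an assumption: it uses a symmetric KL divergence between the softmax outputs of two models as a stand-in consistency term. The function names (`softmax`, `kl_divergence`, `consistency_loss`) and the choice of divergence are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_a, logits_b):
    """Assumed consistency term: symmetric KL between two models' predictions
    on the same unlabeled sample. It is zero when the predictions agree and
    grows as they diverge."""
    p = softmax(logits_a)
    q = softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

In a training loop, a term of this kind would typically be added to the supervised cross-entropy loss computed on the labeled samples, so that minimizing the total pushes the models toward consistent predictions on unlabeled data.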