Compared with ordinary image classification, fine-grained image classification aims to identify the subclasses of a given target category. Different subclasses of the same visual category differ only in subtle discriminative details, while images of the same subclass vary greatly from one another because of factors such as illumination and shooting angle. Fine-grained image classification therefore faces the challenge of large intra-class variance and small inter-class variance.

Since the advent of deep learning, convolutional neural networks have made significant progress on fundamental computer vision tasks such as image classification, object detection and image segmentation. They have also achieved unprecedented success in real-world applications, including fine-grained image classification for biodiversity monitoring, intelligent retail and intelligent transportation. For image classification, a convolutional neural network is usually optimized by minimizing the cross-entropy loss between the ground-truth label and the network's prediction. However, the cross-entropy loss tends to make the network focus on the most discriminative region of an image and ignore less salient but complementary parts, which does not meet the needs of fine-grained image classification. Moreover, when mining multiple discriminative parts of an image, existing methods may introduce noise from the background or lack consistency in the targets they locate across different images. These problems limit the performance of fine-grained image classification methods.

This paper explores knowledge distillation for fine-grained image classification in depth. In particular, it studies how to locate multiple subtle and discriminative parts with a compact network, find complementary information at the spatial level, and thereby improve the fine-grained classification performance of convolutional neural networks.

First, based on the idea of feature-based knowledge distillation, this paper designs a distillation loss function for fine-grained image classification, the orthogonal loss, which encourages the student network to find diverse and discriminative semantic parts. The first convolutional neural network is trained by minimizing the cross-entropy loss, so it focuses only on the most discriminative part of the image. Taking this pre-trained network as the teacher, multiple student networks are trained in turn. The spatial attention of the student and of the teacher is extracted, and the orthogonal loss guides the training of the current student so that it does not attend to regions the teacher has already located but instead mines other informative parts, providing complementary information for the final fine-grained classification. Finally, the networks trained at each stage are combined to make the final category decision.

Second, drawing on the idea of response-based knowledge distillation, this paper designs a convolutional neural network architecture with multiple auxiliary classifiers and uses the KL divergence as the distillation loss. Different teacher networks provide information flows to different auxiliary classifiers of the student network, encouraging each auxiliary classifier to observe the data differently and to imitate the output response of its teacher. The student thereby captures effective complementary information from the teachers' information flows and gains better generalization ability.

Extensive experiments on three common fine-grained datasets with two common backbone networks show that the proposed methods achieve performance competitive with state-of-the-art methods.
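The abstract does not give the exact form of the orthogonal loss, so the following is only a minimal Python sketch of the idea under stated assumptions. It assumes that spatial attention is the channel-wise mean of squared activations, L2-normalised over spatial positions; that the orthogonal loss is the squared cosine similarity between the student's and the teacher's attention maps, added to the cross-entropy loss with a hypothetical weight `weight`; and that the stage networks are combined by averaging their softmax outputs. The names `spatial_attention`, `orthogonal_loss`, `student_loss` and `ensemble_predict` are illustrative rather than the author's code, and feature maps are plain nested lists (C channels of H*W values) instead of framework tensors so the example runs with the standard library alone.

```python
import math
from typing import List, Sequence

# Hypothetical layout: a feature map is C channels, each flattened to H*W values.
FeatureMap = Sequence[Sequence[float]]


def spatial_attention(feature: FeatureMap) -> List[float]:
    """Assumed attention: channel-wise mean of squared activations, L2-normalised over positions."""
    num_channels = len(feature)
    num_positions = len(feature[0])
    att = [sum(ch[i] ** 2 for ch in feature) / num_channels for i in range(num_positions)]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0  # guard against an all-zero map
    return [a / norm for a in att]


def orthogonal_loss(student_feat: FeatureMap, teacher_feat: FeatureMap) -> float:
    """Assumed form: squared cosine similarity of the two attention maps.

    It is 0 when the student attends only to positions the teacher ignores, and 1 when
    both attention maps are proportional (same positions, same relative strength)."""
    a_s = spatial_attention(student_feat)
    a_t = spatial_attention(teacher_feat)
    return sum(s * t for s, t in zip(a_s, a_t)) ** 2


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of one example, computed with a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def student_loss(
    logits: Sequence[float],
    label: int,
    student_feat: FeatureMap,
    teacher_feat: FeatureMap,
    weight: float = 1.0,  # hypothetical balancing hyperparameter
) -> float:
    """Cross-entropy on the label plus the weighted orthogonal term."""
    return cross_entropy(logits, label) + weight * orthogonal_loss(student_feat, teacher_feat)


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(stage_logits: Sequence[Sequence[float]]) -> int:
    """Assumed combination rule: average the softmax outputs of all stage networks, take argmax."""
    probs = [softmax(logits) for logits in stage_logits]
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]
    return max(range(len(avg)), key=avg.__getitem__)


if __name__ == "__main__":
    teacher = [[1.0, 0.0, 0.0, 0.0]]  # one channel; the teacher attends to position 0
    student = [[0.0, 2.0, 0.0, 0.0]]  # the student attends to position 1
    print(orthogonal_loss(student, teacher))  # 0.0
```

Because the squared activations are non-negative, this loss lies in [0, 1], and driving it toward 0 pushes the student's attention onto positions the teacher leaves dark. A practical implementation would compute the same quantities on batched tensors in a deep-learning framework so that gradients flow into the student network.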
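Similarly, the abstract names the KL divergence as the distillation loss for the auxiliary classifiers but does not specify how it is applied. The Python sketch below assumes the common temperature-scaled formulation: auxiliary classifier k is paired with teacher k, each pair contributes KL(teacher ‖ auxiliary classifier) between softmax outputs at a hypothetical temperature `temperature` (default 4.0), scaled by T², and the pairs are averaged. The names `log_softmax`, `kl_from_logits` and `multi_teacher_kd_loss` are illustrative, and logits are plain lists so that only the standard library is needed.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed with a stable log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def kl_from_logits(
    teacher_logits: Sequence[float], student_logits: Sequence[float], temperature: float
) -> float:
    """KL(p_teacher || p_student) at the given temperature, evaluated in log space."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def multi_teacher_kd_loss(
    aux_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
    temperature: float = 4.0,  # hypothetical value
) -> float:
    """Assumed pairing: auxiliary classifier k imitates teacher k.

    Each pair's KL term is scaled by T**2 (the usual correction for softened targets)
    and the terms are averaged over all classifier/teacher pairs."""
    if len(aux_logits) != len(teacher_logits):
        raise ValueError("need exactly one teacher per auxiliary classifier")
    total = 0.0
    for s_logits, t_logits in zip(aux_logits, teacher_logits):
        total += kl_from_logits(t_logits, s_logits, temperature) * temperature ** 2
    return total / len(aux_logits)


if __name__ == "__main__":
    teachers = [[4.0, 1.0, 0.0], [0.5, 3.0, 0.0]]  # hypothetical teacher logits
    aux = [[2.0, 1.0, 0.0], [0.5, 2.0, 0.5]]  # hypothetical auxiliary-classifier logits
    print(multi_teacher_kd_loss(aux, teachers))
```

Working in log space avoids taking the logarithm of a probability that has underflowed to zero. In training, this term would presumably be combined with a cross-entropy loss on the ground-truth labels; that combination, and how the auxiliary classifiers' outputs feed the final prediction, are not specified in the abstract and are left out here.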