| The rapid development of artificial intelligence in recent years has brought great changes and progress to computer vision and natural language processing. Image classification, a research hot spot in computer vision, has achieved breakthrough results in both academia and industry. Fine-grained image classification builds on image classification to further identify the subtle differences between subcategories within the same category. With the development of artificial intelligence, there is increasing demand for recognizing fine-grained subcategories within the same basic category, in applications such as intelligent retail systems, ecosystem protection, climate change assessment, and smart transportation systems. However, fine-grained image classification is a very challenging task because fine-grained images exhibit small inter-category differences and large intra-category differences. This thesis presents an improved fine-grained image classification network model with weakly supervised data augmentation in a deep learning framework. The main research contents are as follows: (1) Domestic and international research on fine-grained image classification algorithms is surveyed, and the basic principles and development history of the various paradigms are analyzed. The thesis proposes an improved fine-grained image classification network model based on the Weakly Supervised Data Augmentation Network (WS-DAN) and improves its optimization algorithm and activation function. (2) To address the excessive training cost that arises because the original backbone network relies on downsampling to obtain the attention map, we propose using the CBAM attention module to enhance attention map extraction, so that the network better learns important features and suppresses unimportant ones. This allows the network model in this thesis to focus on region-specific information in the image and thus complete the image
classification more efficiently. (3) To address the limitations that the network model places on input image size and on the receptive field, we propose introducing the Spatial Pyramid Pooling Fast (SPPF) module. It effectively avoids the incomplete cropping and shape distortion of image objects caused by image cropping and scaling operations, and it enlarges the receptive field. The structure acquires multi-scale target information, extracts more spatial and contextual information, and enhances the robustness and performance of the model. (4) To address the difficulty the network model has in capturing fine details and relationships between objects in the image, we propose inserting a global context block (GC Block) into the backbone network. A non-local block helps the network model better localize object parts and capture the details and relationships between objects that fine-grained classification depends on, but its computational complexity is high. The GC Block is therefore introduced: it retains the long-distance dependency modeling capability of the non-local block without greatly increasing computational complexity, makes better use of the information from all previous layers, and improves the expressiveness of the model. In summary, the method in this thesis is based on a weakly supervised data augmentation network for fine-grained image classification: a CBAM attention module is added to enhance the extraction of attention maps, and an SPPF module is then added to obtain multi-scale target information and enlarge the receptive field. The GC Block is inserted into the backbone network to help capture fine-grained details and relationships between objects, reduce the number of operations, and increase the global context modeling capability, so as to complete the image classification more
efficiently. Experiments on two publicly available datasets, CUB-200-2011 and FGVC-Aircraft, achieve accuracies of 89.81% and 93.97%, respectively, demonstrating the effectiveness of the method. |
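
The abstract does not spell out how the SPPF module is implemented. As a minimal sketch, assuming the commonly used design in which three 5x5, stride-1 max-pooling layers are chained and their outputs are concatenated with the input, the pure-Python example below illustrates how the pooling stage gathers multi-scale context while keeping the spatial size unchanged. The helper names `max_pool_same` and `sppf_pool` are hypothetical, the feature map is a toy single-channel grid, and the 1x1 convolutions that usually surround the pooling stage are omitted.

```python
from typing import List

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int = 5) -> Grid:
    """Stride-1 max pooling that keeps the input size.

    Cells outside the grid are ignored, which matches padding with -inf.
    """
    r = k // 2
    h, w = len(x), len(x[0])
    return [
        [
            max(
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def sppf_pool(x: Grid, k: int = 5) -> List[Grid]:
    """Return the four maps an SPPF-style block would concatenate channel-wise.

    Assumption: three chained k x k pools, as in the common SPPF design.
    """
    p1 = max_pool_same(x, k)   # effective window k
    p2 = max_pool_same(p1, k)  # effective window 2k - 1
    p3 = max_pool_same(p2, k)  # effective window 3k - 2
    return [x, p1, p2, p3]


if __name__ == "__main__":
    # Toy 16x16 feature map with a single strong activation in the middle.
    fmap = [[0.0] * 16 for _ in range(16)]
    fmap[8][8] = 1.0
    names = ["input", "pool x1", "pool x2", "pool x3"]
    for name, m in zip(names, sppf_pool(fmap)):
        active = sum(v > 0 for row in m for v in row)
        print(f"{name}: {len(m)}x{len(m[0])}, active cells = {active}")
```

With k = 5, the chained pools cover windows of 5, 9, and 13 cells per side, so the concatenated output mixes local and progressively wider context without any cropping or rescaling of the input, which is the behavior the abstract attributes to the SPPF module.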