| As one of the most concerned areas in the current computer field,image processing has been widely studied in both academia and industry.As a branch of image processing,image recognition technology involves many fields.Fine-grained image recognition focuses on the recognition task between different sub-categories under the same large category.Compared with the fine-grained image recognition model based on strong supervision,the fine-grained image recognition model based on weakly supervised learning requires fewer annotations and is relatively simple,so it is widely used.To improve the recognition accuracy of the model for fine-grained image datasets,three algorithms based on weakly supervised learning were proposed to study their recognition effects in the field of fine-grained image recognition.The specific work is as follows:(1)Aiming at the problems of high feature representation dimension,weak interaction between levels and inability to simultaneously learn channel and spatial attention features in the second-order pooling algorithm,a fine-grained image recognition algorithm based on adaptive trilinear pooling is proposed,which uses convolutional neural network as the backbone network.The second-order covariance channel attention mechanism pyramid module,the adaptive feature mask module and the adaptive trilinear pooling module were combined to construct the pyramid structure to learn the channel attention vector of three-layer features,the aggregated spatial attention feature expression and the three-layer feature pooling fusion expression,respectively.Compared with other second-order pooling,the dimension of the pooling layer was lower in the adaptive trilinear pooling.Finally,the classification accuracy of 89.3%,94.2%and 91.8%is achieved on three fine-grained benchmark datasets CUB200-2011,Stanford Cars and FGVC-Aircraft,respectively.The best classification effect compared with similar high-order models.(2)The current Vision Transformer(ViT)-based feature selection method for finegrained image recognition algorithms has the problems of fixed number of selected features,high computational complexity,and poor ability to locate key areas.A weakly supervised fine-grained image recognition algorithm based on end-to-end stepwise feature selection is proposed.In this algorithm,ViT is used as the backbone network,and the feature blocks corresponding to the key regions are selected by the stepwise feature selection module in the forward propagation calculation,and these key blocks are used as the input of the next layer.The proposed algorithm can adaptively select in the calculation process of each layer according to the characteristics of each input image,without being limited by the inherent number,and is simpler,less computational complexity,and more accurate than ViT.Finally,CUB-200-2011,Stanford the classification accuracy of 91.2%,95.5%,94.5%,45.3%and 55.77%are achieved on five fine-grained benchmark datasets of Cars,FGVC-Aircraft,SoycultivarLocal and CottonCultivar80,respectively.It achieves the best classification results of the same kind of ViT-Base,which shows that the proposed algorithm is effective.(3)The adaptive trilinear pooling algorithm solves the problem of poor output feature expression ability from the perspective of output features;The stepwise end-toend feature selection algorithm enables ViT to select key regions in the end-to-end recognition process and reduces the amount of computation required.However,in further global feature expression,it only uses a single classification head to represent the information of the whole image,which has the problem of insufficient feature expression.In view of this,the adaptive trilinear pooling algorithm is combined with the stepwise feature selection algorithm after making appropriate adjustments to enhance the representation ability.Finally,it achieves 91.5%,95.9%,95.0%,45.41%and 56.21%classification accuracy on the datasets of CUB 200-2011,Stanford Cars,FGVC-Aircraft,SoycultivarLocal and CottonCult ivar80,respectively.Compared with the end-to-end stepwise feature selection algorithm without adding adaptive trilinear pooling,the proposed algorithm improves the performance by 0.3%,0.4%,0.5%,0.11%and 0.44%,respectively,which proves that the introduction of adaptive trilinear pooling is effective for ViT. |