The fine-grained visual categorization (FGVC) task classifies sub-classes within a broad category, at a finer granularity than ordinary image classification. Vegetable and fruit recognition can be regarded as an FGVC task. Because of large intra-class variance, small inter-class variance, and the influence of occlusion, background, viewpoint, and other factors, the task is very challenging. The mainstream approach to this challenge is to use fine-grained local or global features to enhance feature extraction and representation during learning. However, unlike the human visual system, most existing FGVC methods extract features from only a single image during training. By contrast, people learn to identify distinguishing features more effectively by comparing two different images. The Attentive Pairwise Interaction Network (API-Net) takes an image pair as the input for pairwise feature interaction and shows excellent performance on several public FGVC datasets. However, on VegFru, an FGVC dataset in the vegetable and fruit domain, the accuracy of API-Net is lower than expected.

Based on an analysis of the fine-grained problem, this paper proposes an FGVC framework built on an attention-aware interactive feature network (AIF-Net), designs a complete network structure, and conducts training and testing experiments on several vegetable-specific datasets; real-world images are then used for experimental verification. The innovations of the proposed AIF-Net are: (1) A new fine-grained fruit and vegetable classification algorithm based on an attention mechanism is proposed. The algorithm reduces the influence of irrelevant information, such as multi-scale variation and complex backgrounds, on the training of fine-grained images, focuses on key locations, produces more discriminative feature representations, and effectively strengthens the network's ability to extract the main features of the target. (2) A region proposal network is integrated to form a new network structure: informative local and global features are fused through interactive feature learning to generate high-quality fine-grained features (see the sketch below). (3) The API-Net loss function is optimized so that the underlying network is better trained by adaptively weighting the feature priority of each image in the pair and of each local region.

The experimental results show that on the Veg200, Fru92, and VegFru292 datasets, AIF-Net achieves its highest Top-1 accuracy when three local regions from the region proposal network are serially fused with the global features, reaching 89.087%, 91.032%, and 90.756%, respectively. On the VegFru292 dataset, the Top-1 and Top-5 accuracies are 3.515% and 0.771% higher than those of API-Net, respectively. The proposed AIF-Net performs interactive feature learning by combining global and local attention feature maps and shows good discriminative ability.
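The following is a minimal sketch, not the authors' implementation, of the serial fusion step described above: one global feature vector and three local-region feature vectors (as would come from a backbone and a region proposal network) are each gated by a simple attention module and then concatenated before classification. All module names, dimensions, and the attention design here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SerialFusion(nn.Module):
    """Hypothetical sketch of attention-gated serial fusion of
    one global feature and several local-region features."""

    def __init__(self, feat_dim=2048, num_regions=3, num_classes=292):
        super().__init__()
        # Simple channel-attention gate applied to each feature vector
        # before fusion (a stand-in for the paper's attention-aware step).
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // 16, feat_dim),
            nn.Sigmoid(),
        )
        # Serial (concatenation) fusion of 1 global + num_regions local
        # vectors, followed by a classifier over the fused representation.
        self.classifier = nn.Linear(feat_dim * (1 + num_regions), num_classes)

    def forward(self, global_feat, local_feats):
        # global_feat: (B, feat_dim); local_feats: list of (B, feat_dim)
        feats = [global_feat] + list(local_feats)
        attended = [f * self.attention(f) for f in feats]  # gate each vector
        fused = torch.cat(attended, dim=1)                 # serial fusion
        return self.classifier(fused)

if __name__ == "__main__":
    # Random tensors stand in for backbone / region-proposal outputs.
    g = torch.randn(4, 2048)                              # global feature
    regions = [torch.randn(4, 2048) for _ in range(3)]    # 3 local regions
    logits = SerialFusion()(g, regions)
    print(logits.shape)  # torch.Size([4, 292])
```

In this sketch the fused vector would feed the pairwise interaction stage (as in API-Net); the gating module is only one plausible form of the attention mechanism the paper describes.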