| Retail shopping is an important part of people’s lives. In recent years, traditional brick-and-mortar retailers have faced rising costs and competition from online shopping, and thus urgently need to transform and upgrade through intelligent technology. Currently, there are two main ways to transform the retail scene. The first is to use RFID (Radio Frequency Identification) technology for automatic tracking and identification of goods, which can improve the efficiency and accuracy of physical retail to some extent. However, its drawbacks are also obvious: firstly, RFID is relatively costly, since an RFID tag must be attached to each item, which is not practical for retail scenarios with low unit prices; secondly, RFID is limited by radio communication distance, signal interference, privacy issues, and tag durability. The second is to use deep learning techniques for product detection in retail scenarios, which can help merchants identify products for checkout, track inventory, monitor product display and replenishment, and analyze consumer behavior such as browsing duration. However, current deep learning-based methods also face significant challenges in retail scenarios. For example, severe occlusion between similar products or dense arrangement of products can make detection difficult. This article conducts research in two areas to address these shortcomings of deep learning methods in retail scenarios. (1) To address severe occlusion between similar products, this article proposes a new mixed-attention mechanism integrated into the feature extraction network to enhance the model’s ability to extract features of occluded products. Additionally, a counting branch is designed to predict the number of products within each bounding box. By detecting a group of products within a bounding box and providing the count of products, this method improves the feasibility of deep learning-based detection of occluded products and thereby the effectiveness of deep learning methods in identifying severely occluded products in retail scenarios. (2) In physical retail settings, shelf space is limited and products are densely placed, which presents a challenge for existing detection models even when there is no occlusion between similar products. This article selects Swin Transformer as the feature extraction network for detecting products in densely packed scenes. Compared to convolutional neural networks, Swin Transformer can dynamically evaluate the importance of input data and extract more effective features. Additionally, this article designs a feature fusion-enhanced pyramid to fully fuse and enhance features of different scales. Compared to the traditional FPN (Feature Pyramid Network) structure, this method can focus more on information from non-adjacent layers, enlarge the receptive field in target detection, and obtain richer feature information. The proposed approach achieved good results on the SKU-110K dataset. |
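
As a rough illustration of how the group-level output described in (1) might be consumed downstream (for example, for inventory tracking), the following minimal sketch sums per-box counts into per-class totals. It is not the article's implementation: the `GroupDetection` record, the `tally_products` helper, the `score_threshold` parameter, the box format, and the rounding rule are all hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class GroupDetection:
    """One predicted box covering a group of same-class products (hypothetical format)."""
    label: str                               # product class assigned to the box
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    count: float                             # raw output of the counting branch
    score: float                             # detection confidence in [0, 1]


def tally_products(detections: Iterable[GroupDetection],
                   score_threshold: float = 0.5) -> Dict[str, int]:
    """Sum rounded per-box counts for each class, skipping low-confidence boxes."""
    totals: Counter = Counter()
    for det in detections:
        if det.score < score_threshold:
            continue
        # Assumption: a kept box holds at least one product, so clamp to 1.
        totals[det.label] += max(1, round(det.count))
    return dict(totals)


if __name__ == "__main__":
    shelf = [
        GroupDetection("cola", (0, 0, 50, 120), count=2.6, score=0.9),
        GroupDetection("cola", (60, 0, 90, 120), count=1.1, score=0.8),
        GroupDetection("chips", (0, 130, 80, 200), count=3.9, score=0.3),
    ]
    print(tally_products(shelf))
```

With this representation, a single box can stand for several heavily occluded items of the same class, so the total is recovered from the counts rather than from the number of boxes.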