Retail dense scenes are images of supermarket shelves; such images typically have high resolution and contain many retail items. Precise object detection in these dense scenes remains challenging, and detection models that achieve excellent results in other domains are difficult to transfer to the retail setting without modification. This paper therefore focuses on improving the performance of Faster R-CNN in retail dense scenes. Our contributions are as follows: (1) To reduce the number of unused ground-truth boxes, we propose a multiple-step sampling method that improves the utilization of label information and balances positive and negative samples. The main innovations are improved matching rules between anchor boxes and ground-truth boxes, a dynamic mining method that increases the number of positive samples, and multiple-step sampling that raises the utilization of ground-truth boxes (an illustrative sketch follows below). Experiments on the SKU-110k benchmark show that our approach improves the AP of Faster R-CNN from 51.4% to 55.0% without reducing inference speed. (2) For multi-scale detection, we propose a novel local feature fusion R-CNN. Compared with the feature pyramid network, our design improves the detection accuracy of Faster R-CNN while simplifying the pyramid structure: it predicts candidate regions from a single feature map and then applies RoI pooling to different feature maps to extract local features for fusion (see the second sketch below). Compared with current mainstream algorithms, we obtain the best detection result, AP = 55.9%, under the same backbone network.
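To make the anchor-assignment idea behind contribution (1) concrete, the following is a minimal sketch of an assignment rule that, besides the usual fixed IoU threshold, also marks the top-k anchors of every ground-truth box as positives so that ground-truth boxes with no high-IoU anchor are not wasted. The threshold value, the top-k fallback, and the function name `assign_positives` are illustrative assumptions for this sketch, not the exact matching rules or dynamic mining procedure proposed in the paper.

```python
import torch
from torchvision.ops import box_iou

def assign_positives(anchors, gt_boxes, pos_thr=0.7, min_pos_per_gt=3):
    # anchors: [num_anchors, 4], gt_boxes: [num_gt, 4], both in (x1, y1, x2, y2) format
    iou = box_iou(anchors, gt_boxes)          # pairwise IoU, shape [num_anchors, num_gt]
    max_iou, matched_gt = iou.max(dim=1)      # best-matching GT index for each anchor
    pos_mask = max_iou >= pos_thr             # step 1: fixed-threshold positives

    # Step 2 (fallback): for every GT box, additionally mark its top-k anchors as
    # positive, so GT boxes without any high-IoU anchor still contribute samples.
    k = min(min_pos_per_gt, anchors.size(0))
    _, topk_idx = iou.topk(k, dim=0)          # [k, num_gt] anchor indices per GT
    pos_mask[topk_idx.flatten()] = True
    return pos_mask, matched_gt
```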
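The local feature fusion head of contribution (2) can be sketched as follows, assuming a PyTorch backbone that exposes feature maps at strides 8, 16, and 32: the same RoIs, proposed from a single feature map, are pooled from each level with torchvision's `roi_align`, projected to a common channel count, and fused with a 1x1 convolution. The channel counts, strides, and fusion-by-concatenation choice are assumptions made for this sketch, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class MultiLevelRoIFusion(nn.Module):
    """Pool the same RoIs from several backbone levels and fuse them per RoI."""

    def __init__(self, in_channels=(256, 512, 1024), out_channels=256, output_size=7):
        super().__init__()
        self.output_size = output_size
        # 1x1 convs project each level to a common channel count before fusion
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        # Fuse the concatenated per-level features back to out_channels
        self.fuse = nn.Conv2d(out_channels * len(in_channels), out_channels, kernel_size=1)

    def forward(self, feats, rois, strides=(8, 16, 32)):
        # feats: list of feature maps [N, C_i, H_i, W_i]
        # rois:  [K, 5] rows of (batch_idx, x1, y1, x2, y2) in image coordinates
        pooled = []
        for feat, lat, stride in zip(feats, self.lateral, strides):
            p = roi_align(feat, rois, output_size=self.output_size,
                          spatial_scale=1.0 / stride, sampling_ratio=2)
            pooled.append(lat(p))
        return self.fuse(torch.cat(pooled, dim=1))   # [K, out_channels, 7, 7]

# Example: three backbone levels for one image and a single RoI
feats = [torch.randn(1, 256, 100, 100),
         torch.randn(1, 512, 50, 50),
         torch.randn(1, 1024, 25, 25)]
rois = torch.tensor([[0.0, 32.0, 40.0, 256.0, 300.0]])   # (batch_idx, x1, y1, x2, y2)
head = MultiLevelRoIFusion()
per_roi_feature = head(feats, rois)                       # shape: [1, 256, 7, 7]
```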