Object detection is one of the most important problems in computer vision. Its purpose is to classify and locate every instance of objects from given categories in an image. The rise of deep convolutional networks has pushed object detection to a new stage. To improve the accuracy and speed of object detection, many detectors have been proposed on top of convolutional neural network architectures, and the field has made great progress as a result. However, because small objects have low resolution and carry limited information, conventional detectors cannot detect them effectively. In single-stage detectors in particular, performance on small objects differs significantly from performance on large objects, which makes small object detection much more difficult than general object detection. Yet small objects are common in real-world applications such as underwater object detection, traffic sign detection, and vehicle detection in surveillance video, so small object detection has become an indispensable and challenging problem in computer vision.

This paper focuses on the multi-scale hierarchical structure of single-stage object detectors in the deep learning framework and aims to improve existing multi-scale architectures so that single-stage networks detect small objects better. By developing network variants that make full use of multi-scale representations, the network captures multi-scale information and adapts to variations in object scale, thereby improving its ability to detect small objects. The main contributions are as follows.

(1) The effect of the multi-scale fusion strategy and fusion direction on small object detection performance is studied. Existing feature-pyramid-based methods tend to keep the number of channels consistent across levels and fuse different scales by element-wise addition or channel concatenation, which tends to lose low-level detail information during fusion. To solve this problem, a feature pyramid with bi-directional stepped concatenation is proposed, which makes effective use of both detail and semantic information to improve small object detection. With VGG16 and ResNet50 as backbone networks, the method achieves 80.3% and 82.4% mAP on PASCAL VOC2007, respectively, and also reaches 90.4% mAP on UCAS-AOD, indicating that the network generalizes well to small objects.

(2) The effect of shallow information enhancement on small object detection is studied. To address the information loss introduced by the sampling process, which arises because the network's shallow layers carry limited information, an interactive multi-scale feature representation enhancement network is proposed. It strengthens small object detection by adding, outside the backbone, an interactive multi-scale input structure and an interactive adjacent-layer aggregation structure. The network significantly improves accuracy over the baseline, reaching 81.9% mAP on PASCAL VOC and 41.1% AP on MS COCO.

(3) The effect of multi-scale resampling of shallow regression information on small object detection is studied. To resolve the conflict between detail information and semantic information within the same feature, a dynamic refinement object detection network based on shallow localization is proposed. It discards the shallow classification branch and instead uses a pyramid network with more robust semantic information to classify and refine the shallow localization results. In addition, dynamic factors and deformable operations improve the network's adaptability to multiple scales, further improving its detection of small objects. The network achieves 82.7% mAP on PASCAL VOC and 37.8% AP on MS COCO.
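The contrast drawn in contribution (1) between element-wise addition and channel concatenation, and the idea of fusing features in two directions, can be illustrated with a small toy sketch. The abstract does not specify how the "stepped" concatenation is built, so the code below is a hypothetical, generic bidirectional (top-down, then bottom-up) concatenation pyramid and not the proposed network. Assumptions: feature maps are plain nested lists (channels of H x W grids), resizing uses nearest-neighbour upsampling and 2x2 max pooling, and the convolutions a real detector would use to compress channels after concatenation are omitted. All function names are illustrative.

```python
from typing import List

# Toy representation (assumption): a feature map is a list of channels,
# and each channel is an H x W grid of floats.
Channel = List[List[float]]
FeatureMap = List[Channel]


def upsample2x(fm: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling of every channel."""
    out = []
    for ch in fm:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))  # repeat each row
        out.append(rows)
    return out


def downsample2x(fm: FeatureMap) -> FeatureMap:
    """2x2 max pooling with stride 2 (assumes even H and W)."""
    out = []
    for ch in fm:
        h, w = len(ch), len(ch[0])
        out.append([[max(ch[i][j], ch[i][j + 1], ch[i + 1][j], ch[i + 1][j + 1])
                     for j in range(0, w, 2)]
                    for i in range(0, h, 2)])
    return out


def fuse_add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise addition: needs equal channel counts and mixes the inputs."""
    if len(a) != len(b):
        raise ValueError("element-wise addition requires equal channel counts")
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def fuse_concat(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Channel concatenation: every input channel survives unchanged."""
    return a + b


def bidirectional_concat(levels: List[FeatureMap]) -> List[FeatureMap]:
    """Hypothetical two-pass fusion; levels[0] is the shallowest (largest) map
    and each following level halves H and W."""
    n = len(levels)
    # Top-down pass: push deep semantic features into shallower levels.
    td: list = [None] * n
    td[-1] = levels[-1]
    for i in range(n - 2, -1, -1):
        td[i] = fuse_concat(levels[i], upsample2x(td[i + 1]))
    # Bottom-up pass: push shallow detail features back into deeper levels.
    bu: list = [None] * n
    bu[0] = td[0]
    for i in range(1, n):
        bu[i] = fuse_concat(td[i], downsample2x(bu[i - 1]))
    return bu


if __name__ == "__main__":
    # Three toy levels: 4x4, 2x2 and 1x1, one channel each.
    p3 = [[[float(i * 4 + j) for j in range(4)] for i in range(4)]]
    p4 = [[[10.0, 20.0], [30.0, 40.0]]]
    p5 = [[[100.0]]]
    for lvl, fm in enumerate(bidirectional_concat([p3, p4, p5])):
        print(f"level {lvl}: {len(fm)} channels, {len(fm[0])}x{len(fm[0][0])}")
```

Running it reports 3, 5 and 6 channels at 4x4, 2x2 and 1x1. Because concatenation keeps every input channel, the shallowest level still holds its original detail channel next to the upsampled semantic ones, whereas `fuse_add` demands matching channel counts and produces sums from which the original inputs cannot be separated. The unchecked channel growth here is a side effect of omitting the compression convolutions, not a property of any particular published design.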