3D object detection plays an important role in applications such as autonomous driving and robotic environment perception. It studies how to effectively perceive 3D environmental information and accurately classify and localize objects of interest. Compared with 2D object detection, 3D detection is more challenging because of the added dimension. On the one hand, each type of sensor data has shortcomings when used alone for 3D scene understanding: the point cloud obtained by LiDAR is sparse and irregular, while the image obtained by the camera lacks depth information. On the other hand, objects are distributed randomly in space, and distant or partially occluded objects are easily missed. To address these problems, this paper focuses on improving the accuracy, robustness, and real-time performance of 3D object detection by exploiting the complementary advantages of multi-sensor information fusion, and improves and optimizes existing work in two respects: point cloud and image feature extraction, and feature fusion. The main contributions of this paper are as follows:

(1) To address the poor forward propagation of deep semantic information in the traditional feature pyramid structure, a full-resolution feature extractor based on a skip-connected feature pyramid is designed. It uses VGG16 as the backbone network to construct the feature pyramid, and fuses the semantic information of multiple semantically stronger high-level feature maps with the detail information of the low-level feature maps through skip connections, providing a more effective full-resolution feature map for subsequent detection tasks. Experiments show that the improved feature extractor raises the overall detection ability of the algorithm.

(2) To address the information loss caused by point cloud quantization and the poor detection robustness caused by coarse fusion in existing methods, a multimodal feature fusion method with an adaptive fusion strategy is proposed. The method first uses a PointNet network to supplement local features of the original point cloud, then adaptively adjusts the weights with which the bird's-eye-view, image, and point cloud region features participate in the detection task, obtaining a more robust fused feature. Finally, feature concatenation is used to integrate the orientation features of the candidate boxes. Experimental results show that this fusion method significantly improves the detection precision of the algorithm: on the three difficulty levels of the KITTI validation set, precision improves by 2.17%, 2.18%, and 7.56%, respectively.

(3) Based on the above improvements, a two-stage detection method fusing point clouds and images and a fast single-stage 3D object detection method are implemented. The two-stage fusion method comprises two parts: a 3D region proposal network based on the skip-connected feature pyramid, and 3D object detection based on multimodal feature fusion. The fast single-stage method casts object detection as a regression problem and uses focal loss to handle the class imbalance caused by dense anchor boxes, which effectively reduces the model parameters and improves detection speed by about 30% at a slight cost in detection precision. Extensive experiments on the KITTI and nuScenes datasets show that the proposed improvements effectively raise the accuracy and robustness of 3D object detection; the method not only has advantages in detecting long-distance and partially occluded objects, but also adapts to changes in lighting conditions across different scenes.
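The skip-connected fusion in contribution (1) — upsampling semantically stronger high-level maps to full resolution and combining them with a low-level detail map — can be sketched as below. The layer shapes, channel counts, and nearest-neighbor upsampling are illustrative assumptions, not the thesis's exact VGG16-based architecture.

```python
import numpy as np

def upsample_nearest(fmap, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)

def skip_pyramid_fuse(low, highs):
    """Fuse a low-level detail map (C0, H, W) with higher-level maps
    (Ci, H/2^i, W/2^i): upsample each higher level back to full
    resolution via a skip connection and concatenate along channels."""
    full_h = low.shape[1]
    ups = [upsample_nearest(m, full_h // m.shape[1]) for m in highs]
    return np.concatenate([low] + ups, axis=0)

# toy pyramid: 64x64 detail map plus 32x32 and 16x16 semantic maps
low = np.random.rand(16, 64, 64)
highs = [np.random.rand(32, 32, 32), np.random.rand(64, 16, 16)]
fused = skip_pyramid_fuse(low, highs)
print(fused.shape)  # (112, 64, 64): full resolution, 16+32+64 channels
```

In a real network the concatenation would typically be followed by a 1x1 convolution to mix channels; here only the resolution/skip structure is shown.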
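The adaptive fusion strategy in contribution (2) amounts to weighting same-sized region features from the three branches before they enter the detection head. A minimal sketch follows; in the actual method the gating weights would be predicted from the features by a small learned network, whereas here the logits are passed in directly for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_fuse(bev, img, pts, gate_logits):
    """Adaptively weighted sum of region features from the bird's-eye
    view, image, and point cloud branches (all the same shape).
    gate_logits: 3 logits, softmaxed so the weights sum to 1."""
    w = softmax(gate_logits)
    return w[0] * bev + w[1] * img + w[2] * pts

bev = np.full(8, 1.0)
img = np.full(8, 2.0)
pts = np.full(8, 3.0)
# equal logits -> equal weights -> fused value is the mean, 2.0
fused = adaptive_fuse(bev, img, pts, np.zeros(3))
print(fused[0])  # 2.0
```

Because the weights are input-dependent in the learned version, a degraded modality (e.g. an over-exposed image) can be down-weighted per region, which is the source of the robustness claim.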
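The focal loss mentioned in contribution (3) down-weights easy examples so that the overwhelming number of background anchors does not dominate training. A standard binary form (Lin et al.), with the usual default hyperparameters, can be sketched as:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss. p: predicted foreground probability,
    y: label in {0, 1}. The (1 - pt)^gamma factor shrinks the loss
    of well-classified (easy) anchors toward zero."""
    pt = np.where(y == 1, p, 1.0 - p)          # prob. of the true class
    at = np.where(y == 1, alpha, 1.0 - alpha)  # class-balance weight
    return -at * (1.0 - pt) ** gamma * np.log(pt)

# an easy background anchor (p=0.01) contributes far less loss
# than a hard one the model gets wrong (p=0.9)
easy = focal_loss(np.array([0.01]), np.array([0]))
hard = focal_loss(np.array([0.9]), np.array([0]))
print(easy[0] < hard[0])  # True
```

This is why a dense single-stage detector can train without the proposal-stage sampling a two-stage method uses, which is where the ~30% speed gain at a small precision cost comes from.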