In recent years, as Internet companies, new car makers, and traditional automakers have invested in the autonomous driving market, the field of autonomous driving has attracted intense interest. As the most important component of the autonomous vehicle perception system, 3D object detection directly determines whether an autonomous vehicle can operate safely as intended, which makes it one of the most active research areas in computer vision. Smaller and more powerful computing devices can now be deployed on vehicles, and sensors of various modalities have enhanced the vehicle's ability to model its environment, so the development of 3D object detection has ushered in many opportunities. At the same time, 3D object detection for autonomous driving still faces various difficulties and challenges: in addition to sparse point clouds, small-sized targets, noisy backgrounds, and changing illumination, the open road environment in which the vehicle operates places high demands on reliability and real-time performance. In response to these difficulties and challenges, the main contributions of this article are as follows:

(1) To address the sparsity of point clouds on small targets in road traffic scenes, this paper analyzes F-PointNet, introduces the target's RGB depth features, and proposes a 3D object detection method based on multimodal fusion. The method includes a point cloud feature fusion network built on RGB depth features, which uses the depth feature of the target image as a global feature of the point cloud, providing semantic information beyond the point cloud features for point cloud classification and regression and improving detection accuracy when the point cloud is sparse (a minimal sketch of this fusion step is given after this abstract). Experimental results on the KITTI dataset show that the proposed method improves average precision over F-PointNet across multiple categories, especially for the smaller Pedestrian targets.

(2) To address the fixed feature encoding and the apparent hollowing-out of Cyclist targets in the feature fusion process above, this paper further proposes a 3D object detection method based on an attention mechanism. The method includes a multimodal feature weighted-fusion network based on channel attention, which achieves weighted fusion of multimodal features by modeling the correlations between the channels of the RGB features (see the second sketch below). Experimental results on the KITTI dataset show that this method further improves on the former: the average precision on the Pedestrian category increases by more than 3%, and all metrics rank in the top two compared with other methods.

The methods in this paper serve as multi-sensor fusion algorithms for 3D object detection in road scenes. Introducing the target's RGB features improves detection robustness for small-scale targets, and the attention mechanism allows the network to focus on key information, further improving detection accuracy. At the same time, the methods run at a frame rate of more than 38 frames per second, which meets the real-time requirements of the scene. These advantages are of great value for road traffic scenarios where safety comes first.
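
The following is a minimal, hypothetical sketch of the kind of fusion described in contribution (1): an image-derived depth feature is broadcast over the frustum point cloud and concatenated with per-point and max-pooled global point-cloud features before per-point classification. The module name, layer sizes, and feature dimensions are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DepthFusionPointNet(nn.Module):
    """Sketch: fuse an image-derived depth feature with point cloud features
    as an additional global feature (dimensions are illustrative assumptions)."""

    def __init__(self, point_feat_dim=64, depth_feat_dim=128, num_classes=4):
        super().__init__()
        # Per-point feature extractor: shared MLP applied to every point.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, point_feat_dim, 1), nn.BatchNorm1d(point_feat_dim), nn.ReLU(),
        )
        # Head over [per-point feature | point-cloud global feature | RGB depth feature].
        self.head = nn.Sequential(
            nn.Conv1d(point_feat_dim * 2 + depth_feat_dim, 128, 1), nn.ReLU(),
            nn.Conv1d(128, num_classes, 1),
        )

    def forward(self, points, depth_feat):
        # points: (B, 3, N) frustum point cloud; depth_feat: (B, depth_feat_dim) image feature.
        per_point = self.point_mlp(points)                      # (B, C, N)
        global_pc = per_point.max(dim=2, keepdim=True).values   # (B, C, 1) max-pooled global feature
        n = points.shape[2]
        fused = torch.cat(
            [per_point,
             global_pc.expand(-1, -1, n),                       # broadcast point-cloud global feature
             depth_feat.unsqueeze(2).expand(-1, -1, n)],        # broadcast RGB depth feature
            dim=1,
        )
        return self.head(fused)                                 # (B, num_classes, N) per-point scores
```

The design point this sketch illustrates is that the image feature acts as a second global descriptor: every point receives the same semantic context from the RGB branch, which matters most when only a handful of LiDAR points fall on a small target.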
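
Likewise, a minimal sketch of the channel-attention weighted fusion in contribution (2), assuming an SE-style gate that models correlations between RGB feature channels before fusion with the point-cloud feature; the module name, reduction ratio, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Sketch: SE-style channel attention re-weights RGB feature channels
    before they are fused with the point cloud feature (sizes are assumptions)."""

    def __init__(self, rgb_dim=128, point_dim=128, reduction=8):
        super().__init__()
        # Squeeze-and-excitation style gate producing one weight per RGB channel.
        self.gate = nn.Sequential(
            nn.Linear(rgb_dim, rgb_dim // reduction), nn.ReLU(),
            nn.Linear(rgb_dim // reduction, rgb_dim), nn.Sigmoid(),
        )

    def forward(self, rgb_feat, point_feat):
        # rgb_feat: (B, rgb_dim) image feature; point_feat: (B, point_dim) point cloud feature.
        weights = self.gate(rgb_feat)                  # (B, rgb_dim), per-channel weights in (0, 1)
        weighted_rgb = rgb_feat * weights              # suppress uninformative RGB channels
        return torch.cat([point_feat, weighted_rgb], dim=1)  # weighted multimodal fusion
```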