| 3D object detection and recognition is one of the key technologies of 3D scene perception and autonomous driving. Images and point clouds are the most common data in the perception modules of self-driving vehicles, and performing 3D object detection with only one type of data has limitations. Image data do not provide reliable 3D geometry and degrade under complex or poor lighting conditions. Conversely, point cloud data provide high-precision 3D geometry that does not vary with lighting, but are limited by low resolution, low refresh rate, and high cost. Multi-sensor data fusion can greatly improve the redundancy and fault tolerance of the system and improve the accuracy of 3D object detection algorithms in autonomous driving. Because existing multi-sensor fusion detection methods cannot meet application needs, this paper exploits the complementary nature of multi-sensor data and focuses on image and point cloud feature fusion and 3D object detection. The main work of this paper is as follows: (1) An image preprocessing algorithm based on a histogram median filter is used to reduce noise in image data, and the KITTI dataset is used to verify its validity. (2) A preprocessing pipeline for point cloud data is presented, including point cloud hole repair, point cloud stitching, point cloud denoising, and ground point cloud segmentation. Ground segmentation adopts the RANSAC algorithm, which is simple and efficient and converges well compared with other segmentation algorithms. The validity of this algorithm is verified on point cloud data from the KITTI dataset. This preprocessing provides high-quality data for the subsequent fusion of images and point clouds. (3) A feature fusion network, VPMNet, is designed based on voxel-pixel matching between images and point clouds. The network first voxelizes the input point cloud along a predetermined grid size using downsampling thresholds and dynamic increments, and extracts pixel-level feature maps with ResNet. Multilayer perceptrons and KPConv are then used to extract voxel features from the non-empty voxels, achieving a one-to-one match between image pixels and the voxels extracted from the point cloud data. Finally, VPMNet fuses the matched feature maps, and the fused features are fed into the subsequent PCSCNet network for semantic segmentation, which is used to verify the validity of the proposed network. Compared with PCSCNet as the backbone network, mIoU increased by 2.7% on the validation results of the SemanticKITTI dataset, with significant improvements in the bicycle, motorcycle, person, and truck categories of 14.3%, 8.1%, 5.8%, and 9.0%, respectively. (4) A 3D object detection network based on image and point cloud fusion is designed, combining the F-PointNet and PointCNN architectures. Detection targets in RGB images are first extracted with a feature pyramid network (FPN) and extended to a 3D point cloud region based on voxel-pixel feature fusion. The candidate region is then obtained by generating a point cloud frustum, which is passed to PointCNN for more accurate object segmentation. Finally, the 3D object in the candidate region is predicted using the resulting mask. Experiments on the KITTI dataset demonstrate that, with point cloud features enhanced by VPMNet and masks extracted by PointCNN, 3D objects are recognized more easily than with the original F-PointNet. In the comparison of detection accuracy on 3D bounding boxes, performance on Car improved over the baseline by a minimum of 0.88% and a maximum of 1.41%. On Pedestrian, the improvement over the baseline is 1.46% to 3.38%. On Cyclist, the network outperforms F-PointNet by 1.60% to 2.78%. |
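
The abstract names RANSAC as the ground segmentation method but gives no implementation details. The following is a minimal sketch of generic RANSAC plane fitting with NumPy, not the paper's implementation. The function name `ransac_ground_plane`, its parameters and defaults, and the synthetic point cloud are all illustrative assumptions.

```python
import numpy as np


def ransac_ground_plane(points, distance_threshold=0.2, num_iterations=100, seed=0):
    """Fit a plane n.x + d = 0 to an (N, 3) point array with RANSAC.

    Returns (normal, d, inlier_mask). The parameter names and default
    values are assumptions made for illustration, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(num_iterations):
        # Sample three distinct points and build the plane through them.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # degenerate (collinear) sample, skip it
            continue
        normal = normal / norm
        d = -np.dot(normal, sample[0])
        # Points closer to the plane than the threshold count as inliers.
        mask = np.abs(points @ normal + d) < distance_threshold
        if mask.sum() > best_mask.sum():
            best_mask, best_plane = mask, (normal, d)
    if best_plane is None:
        raise ValueError("no non-degenerate sample found")
    return best_plane[0], best_plane[1], best_mask


# Synthetic scene (assumed values): a flat, noisy ground near z = -1.7 m
# plus scattered obstacle points above it.
data_rng = np.random.default_rng(42)
ground = np.column_stack([
    data_rng.uniform(-20, 20, 500),
    data_rng.uniform(-20, 20, 500),
    data_rng.normal(-1.7, 0.02, 500),
])
obstacles = np.column_stack([
    data_rng.uniform(-20, 20, 100),
    data_rng.uniform(-20, 20, 100),
    data_rng.uniform(-1.0, 1.0, 100),
])
cloud = np.vstack([ground, obstacles])

normal, d, is_ground = ransac_ground_plane(cloud)
non_ground = cloud[~is_ground]  # points kept for later detection stages
```

The inlier mask separates ground points from the rest, so `non_ground` is what a later fusion or detection stage would consume. In practice the distance threshold and iteration count would need tuning to the sensor and scene.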