Environment perception is a key module for realizing the autonomous driving functions of intelligent vehicles, providing the detection data required by path planning, driving decision-making, and vehicle control. At present, the dominant sensors in environment perception systems are LiDAR and vision sensors. Vision sensors have the advantages of low cost and rich semantic information, but when facing complex traffic targets it is difficult to recover accurate spatial geometry from the two-dimensional information of images. LiDAR offers considerable geometric measurement accuracy, but the laser point cloud is sparse and unordered, which makes it difficult to extract the key features needed for recognition and classification. To reduce the limitations of single-sensor applications, exploit the perceptual advantages of heterogeneous sensor fusion that accounts for the characteristics of different sensing data, and achieve reliable detection of small targets, occluded targets, and other complex traffic targets, this thesis builds on a theoretical analysis of the multi-sensor joint calibration mechanism, image detection methods, and image-point cloud fusion detection methods, and carries out the following research:

(1) Joint calibration mechanism of LiDAR and vision sensors. The joint calibration mechanism of LiDAR and the vision camera was analyzed to provide theoretical support for data interaction between heterogeneous sensors. By designing a joint calibration scheme, building an experimental vehicle platform, customizing a calibration board, and self-collecting an image-point cloud dataset, a joint calibration method for LiDAR and vision sensors was formed and the feasibility of image-point cloud data interaction was verified.

(2) Image target detection method. A YOLOv3-based multi-target detection method was proposed to address missed and false detections of small and occluded targets in complex traffic scenes. The method extracts feature maps of different depths through the Darknet-53 network and feeds them into a spatial pyramid pooling module to enhance feature representation; it designs a multi-scale feature fusion mechanism that accounts for target features at different scales and extends the detection scales on this basis to refine semantic information and enrich the representations of both shallow and deep features. In addition, the matching between prediction anchor boxes and the targets to be detected was optimized by improving the selection of initial cluster centers in the K-means algorithm, and a soft NMS algorithm with a Gaussian penalty function was applied to adjust confidence scores flexibly. Experimental results show that, while satisfying real-time requirements, the multi-category detection accuracy of this method is better than that of the baseline YOLOv3, and the method also shows clear advantages in detecting small and occluded targets.
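As a reference for the Gaussian-penalty mechanism mentioned in (2), the sketch below shows a generic soft NMS routine in which overlapping detections have their confidence decayed by exp(-IoU^2/sigma) rather than being suppressed outright. The box format, parameter names (sigma, score_thresh), and implementation details are illustrative assumptions, not the exact formulation used in the thesis.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, each given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Soft NMS: decay the scores of overlapping boxes with a Gaussian penalty
    instead of discarding them, then drop boxes whose score falls below a threshold."""
    scores = scores.copy()
    keep = []
    idxs = np.arange(len(scores))
    while len(idxs) > 0:
        top = idxs[np.argmax(scores[idxs])]      # current highest-confidence box
        keep.append(top)
        idxs = idxs[idxs != top]
        if len(idxs) == 0:
            break
        overlaps = iou(boxes[top], boxes[idxs])
        scores[idxs] *= np.exp(-(overlaps ** 2) / sigma)   # Gaussian confidence decay
        idxs = idxs[scores[idxs] > score_thresh]           # prune low-confidence boxes
    return keep
```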
(3) Target detection method with point cloud and image fusion. To form an accurate 3D spatial description of complex traffic targets, a target detection method fusing images and point clouds was proposed. Following the fusion strategy of Frustum PointNets, the method uses the improved image detection method to generate two-dimensional detection results and maps them into the point cloud through the calibrated projection relationship to form better candidate point cloud regions, improving the probability of acquiring the point cloud of interest. To reduce the information loss of the sparse point cloud, a two-dimensional attention module was embedded to better extract key features along both the spatial and channel dimensions. A focal loss function was introduced to compute the point cloud segmentation loss, balancing the samples of the point cloud of interest against noisy point cloud samples and, by adjusting its factors, improving the trained model's ability to learn complex target samples. Experimental results show that the method achieves higher 3D detection accuracy for complex targets than the baseline Frustum PointNets, and, compared with a single image detection method, it describes the 3D spatial location of targets more accurately.
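To make the loss adjustment in (3) concrete, the sketch below gives the standard focal loss in a binary per-point segmentation setting, where the balancing factor alpha and the focusing factor gamma rebalance points of interest against noisy background points and down-weight easy examples. The binary formulation and the default values of alpha and gamma are assumptions for illustration, not the thesis's exact configuration.

```python
import numpy as np

def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for per-point foreground/background segmentation.
    probs:  predicted foreground probability for each point, shape (N,)
    labels: 1 for points of interest, 0 for background/noise points, shape (N,)"""
    probs = np.clip(probs, eps, 1.0 - eps)
    p_t = np.where(labels == 1, probs, 1.0 - probs)       # probability of the true class
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)   # class-balance factor
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)  # (1 - p_t)^gamma down-weights easy points
    return loss.mean()
```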