The environment perception system is an important part of autonomous driving. It collects multimodal data from multiple sensors, analyses and understands the surrounding environment, fuses the perception results, and sends them to the decision module as an important reference for guiding the vehicle; it is thus the guarantee of safe driving for autonomous vehicles. As one of the most important sensors in an environment perception system, LiDAR provides point cloud information about the surrounding scene. LiDAR-based 3D object detection and point cloud instance segmentation help the vehicle identify, locate and understand objects of interest in the environment at a higher level. The LiDAR perception results must be fused with those of other sensors, such as cameras, to aggregate information about each target, and the calibration of the multiple sensors is an important factor in the effectiveness of this fusion. Current environment perception systems do not yet meet the requirements of fully autonomous driving in terms of perception accuracy and reliability, so their perception accuracy, fusion quality and computation speed still need continuous improvement. To improve the speed and accuracy of environment perception for autonomous driving, this thesis carries out theoretical analysis, algorithm design, technical implementation and real-world verification of key environment perception technologies, namely 3D object detection, 3D point cloud instance segmentation, and automatic camera-LiDAR calibration. The main research content is as follows.

(1) A one-stage 3D object detection algorithm based on sparse convolution is proposed. To address the imbalance between efficiency and accuracy in 3D object detection, a feature extraction network and a detection head network based on a fully sparse convolutional neural network are designed to replace the
anchor-based detection method with a foreground segmentation method that predicts bounding boxes in multiple directions from each foreground point. Meanwhile, a new object bounding box encoding is proposed, which represents the bounding box as two mutually perpendicular lines crossing the foreground point and computes the box indirectly by predicting the offsets of the line endpoints relative to that point. Experiments on the KITTI dataset show that the proposed foreground segmentation based 3D object detection network improves on state-of-the-art algorithms in both speed and precision.

(2) A two-stage 3D object detection algorithm based on sparse convolution is proposed. To address the low recognition accuracy for weak and hard-to-detect objects, a 3D object detection method based on object centers is designed: the positions of object centers are first predicted on the bird's-eye view, the features at each center are then computed, and finally the bounding box of each object is predicted at its center. To compute the features at the object centers efficiently, an assignable output active point sparse convolutional neural network (AOAP-SCNN) is proposed; it processes feature maps at three scales to obtain multi-scale features at each object center. Compared with the foreground segmentation based detection network, this method improves significantly on the moderate and hard detection metrics of the KITTI dataset while still processing point clouds in real time.

(3) A front view based point cloud instance segmentation algorithm is proposed. To address the poor segmentation accuracy of fast instance segmentation algorithms, the method reprojects the point cloud to a native range view (NRV) as the front view and segments this view to achieve segmentation of the point cloud. In the feature extraction network, a
DLA network combined with PointNet is designed: a simplified PointNet extracts local features of the point cloud in the front view, the DLA network performs multi-scale feature extraction on that view, and the fused output features are passed to a multi-task head network that predicts the foreground score, center offset, object size and object orientation of each pixel. Finally, the predicted object centers are clustered, with each cluster representing one instance. In the experiments, the KITTI 3D object detection dataset was used to generate a training set for point cloud instance segmentation, and the results show that the proposed algorithm outperforms contemporaneous algorithms in both segmentation precision and speed.

(4) A LiDAR-camera calibration algorithm based on point cloud segmentation and image segmentation is proposed. To address shortcomings of existing automatic calibration methods, such as heavy optimization workloads and long computation times, an automatic camera-LiDAR calibration system is designed on the embedded platform HiSilicon Hi3559AV100, with the aim of achieving online calibration under on-board conditions. The method segments the point cloud and the image, generates a 3D frustum enclosing the object point cloud and the bounding box of the 2D object segmentation mask, establishes 2D-3D correspondences using a virtual point correspondence method, and finally solves the extrinsic parameter matrix between the camera and the LiDAR using PnP. Moreover, a model conversion and inference method for the automatic calibration algorithm is proposed for this embedded platform. Simulation on the KITTI dataset and outdoor experiments show that the proposed auto-calibration algorithm is able to correct the extrinsic parameter matrix between the camera and the LiDAR, and that the auto-calibration system is
valuable for practical use.
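The bounding box encoding of contribution (1), two mutually perpendicular lines crossing the foreground point with predicted endpoint offsets, can be sketched as follows. This is a hypothetical decoding consistent with that description, not necessarily the thesis's exact parameterization: the function name and the assumption that the two lines run along the box's length and width axes are illustrative.

```python
import numpy as np

def decode_box_from_lines(p, off1a, off1b, off2a, off2b):
    """Decode a bird's-eye-view bounding box from two perpendicular
    lines through a foreground point p (hypothetical decoding).

    p       : (2,) foreground point in the ground plane
    off1a/b : (2,) offsets of the endpoints of the line assumed to run
              along the box's length axis, relative to p
    off2a/b : (2,) offsets of the endpoints of the perpendicular
              (width-axis) line, relative to p
    """
    e1a, e1b = p + off1a, p + off1b      # endpoints of the length line
    e2a, e2b = p + off2a, p + off2b      # endpoints of the width line

    length = np.linalg.norm(e1a - e1b)
    width = np.linalg.norm(e2a - e2b)
    yaw = np.arctan2(*(e1a - e1b)[::-1])  # heading from the length line

    # Each line's midpoint gives the box-centre coordinate along its own
    # axis; because the lines cross at p, the centre is m1 + m2 - p.
    m1 = 0.5 * (e1a + e1b)
    m2 = 0.5 * (e2a + e2b)
    center = m1 + m2 - p
    return center, length, width, yaw
```

Note that the foreground point need not be the box centre; the midpoint construction recovers the centre from an off-centre point, which is what makes the indirect prediction possible.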
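The front-view projection behind contribution (3) can be illustrated with a generic spherical range-image projection. The thesis's native range view restores the sensor's own scan layout; the fixed vertical field-of-view bounds and the image resolution below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def to_range_view(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud into an H x W range image.
    fov_up/fov_down are the assumed vertical field-of-view bounds in
    degrees; empty pixels are filled with -1."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)

    yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * W           # column from azimuth
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r) * H  # row from elevation

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    rv = np.full((H, W), -1.0, dtype=np.float32)
    rv[v, u] = r                                # last point wins per pixel
    return rv
```

Segmenting this 2D image with an ordinary CNN, then mapping pixel labels back to the contributing points, is what turns image segmentation into point cloud segmentation.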
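The final step of contribution (4), solving the camera-LiDAR extrinsics from 2D-3D correspondences, can be sketched with a plain-NumPy direct linear transform. This is a stand-in for a full PnP solver (e.g. OpenCV's cv2.solvePnP); the function name and the assumption of at least six non-coplanar correspondences are illustrative.

```python
import numpy as np

def calibrate_extrinsics(pts3d, pts2d, K):
    """Estimate the LiDAR-to-camera extrinsics (R, t) from 2D-3D point
    correspondences via the direct linear transform.

    pts3d : (N, 3) points in the LiDAR frame
    pts2d : (N, 2) corresponding pixel coordinates
    K     : (3, 3) camera intrinsic matrix
    Needs N >= 6 points in general (non-coplanar) position.
    """
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    P = Vt[-1].reshape(3, 4)            # projection matrix, up to scale

    M = np.linalg.inv(K) @ P            # ~ s * [R | t]
    if np.linalg.det(M[:, :3]) < 0:     # fix the SVD sign ambiguity
        M = -M
    s = np.linalg.det(M[:, :3]) ** (1.0 / 3.0)
    R_approx, t = M[:, :3] / s, M[:, 3] / s
    U, _, Vt = np.linalg.svd(R_approx)  # project onto SO(3)
    R = U @ Vt
    return R, t
```

In the thesis's pipeline the correspondences come from virtual points generated between the 3D frustum and the 2D mask box rather than from hand-picked landmarks, but the solve step itself takes the same form.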