High-precision environment perception is an important guarantee of driving safety for autonomous vehicles. As a key task in environment perception, 3D object detection has received widespread attention in recent years. It perceives the location, size, and moving direction of surrounding objects from the environmental information collected by the various sensors on an autonomous vehicle, thereby providing an important basis for driving decisions. With the continuous development of 3D sensing technology, the main medium of environment perception has gradually expanded from 2D images to 3D data represented by point clouds, which provides the data foundation for 3D object detection research. Meanwhile, with the vigorous development of deep learning, deep neural networks based on convolution and the Transformer have achieved broad success in academia and industry, offering new ideas and methods for 3D object detection. Accordingly, this dissertation studies 3D object detection methods for autonomous driving scenes using LiDAR point clouds, images, and deep neural networks. The main research contents and contributions are as follows.

Firstly, point cloud 3D object detection based on convolutional neural networks is studied. To address the loss of keypoint information and the waste of raw point cloud information, a Local Feature Enhancement module is designed to enrich the local geometric information of keypoint neighborhoods and strengthen the local representation ability of keypoints, and a Keypoint Weight Enhancement module is designed to introduce rich raw point cloud information into the learning of keypoint weights. On this basis, the Local Feature Enhancement and Keypoint Weight Enhancement Point-Voxel Region Convolutional Neural Network is constructed. Experiments on the KITTI dataset show that the proposed network improves the detection precision of the point cloud 3D object detection network PV-RCNN.

Secondly, structured image pixel data is introduced to study multimodal 3D object detection based on convolutional neural networks. To address the slow running speed and the difficulty of accurately aligning multimodal features, a sparse processing mode for multimodal data is designed to improve speed and reduce the difficulty of mapping and alignment. Furthermore, to address the incomplete use of multimodal information and the coarseness of feature fusion, a multi-scale and multimodal feature extraction and fusion network is proposed to comprehensively extract multimodal features and fuse them finely, and an additional learning task is designed to obtain effective classification information about objects. On this basis, the Sparsely Represented Inputs Multi-scale and Multimodal Fusion Region Convolutional Neural Network is constructed. Experiments on the KITTI dataset show that the proposed network achieves a favorable balance between precision and speed compared with contemporaneous deep-fusion multimodal 3D object detection networks.

Thirdly, unstructured image pseudo point cloud data is introduced to continue the research on multimodal 3D object detection based on convolutional neural networks. To address the coarse feature extraction from pseudo point clouds and the weak representation ability of region-of-interest features, a Fine-grained Attention Convolution is proposed to extract fine features from the pseudo point cloud, and a Self-adaptive Group Sparse Convolution is proposed to divide region-of-interest features into groups and learn them in a differentiated manner, thereby obtaining multi-scale information and enhancing the representation ability of region-of-interest features. On this basis, the Self-adaptive Region Convolutional Neural Network is constructed. Experiments on the KITTI dataset show that the network effectively alleviates the shortcomings of similar networks and reaches an advanced level of detection among contemporaneous 3D object detection networks.
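The group-wise, differentiated treatment of region-of-interest features described above can be pictured with a minimal sketch. The snippet below splits the channels of a pooled RoI feature volume into groups and applies 3D convolutions with different kernel sizes to each group before re-concatenating them. It is only a dense-tensor illustration of the idea: the dissertation's Self-adaptive Group Sparse Convolution operates on sparse features, and the class name, grouping rule, and kernel sizes here are assumptions rather than the actual design.

```python
import torch
import torch.nn as nn

class GroupedMultiScaleRoIConv(nn.Module):
    """Illustrative stand-in for group-wise, differentiated learning of RoI features.

    The channel dimension of the RoI feature volume is split into groups, and each
    group is processed by a 3D convolution with a different kernel size, so the
    groups capture information at different scales before being re-concatenated.
    This is a dense-tensor sketch only; all names and settings are assumptions.
    """

    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        group_ch = channels // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv3d(group_ch, group_ch, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, roi_feat: torch.Tensor) -> torch.Tensor:
        # roi_feat: (N_roi, C, D, H, W) pooled feature volume for each region of interest
        groups = torch.chunk(roi_feat, len(self.branches), dim=1)
        out = [branch(g) for branch, g in zip(self.branches, groups)]
        return torch.cat(out, dim=1)  # same shape as the input

# Toy usage: 2 RoIs, 96 channels, a 6x6x6 pooled grid
x = torch.randn(2, 96, 6, 6, 6)
print(GroupedMultiScaleRoIConv(96)(x).shape)  # torch.Size([2, 96, 6, 6, 6])
```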
Finally, 3D object detection based on the Transformer is studied. To address the limited receptive field when the Transformer is applied to point cloud data, a hybrid sampling strategy is designed to obtain crossed voxel sets adapted to the point cloud distribution. Taking the crossed voxel set as the basic unit of self-attention computation, Scattered Local Attention is proposed to enlarge the receptive field of point cloud voxels. On this basis, the Crossed Voxel Set Voxel Transformer is constructed. Experiments on the KITTI dataset show that the precision of existing 3D detection networks improves after the proposed structure is integrated into them.
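As a rough illustration of set-based local attention, the sketch below partitions voxel features into fixed-size sets and runs multi-head self-attention within each set, so that every voxel attends to the other voxels of its set. The partitioning here is a naive reshape, whereas the actual network builds crossed voxel sets with the hybrid sampling strategy described above; the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class SetLocalSelfAttention(nn.Module):
    """Toy illustration of self-attention computed inside local voxel sets.

    Non-empty voxel features are grouped into fixed-size sets (here by simple
    reshaping), and multi-head self-attention is computed independently within
    each set. Names and details are assumptions made for illustration only.
    """

    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, voxel_feat: torch.Tensor, set_size: int) -> torch.Tensor:
        # voxel_feat: (N, C) features of N non-empty voxels; N divisible by set_size here
        n, c = voxel_feat.shape
        sets = voxel_feat.view(n // set_size, set_size, c)   # (num_sets, set_size, C)
        attn_out, _ = self.attn(sets, sets, sets)            # attention within each set
        return self.norm(sets + attn_out).reshape(n, c)      # residual + norm, flattened back

# Toy usage: 256 voxels with 64-dim features, sets of 32 voxels
feats = torch.randn(256, 64)
print(SetLocalSelfAttention()(feats, set_size=32).shape)  # torch.Size([256, 64])
```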