
3D Object Detection Based On Multimodal And Multi-model Fusion

Posted on: 2022-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Wang
GTID: 2492306776992889
Subject: Computer Software and Application of Computer
Abstract/Summary:
As autonomous driving technology develops, the perception module must become more reliable. To keep decision-making and control safe, autonomous vehicles estimate the trajectories of objects around the road and obtain information such as their position, speed, type, and size. Pedestrians and cyclists are especially prone to false and missed detections, and existing single-sensor methods suffer from detection uncertainty and limited sensing range, which makes them hard to apply to complex urban road scenes. This thesis therefore takes multimodal and multi-model fusion as its starting point. It combines the precise spatial information of lidar point clouds with the dense pixel information of 2D images to obtain more comprehensive and robust information about the driving environment, improving 3D object detection and meeting the reliability and accuracy demands of autonomous driving systems. To raise perception accuracy further, it applies ensemble learning: the 3D bounding boxes produced by each detection model are fused according to model weights, which further improves 3D detection accuracy and helps the system adapt to varied driving environments. The main contents of this thesis are as follows:

1. A new paradigm for point cloud and image fusion is proposed, and a deep convolutional neural network framework for multi-task joint perception is designed. The main innovations are as follows:
(1) A multi-label object recognition network is used as an auxiliary task for 3D object detection, and the predicted object category information serves as a consistency regularization constraint on the 3D detection task. This reduces false and missed detections and improves multi-category 3D detection accuracy in autonomous driving scenes.
(2) In the feature extraction stage, a module based on a multi-layer gated attention mechanism is proposed to adaptively fuse spatial and semantic information at different scales, from high dimension to low dimension.
(3) For the two-stage detector, the ROI-pooled region features extracted by the RCNN module are fused with the global deep semantic features of the image to achieve more accurate regression prediction.
To verify the method, extensive comparison and ablation experiments are carried out. The fusion strategy is validated on multiple public datasets without relying on additional image labels.

2. Improved versions of the CenterPoint and PointPillars algorithms are combined to address the problem that a perception algorithm cannot adapt to complex road environments with many traffic participants. The main innovations are as follows:
(1) The CenterPoint model samples with voxel grids of different scales and adds submanifold convolution in the point cloud feature extraction stage to reduce model complexity. An attention module is then added after the RPN layer of the detection network to capture key point cloud features.
(2) For the PointPillars model, RegNet is used as the backbone network to extract features from the point cloud pseudo-image, and the FreeAnchor detection head is then used to optimize the matching between objects and anchor boxes, improving multi-category 3D detection accuracy in complex scenes.
Finally, the improved CenterPoint and PointPillars models are fused with the WBF method. The fused result ranks near the top on several evaluation metrics on the nuScenes dataset and ranked first in the NeurIPS 2020 Autopilot Challenge.
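The abstract says that boxes from the individual detectors are fused "according to their weights" with WBF, which usually stands for weighted boxes fusion. The sketch below illustrates that general idea only and is not the thesis implementation. It assumes axis-aligned 3D boxes without a yaw angle, strictly positive scores, and an illustrative IoU threshold of 0.55; the names `Box3D`, `iou_3d`, and `weighted_box_fusion` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Box3D:
    # Simplifying assumption: axis-aligned box (centre + size), no yaw angle.
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]
    score: float
    label: str


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a.center[i] - a.size[i] / 2, b.center[i] - b.size[i] / 2)
        hi = min(a.center[i] + a.size[i] / 2, b.center[i] + b.size[i] / 2)
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    vol_a = a.size[0] * a.size[1] * a.size[2]
    vol_b = b.size[0] * b.size[1] * b.size[2]
    return inter / (vol_a + vol_b - inter)


def _fuse(cluster: List[Box3D]) -> Box3D:
    """Score-weighted average of box geometry; score is the cluster mean."""
    total = sum(b.score for b in cluster)  # assumes scores are > 0
    center = tuple(sum(b.score * b.center[i] for b in cluster) / total for i in range(3))
    size = tuple(sum(b.score * b.size[i] for b in cluster) / total for i in range(3))
    return Box3D(center, size, total / len(cluster), cluster[0].label)


def weighted_box_fusion(
    model_outputs: Sequence[Sequence[Box3D]],
    model_weights: Sequence[float],
    iou_thr: float = 0.55,  # illustrative value, not taken from the thesis
) -> List[Box3D]:
    # Scale each model's scores by its weight, then process boxes by score.
    candidates = [
        Box3D(b.center, b.size, b.score * w, b.label)
        for boxes, w in zip(model_outputs, model_weights)
        for b in boxes
    ]
    candidates.sort(key=lambda b: b.score, reverse=True)

    clusters: List[List[Box3D]] = []
    fused: List[Box3D] = []
    for box in candidates:
        best, best_iou = None, iou_thr
        for idx, f in enumerate(fused):
            if f.label != box.label:
                continue
            iou = iou_3d(f, box)
            if iou > best_iou:
                best, best_iou = idx, iou
        if best is None:
            clusters.append([box])
            fused.append(_fuse([box]))
        else:
            clusters[best].append(box)
            fused[best] = _fuse(clusters[best])

    # Down-weight boxes that only a subset of the models agreed on.
    weight_sum = sum(model_weights)
    results = []
    for cluster, f in zip(clusters, fused):
        scale = min(len(cluster), weight_sum) / weight_sum
        results.append(Box3D(f.center, f.size, f.score * scale, f.label))
    results.sort(key=lambda b: b.score, reverse=True)
    return results
```

Unlike non-maximum suppression, which keeps one box per cluster and discards the rest, this scheme averages all overlapping boxes, so a lower-scoring detector still contributes to the final geometry. A real lidar pipeline would also need to handle box orientation, which this sketch omits.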
Keywords/Search Tags: 3D object detection, Gated fusion attention, Multi-modal fusion, Consistency regularization, Multi-model fusion