
Research On 3D Object Detection Technology Based On Monocular Vision

Posted on: 2024-06-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H N Hu
Full Text: PDF
GTID: 1528307088463724
Subject: Mechanical and electrical engineering
Abstract/Summary:
3D object detection is a key problem in digital image processing and has broad application prospects in military, industrial, and other fields. Deep learning is a family of machine learning methods that learns the intrinsic regularities of sample data, extracts deep features from the samples, and thereby interprets them better. In recent years, the development of 3D detection algorithms based on point clouds and binocular vision has significantly improved the accuracy of 3D object detection. However, point clouds require expensive LiDAR equipment to acquire, and binocular imaging equipment places high demands on implementation and is costly in industrial applications. It is therefore of great significance to improve existing monocular 3D object detection algorithms by exploiting the information that can be mined from images, so that their performance becomes comparable to that of LiDAR-based or binocular vision-based algorithms. Research on monocular 3D object detection thus has both academic significance and practical value. Based on an analysis of the key technologies of monocular 3D object detection, this dissertation studies four key components: a monocular depth estimation algorithm, a target-aware monocular depth optimization algorithm, a 3D object detection algorithm using pseudo-LiDAR point clouds, and a stereo-image based monocular 3D object detection algorithm. The main work comprises the following four aspects:

1. Monocular depth estimation methods are studied in depth. When used as backbone networks for pixel-level depth estimation, convolutional neural networks lack a representation of global information, while visual Transformers lack a representation of local texture information. To address this, a monocular depth estimation method combining the advantages of both is proposed. The encoder is constructed by restructuring the convolutional layers and adding a Transformer module after the convolutional module, which gives it the ability to extract multi-scale local and global features. A multi-scale convolutional neural network serves as the decoder and performs dense pixel-level depth regression on the fused features. The method uses visual Transformers to model the global correlation of multi-scale convolutional features, improving the accuracy of depth estimation.

2. Monocular 3D object detection methods are studied in depth. The limitations of current methods stem mainly from the inaccurate foreground object positions produced by monocular depth estimation. To address this, a target-aware monocular depth optimization method combining instance segmentation and geometric constraints is proposed. An instance segmentation module based on regression of target center points is redesigned to obtain the 3D height distribution of the target. A depth distribution computed from the camera imaging principle is combined with the depth estimation method to optimize target depths at different distances, and an uncertainty learning strategy is used to optimize the depth of the target to be detected. This improves the accuracy of depth estimation for targets at different distances.

3. The differing impact of LiDAR point clouds and pseudo-LiDAR point clouds generated from monocular images on monocular 3D object detection is studied in depth. To address the long-tail problem of pseudo-LiDAR point clouds, a monocular 3D object detection method based on optimizing the pseudo-LiDAR point cloud distribution is proposed. First, the encoder integrates the Set Abstraction module of PointNet++ with a Transformer module to further enhance the global consistency of the features. Then, a decoder with multi-scale feature-level supervision redistributes the pseudo-LiDAR point cloud. A LiDAR point cloud-based 3D object detection method is then applied to the pseudo-LiDAR point cloud data. Extensive experiments show that this method improves the accuracy of monocular 3D object detection.

4. Monocular 3D object detection based on stereo images is studied in depth. To address the inaccurate generation of bird's-eye-view (BEV) features from non-overlapping 2D images and the lack of temporal correlation, a stereo-image based monocular 3D object detection method using long-short-term temporal fusion and motion feature distillation is proposed. Different feature resolutions are used to extract long-term and short-term temporal features. An encoder based on a Transformer cross-correlation module jointly embeds motion features and depth information and integrates them into the BEV features fused with long-short-term temporal features. A decoder with motion feature distillation then completes spatial 3D localization. The method integrates stereo-based feature representations from different time steps, supplemented by motion features and depth information, and experiments show that it improves stereo-based 3D detection accuracy.

This dissertation addresses shortcomings in current research on key technologies for monocular 3D object detection and proposes several algorithms to improve its performance: improving the accuracy of monocular depth estimation, optimizing the depth of foreground objects through instance segmentation and geometric constraints, optimizing the distribution of pseudo-LiDAR point clouds, and introducing long-short-term temporal fusion and motion feature distillation for monocular vision based on non-overlapping stereo images. Extensive experiments and ablation studies were carried out to verify the performance of each proposed algorithm. The work has reference value for improving the accuracy and robustness of monocular 3D object detection and for expanding its fields of application.
Keywords/Search Tags: Monocular vision, Depth estimation, 3D object detection
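
The abstract relies on two standard pinhole-camera relations without spelling them out: recovering an object's depth from its physical height and projected pixel height, and back-projecting a depth map into a pseudo-LiDAR point cloud. The following is a minimal sketch of those textbook relations only, not the dissertation's implementation. The function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the 80 m range cutoff are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_from_height(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Pinhole relation z = f * H / h (hypothetical helper).

    focal_px:     focal length in pixels
    height_3d_m:  physical object height in metres
    height_2d_px: projected object height in pixels
    """
    if height_2d_px <= 0:
        raise ValueError("projected height must be positive")
    return focal_px * height_3d_m / height_2d_px


def depth_to_pseudo_lidar(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    max_depth: float = 80.0,  # assumed cutoff, not taken from the source
) -> List[Point]:
    """Back-project a dense depth map into camera-frame 3D points.

    depth[v][u] is the estimated depth (metres) at pixel column u, row v.
    Pixels with non-positive depth or depth beyond max_depth are skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0 or z > max_depth:
                continue
            x = (u - cx) * z / fx  # right of the optical axis
            y = (v - cy) * z / fy  # below the optical axis
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # A 1.5 m tall object spanning 35 px under a 700 px focal length.
    print(depth_from_height(700.0, 1.5, 35.0))
    # A toy 2x2 depth map; two pixels fall outside the valid range.
    toy_depth = [[10.0, 0.0], [20.0, 100.0]]
    print(depth_to_pseudo_lidar(toy_depth, fx=10.0, fy=10.0, cx=0.0, cy=0.0))
```

Because both relations divide or multiply by the estimated depth, any error in the depth estimate propagates directly into the 3D position of every back-projected point, which is consistent with the abstract's emphasis on refining foreground depth before detection.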