Research On Monocular 3D Object Detection And Object VSLAM Scale Restoration

Posted on: 2023-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: P L Gu
Full Text: PDF
GTID: 2568306617462104
Subject: Control engineering
Abstract/Summary:
Environment perception algorithms for cloud service robots must be highly general and communication-efficient if the cloud-hosted algorithm is to be deployed across many robots and personal devices. Monocular cameras are widely used as the perception sensor for cloud robots because of their advantages in price, size, and data volume. This thesis therefore designs a monocular object-level VSLAM algorithm based on 3D object detection for robot environment perception. The 3D object detection algorithm perceives semantic objects in the environment, and the VSLAM algorithm computes the camera pose and builds the environment map.

Both components are limited by the lack of depth observation. Monocular 3D object detection trains a model on the ground-truth depth in a dataset to predict the 3D bounding box of an object from 2D features. This leads to overfitting, which makes the model difficult to deploy widely on robots in real scenes. Monocular VSLAM lacks any observation of the real scale, so it suffers from scale drift and cannot build an environment map with a real scale. The two algorithms are nevertheless complementary: monocular 3D object detection can predict the size of objects, and VSLAM can compute a continuous camera pose. On this basis, this thesis studies the deep integration of the two algorithms.

To address the poor generalization of monocular 3D object detection, this thesis uses the projection model of an ellipsoid (a quadric surface) to establish geometric constraints between 3D objects, the ground plane, and 2D detections, and proposes a post-processing method for monocular 3D object detection. The algorithm first computes the minimum bounding box of the ellipsoid's projection onto the image and forms a residual against the observed 2D detection. Minimizing the difference between the projection and the 2D detection effectively improves the detection result. Given a prior on the 3D object size, the algorithm can even be combined with any 2D object detection algorithm. Furthermore, this thesis proposes a ground plane estimation strategy based on the VSLAM method. Under the assumption that objects are placed parallel to the ground plane, a constraint that the object is tangent to the ground plane is constructed, which greatly improves the detection accuracy for distant objects. Test results on public datasets show that the proposed algorithm effectively improves the cross-dataset generalization of monocular 3D object detection and has very high real-time performance.

To address the scale drift and missing scale of monocular VSLAM, this thesis proposes a cross-scale multi-frame joint optimization algorithm based on 3D object detection, which also effectively eliminates incorrect position observations under the VSLAM scale. By combining continuous observations of the true-scale prediction with initialization of the VSLAM system and joint bundle adjustment (BA), the method effectively suppresses the scale drift of monocular visual odometry and restores the scale of the monocular VSLAM. In addition, the multi-frame joint optimization based on the real scale further improves the accuracy of monocular 3D object detection. Comparative experiments on public datasets show that the proposed algorithm achieves higher trajectory accuracy and detection accuracy and can run in real time. A real-world scenario test verifies that the algorithm generalizes well.

Through the deep fusion of monocular 3D object detection and monocular VSLAM, the proposed method outputs a sparse semantic map with real scale. It can provide high-precision, real-time environment perception for the interaction and navigation of service robots equipped with monocular cameras in complex scenes.
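
The abstract does not give the equations behind the ellipsoid projection step, so the following is only a minimal sketch of one standard way to compute it, not the thesis's implementation. It assumes the usual dual-quadric formulation: an ellipsoid is written as a 4x4 dual quadric Q*, a 3x4 camera matrix P maps it to the dual conic C* = P Q* P^T, and the axis-aligned bounding box follows from the image lines tangent to that conic. All function names, the axis-aligned ellipsoid, and the camera parameters are illustrative assumptions.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def dual_quadric(semi_axes, center):
    """Dual quadric Q* of an axis-aligned ellipsoid (assumption: no rotation)."""
    a, b, c = semi_axes
    Z = [[1, 0, 0, center[0]],
         [0, 1, 0, center[1]],
         [0, 0, 1, center[2]],
         [0, 0, 0, 1]]
    D = [[a * a, 0, 0, 0],
         [0, b * b, 0, 0],
         [0, 0, c * c, 0],
         [0, 0, 0, -1]]
    return matmul(matmul(Z, D), transpose(Z))

def project_bbox(Q, P):
    """Bounding box (u_min, v_min, u_max, v_max) of the projected ellipsoid.

    A vertical line u = const is tangent to the dual conic C when
    C[0][0] - 2*u*C[0][2] + u*u*C[2][2] == 0; likewise for horizontal lines.
    Assumes the ellipsoid lies entirely in front of the camera.
    """
    C = matmul(matmul(P, Q), transpose(P))
    du = math.sqrt(C[0][2] ** 2 - C[0][0] * C[2][2])
    dv = math.sqrt(C[1][2] ** 2 - C[1][1] * C[2][2])
    us = sorted(((C[0][2] + du) / C[2][2], (C[0][2] - du) / C[2][2]))
    vs = sorted(((C[1][2] + dv) / C[2][2], (C[1][2] - dv) / C[2][2]))
    return (us[0], vs[0], us[1], vs[1])

def bbox_residual(projected, detected):
    """Per-edge difference between the projected box and a 2D detection."""
    return [p - d for p, d in zip(projected, detected)]

# Hypothetical pinhole camera at the origin: focal length 500 px,
# principal point (320, 240), so P = K [I | 0].
P = [[500, 0, 320, 0],
     [0, 500, 240, 0],
     [0, 0, 1, 0]]
box = project_bbox(dual_quadric((3, 3, 3), (0, 0, 5)), P)
residual = bbox_residual(box, (-50.0, -130.0, 690.0, 610.0))
```

In a post-processing step of the kind the abstract describes, the object parameters would be adjusted by a nonlinear least-squares solver until this residual is small. The solver and the ground-plane tangency term are omitted here.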
Keywords/Search Tags: Monocular VSLAM, Semantic Mapping, Monocular VSLAM scale restoration, Monocular 3D Detection