Autonomous vehicles have significant research value and broad application prospects in intelligent urban transportation, since they travel without human intervention. Monocular visual perception is not only one of the most important technologies in an autonomous vehicle's navigation system, but also a research hotspot and open challenge in computer vision. With the development of deep learning and computer hardware, CNNs (Convolutional Neural Networks) have achieved great success in computer vision tasks, and deep learning based monocular visual perception has become one of the most promising directions in the field. Aiming at the difficulty of obtaining labeled samples for monocular visual perception, this dissertation mainly studies unsupervised depth estimation, unsupervised visual odometry, and unsupervised optical flow prediction. The specific contents and innovations are summarized as follows:

1. Since traditional monocular depth estimation methods cannot estimate scene depth without heuristic knowledge, this dissertation proposes a deep learning based unsupervised monocular depth estimation model. Based on ResNet, it first designs a new "encoder-decoder" network, DepthNet, to estimate the scene depth. Then, based on the fact that the pixel translations between corresponding points should be equal, a disparity consistency loss function is constructed to improve the accuracy of DepthNet. Extensive ablation and comparison experiments on the KITTI dataset demonstrate that the proposed DepthNet is effective, and experiments on the KITTI and Cityscapes datasets show that the proposed method outperforms popular unsupervised depth estimation models as well as several supervised ones.

2. Traditional feature-based visual odometry performs its steps (feature extraction, feature description, feature matching, bundle adjustment) in a multi-stage manner and can only be used in regions with abundant features. To address this problem, this dissertation proposes a deep learning based unsupervised monocular visual odometry. The proposed model builds a mapping between a monocular image sequence and its corresponding camera poses, without any restriction on the application scenario. During training, the dissertation constructs a coupling relationship between DepthNet and the camera pose prediction network (MotionNet). During testing, the camera poses can be obtained directly by feeding a monocular image sequence to MotionNet, whose structure is a pure encoder. Through comparison experiments on short-time image sequences, global camera trajectories, and computation time on the KITTI odometry dataset, this dissertation analyzes the differences and relationships between the proposed visual odometry and feature-based visual odometry, and also outlines future work for these deep learning based models.

3. The assumptions of brightness invariance and optical flow smoothness are difficult to satisfy in real scenes. To address this problem, this dissertation proposes a deep learning based unsupervised monocular optical flow prediction model. The model first employs an optical flow prediction network (FlowNet) with an "encoder-decoder" structure similar to DepthNet. Based on the fact that the forward-backward and backward-forward optical flow maps of an image sequence should be consistent, an image reconstruction consistency loss function is constructed to improve the accuracy of the model. Finally, the motion of the static scene is computed from the depth map and camera pose, while dynamic objects are handled by FlowNet. Extensive ablation and comparison experiments on the KITTI Flow dataset show that the image reconstruction consistency loss function and the static scene's optical flow both improve the accuracy of the proposed model.

4. During image reconstruction, most existing models do not distinguish whether a pixel belongs to the overlapping or non-overlapping region of an image. To address this problem, this dissertation proposes a deep learning based unsupervised static scene adaptive motion estimation model. The model first uses the difference between an image's global brightness and local brightness to construct an adaptive function that judges whether a pixel belongs to the overlapping region of the images. When constructing the objective function, the adaptive function serves as a weighting factor that revises the image reconstruction loss. Experiments show that the adaptive function effectively reduces the detrimental effect of the image's non-overlapping region.
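The disparity consistency idea in point 1 (corresponding points should report the same pixel translation) can be sketched numerically. The abstract does not give the dissertation's exact formulation, so the function name, nearest-neighbour sampling, and NumPy implementation below are illustrative assumptions, not the actual DepthNet loss:

```python
import numpy as np

def lr_consistency_loss(disp_left, disp_right):
    """Hypothetical left-right disparity consistency term.

    For each pixel x in the left disparity map, sample the right
    disparity map at the shifted position x - d_L(x) and penalise the
    absolute difference: corresponding points should have equal
    pixel translations, so d_L(x) ~ d_R(x - d_L(x)).
    """
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)          # pixel columns
    # nearest-neighbour correspondence locations in the right image
    xr = np.clip(np.rint(xs - disp_left).astype(int), 0, w - 1)
    sampled = np.take_along_axis(disp_right, xr, axis=1)  # d_R(x - d_L(x))
    return float(np.mean(np.abs(disp_left - sampled)))
```

A consistent pair of constant disparity maps yields zero loss, while any disagreement between the two maps raises it; in a training loop this term would be added to the photometric reconstruction loss.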
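The forward-backward consistency fact behind the loss in point 3 can also be sketched: tracing a pixel forward and then backward should return it to its starting position. The helper below is a minimal NumPy illustration under that assumption; the dissertation's actual image reconstruction consistency loss is not specified in the abstract:

```python
import numpy as np

def fb_consistency(flow_fw, flow_bw):
    """Hypothetical forward-backward optical flow consistency measure.

    flow_* have shape (H, W, 2) holding (dx, dy) per pixel. For
    consistent maps, flow_fw(x) + flow_bw(x + flow_fw(x)) ~ 0, so the
    mean absolute residual measures how far the two maps disagree.
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # nearest-neighbour target locations reached by the forward flow
    xt = np.clip(np.rint(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.rint(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
    bw_at_target = flow_bw[yt, xt]          # flow_bw(x + flow_fw(x))
    return float(np.mean(np.abs(flow_fw + bw_at_target)))
```

A uniform rightward forward flow paired with a uniform leftward backward flow gives a zero residual; occluded pixels typically violate this check, which is why such terms are often masked in practice.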
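The adaptive function in point 4 compares global and local image brightness to down-weight pixels outside the overlapping region. The abstract gives no formula, so everything below (window size, the exponential mapping to a weight, the NumPy form) is a speculative sketch of one way such a weighting could work:

```python
import numpy as np

def adaptive_overlap_weight(img, win=3, alpha=1.0):
    """Speculative adaptive weighting from global vs. local brightness.

    Computes the mean brightness in a win x win window around each
    pixel and compares it with the image's global mean; pixels whose
    local brightness deviates strongly (as warped-in, non-overlapping
    border regions often do) receive a weight near 0, so they
    contribute little to the image reconstruction loss.
    """
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    local = np.empty((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            local[y, x] = padded[y:y + win, x:x + win].mean()
    diff = np.abs(local - img.mean())
    return np.exp(-alpha * diff)            # per-pixel weight in (0, 1]
```

For a uniform image every local mean equals the global mean, so all weights are 1 and the reconstruction loss is unchanged; the weighting only suppresses pixels whose brightness statistics look anomalous.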