| Depth information plays an important role in enabling machines to perceive and understand the three-dimensional world, and it is fundamental to many computer vision applications, such as autonomous driving, robot navigation, and virtual reality. As a means of obtaining depth, stereo matching offers the advantages of low cost and high accuracy. In a calibrated and rectified binocular camera system, the goal of stereo matching is to find, for each pixel in one view, the corresponding pixel in the other view that represents the same object. The horizontal difference between the pixel coordinates of a matched pair is called the disparity, which is inversely proportional to the depth of the object. Stereo matching has been studied for decades. Recently, with the development of deep learning algorithms, frameworks, and hardware, stereo matching models based on deep neural networks have made great progress and have gradually surpassed traditional algorithms. However, in real-world applications, current deep stereo matching algorithms still face several problems. Specifically, under the supervised paradigm, model accuracy is not high enough and generalization ability is not strong enough; under the self-supervised paradigm, the training quality is poor. To tackle these problems, this thesis proposes new solutions that improve model performance. The main content and contributions of this thesis are summarized as follows: (1) A deep stereo matching network based on the local similarity pattern (LSP) and cost self-reassembling (CSR) is proposed. It effectively alleviates two problems in current supervised stereo matching networks: the convolutional features are not discriminative enough, and disparity refinement modules built from stacked convolutional layers tend to produce over-smoothed disparity results. First, this network designs a deep feature, LSP, which reveals the object structural 
information by explicitly calculating the relationships between neighboring pixel pairs. It is a beneficial complement to the appearance information expressed by the convolutional features, resulting in a stronger feature representation. Second, this network proposes a disparity refinement method, CSR, which dynamically searches for reliable, nearby neighbors and leverages their cost distributions to update the cost distribution of each pixel. CSR can efficiently repair errors in the initial disparity result. Extensive experimental results show that LSP and CSR can significantly improve the accuracy of the baseline stereo matching network. (2) A domain-generalized stereo matching method built on a broad-spectrum, task-oriented feature is proposed. It effectively alleviates the problem that deep stereo matching networks trained on a source domain perform poorly on target domains with different image styles. First, this method uses the features of a model trained on large-scale datasets to obtain a robust representation, since this broad-spectrum feature has been exposed to a variety of image styles during training. Next, this method builds a feature adapter to learn further stereo-matching-related information from the broad-spectrum feature. In addition, a cost volume based on cosine similarity is constructed to decouple the feature extraction module from the cost aggregation module in the stereo matching network. Extensive experimental results show that this method can significantly improve the generalization ability of current deep stereo matching networks. (3) A self-supervised stereo matching method that learns disparity refinement with stronger and weaker models is proposed. It effectively alleviates the problem that existing self-supervised stereo matching networks based on image reconstruction loss perform poorly in regions with matching ambiguities, such as occluded and textureless areas. First, this method leverages a disparity refinement module based on single-view semantic information to 
improve the disparity results of self-supervised stereo matching networks. To train the disparity refinement module in an unsupervised manner, this method constructs two models with different capabilities, i.e., a stronger model and a weaker model, which provide the supervision and the input for it, respectively. After training, the refinement module is applied to the stronger model to generate pseudo labels, which can be used to train a more powerful model. Extensive experimental results show that this method can significantly improve the accuracy of current self-supervised stereo matching networks. |
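
Two relations mentioned in the abstract can be illustrated with a minimal sketch: the inverse proportionality between disparity and depth, and a cost volume built from cosine similarity. The code below is not the thesis's implementation. The function names (`depth_from_disparity`, `cosine_cost_volume`, `best_disparity`) are hypothetical, the features are hand-written vectors on a single scanline rather than learned features, and the standard rectified-stereo relation Z = f * B / d is assumed, where f is the focal length in pixels and B is the baseline.

```python
import math


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (d must be positive)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def cosine(u, v, eps=1e-8):
    """Cosine similarity of two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def cosine_cost_volume(left_row, right_row, max_disp):
    """Build volume[d][x] for one scanline.

    volume[d][x] is the cosine similarity between the left feature at
    column x and the right feature at column x - d. Entries whose right
    column would fall outside the image are set to None.
    """
    width = len(left_row)
    volume = []
    for d in range(max_disp):
        volume.append([
            cosine(left_row[x], right_row[x - d]) if x - d >= 0 else None
            for x in range(width)
        ])
    return volume


def best_disparity(volume, x):
    """Pick the disparity with the highest similarity at column x."""
    candidates = [
        (scores[x], d) for d, scores in enumerate(volume)
        if scores[x] is not None
    ]
    return max(candidates)[1]
```

Because cosine similarity is bounded in [-1, 1] regardless of feature magnitude, the scores fed to a later aggregation stage stay on a fixed scale even if the feature extractor changes, which is one plausible reading of the decoupling described in contribution (2).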