The main task of binocular stereo matching is to match corresponding pixels between the left and right images, compute a cost volume, and then obtain the disparity value, from which depth information is finally calculated. As one of the key technologies in stereo vision, binocular stereo matching is widely used in fields such as autonomous driving, unmanned aerial vehicles, and augmented reality. Currently, most deep-learning-based binocular stereo matching algorithms are single-scale methods, which process the cost volume at a single resolution. A single-scale method works with limited information and must be combined with 3D convolution to improve accuracy, but the high computational complexity of 3D convolution slows the model down. Multi-scale methods can balance information from multiple scales, which helps improve the robustness of the model in weakly textured and textureless regions. However, achieving a good balance between accuracy and speed, and improving robustness in special areas such as edges and regions affected by illumination, remain open research directions for multi-scale methods. The research content of this paper is as follows:

(1) In deep-learning binocular stereo matching networks, multi-scale 2D-convolution methods suffer from low robustness of disparity prediction at edges, and their feature extraction performance needs improvement. This paper proposes a stereo matching network that combines deformable convolution with bilateral grids. First, an improved feature pyramid is used to extract image features and reduce the loss of semantic information. Then, an attention mechanism and the Meta-ACON activation function are added to the 2D deformable-convolution cost aggregation structure to improve aggregation efficiency. Finally, bilateral grid upsampling replaces traditional interpolation upsampling to improve the robustness of the predicted disparity at edges.

(2) Compared with the initial cost volume built in 3D-convolution methods, the initial cost volume built in 2D-convolution methods lacks rich information, which lowers the robustness of the disparity map in areas affected by illumination. Therefore, this paper proposes a stereo matching network based on multi-scale cost attention and adaptive fusion. First, a multi-scale adaptive cost attention module is designed to generate cost attention weights, which are combined with the initial cost volume to suppress irrelevant information and enrich the cost volume. The cost volume is then fed into the multi-scale aggregation stage. During cross-scale aggregation, a multi-scale adaptive fusion module is designed to improve the efficiency of cross-scale fusion, thereby improving the robustness of the disparity map in areas affected by illumination.

(3) Compared with AANet on the Scene Flow dataset, the stereo matching network combining deformable convolution and bilateral grids is more robust in predicting edges in disparity maps. On the real-scene KITTI 2012 dataset, the edges of slender objects such as utility poles are well preserved. To retain the multi-scale cost attention as much as possible, the stereo matching network based on multi-scale cost attention and adaptive fusion is trained in three steps on the Scene Flow dataset. Experimental results show that the disparity map predicted by the proposed network is closer to the ground-truth disparity map. A generalization test on the Middlebury dataset shows that the model generalizes better than the baseline network. On the real-scene KITTI 2012 and KITTI 2015 datasets, mismatches in areas affected by lighting are alleviated.
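The basic pipeline named at the start of this abstract (matching, cost volume, disparity, depth) can be illustrated with a minimal sketch on a single scanline. This is not the proposed network: it assumes a hand-crafted absolute-difference matching cost in place of learned features, a winner-take-all disparity choice in place of learned aggregation, and the standard rectified-stereo relation depth = focal length × baseline / disparity. All function names and the numeric values are hypothetical and chosen only for illustration.

```python
def build_cost_volume(left_row, right_row, max_disp):
    """Return cost[d][x] = |left[x] - right[x - d]| for d in [0, max_disp).

    Positions where x - d falls outside the right image get infinite cost.
    (Assumption: absolute intensity difference as the matching cost.)
    """
    width = len(left_row)
    cost = []
    for d in range(max_disp):
        row = []
        for x in range(width):
            if x - d >= 0:
                row.append(abs(left_row[x] - right_row[x - d]))
            else:
                row.append(float("inf"))
        cost.append(row)
    return cost


def winner_take_all(cost):
    """Pick, for every pixel, the disparity with the lowest matching cost."""
    num_disp = len(cost)
    width = len(cost[0])
    return [min(range(num_disp), key=lambda d: cost[d][x]) for x in range(width)]


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert disparity (pixels) to depth (metres) for a rectified pair.

    Zero disparity corresponds to a point at infinity.
    """
    return [focal_px * baseline_m / d if d > 0 else float("inf") for d in disparity]


# Toy scanline: the left row is the right row shifted by 2 pixels.
right_row = [10, 20, 30, 40, 50, 60]
left_row = [0, 0, 10, 20, 30, 40]

cost = build_cost_volume(left_row, right_row, max_disp=3)
disparity = winner_take_all(cost)
# Hypothetical camera: 700 px focal length, 0.5 m baseline.
depth = disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.5)
```

In this toy case the pixels from index 2 onward recover the true shift of 2 pixels; the first two pixels have no valid match and show the kind of ambiguity that learned cost aggregation is meant to resolve.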