| Three-dimensional reconstruction, one of the most active research directions in computer vision, plays an important role in applications such as robotics, autonomous driving, SLAM, virtual reality, and artificial intelligence. Three-dimensional reconstruction can be divided into voxel-based, mesh-based, and point cloud-based reconstruction; this study focuses on point cloud-based reconstruction. Current three-dimensional point cloud reconstruction methods fall into two categories: traditional methods based on geometric and photometric consistency, and deep learning-based methods. Traditional point cloud reconstruction algorithms are mature and offer a relatively simple workflow with controllable costs. However, these methods require manual design of complex feature matrices and mainly target ideal Lambertian surfaces. As a result, they often struggle to achieve satisfactory generality and accuracy on complex real-world objects. Deep learning-based point cloud reconstruction algorithms, by contrast, use neural networks to extract image features and compute feature volumes relative to the world coordinate system using camera parameters and homography transformations. By constructing cost volumes from the feature volumes of the source and reference images and continuously learning and refining these cost volumes, these algorithms optimize the generated depth maps to obtain accurate point cloud models. Compared with traditional methods, deep learning-based approaches show significant improvements in generality, reconstruction accuracy, and the handling of complex scenes. However, existing learning-based algorithms still face challenges such as missing feature map information, missing cost volume information, and noise interference. To address these challenges, this study proposes a cascaded network called Att MCVA-MVSNet, which consists of three modules: feature selection and processing
module, multi-cost volume aggregation module, and depth consistency regularization module. Att MCVA-MVSNet improves upon existing networks in these three modules: (1) To tackle the issue of missing information in input feature maps, the feature selection and processing module uses attention mechanisms to capture semantic information and contextual connections within the feature maps. This enhances the quality of the input feature maps, thereby improving the network’s feature representation capability. (2) To address the information loss caused by variance-based cost volume construction, the multi-cost volume aggregation module uses a grouped vector dot product to calculate the similarity between feature maps from different viewpoints and constructs multiple cost volumes. Neural networks then learn a weight for each cost volume, and the cost volumes are aggregated by weighted summation. Multiple cost volumes preserve more point cloud information, resulting in improved reconstruction accuracy. (3) To enhance information exchange between the stages of the cascaded network, the previous stage’s cost volume is used as guidance information. A difference matrix is constructed between the previous stage’s cost volume and the current stage’s cost volume. Attention mechanisms learn the effective information within this matrix and use it to guide the construction of the current stage’s cost volume, thereby regularizing the cost volumes. Experiments were conducted on the DTU and Tanks and Temples datasets, with noise reduction optimization performed before the network was trained on the DTU dataset. On the DTU dataset, Att MCVA-MVSNet achieved a precision of 0.356, a completeness of 0.330, and an overall score of 0.343, outperforming the other methods in the evaluation. On the Tanks and Temples dataset, it ranked first among all selected methods in terms of the average F-score across scenes, demonstrating excellent
generalization ability. The experimental results show that Att MCVA-MVSNet exhibits superior reconstruction performance and generalization ability compared to other methods. |
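
The grouped dot product similarity and the weighted summation of multiple cost volumes described for the multi-cost volume aggregation module can be illustrated with a minimal sketch. It is not the Att MCVA-MVSNet implementation: it works on a single pixel and depth hypothesis, the function names (`groupwise_similarity`, `softmax`, `aggregate`) are hypothetical, and the softmax over fixed scores is an assumption standing in for the weights that the network learns.

```python
import math


def groupwise_similarity(ref, src, groups):
    """Mean dot product per channel group for one pixel and depth hypothesis."""
    channels = len(ref)
    if channels != len(src) or channels % groups != 0:
        raise ValueError("feature lengths must match and divide evenly into groups")
    size = channels // groups
    sims = []
    for g in range(groups):
        lo, hi = g * size, (g + 1) * size
        dot = sum(r * s for r, s in zip(ref[lo:hi], src[lo:hi]))
        sims.append(dot / size)
    return sims


def softmax(scores):
    """Numerically stable softmax (assumption: stands in for learned weights)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(costs, scores):
    """Weighted sum of several equally sized cost vectors."""
    weights = softmax(scores)
    return [
        sum(w * c[i] for w, c in zip(weights, costs))
        for i in range(len(costs[0]))
    ]


# Toy features: one reference view and two source views, 4 channels, 2 groups.
ref = [1.0, 2.0, 3.0, 4.0]
src_a = [1.0, 0.0, 1.0, 0.0]
src_b = [0.0, 1.0, 0.0, 1.0]

cost_a = groupwise_similarity(ref, src_a, groups=2)
cost_b = groupwise_similarity(ref, src_b, groups=2)

# Equal scores give equal weights, so the result is the plain average.
fused = aggregate([cost_a, cost_b], scores=[0.0, 0.0])
print(cost_a, cost_b, fused)
```

Unlike a variance-based cost, which collapses all channels into one statistic, the grouped similarity keeps one value per channel group, which is the sense in which more information survives into the cost volume.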