Light field images capture not only the position but also the direction of incident light rays. However, unlike conventional single-image setups, real-world light field cameras typically cannot be paired with equipment such as LiDAR to capture ground-truth depth. Reliable depth estimation algorithms are therefore particularly important as a prerequisite for many visual tasks on light field images. In addition, the low spatial resolution of a single sub-view of a light field often limits performance in tasks that require high spatial resolution, such as novel view synthesis. To address these issues, this paper proposes a light field depth estimation algorithm and a novel view synthesis algorithm. The main contributions are as follows:

1) A transformer-based dual-branch deep learning network is proposed. It uses the transformer's self-attention mechanism to learn long-range dependencies in light field images, and its multi-head attention mechanism to learn multiple distinct features for handling regions with complex and repetitive textures. A fusion transformer then merges the feature maps of the two branches. Three modules are proposed: a global feature processing module, a complex texture processing module, and a fusion-and-fine-tuning module. The two texture processing modules exploit the characteristics of the light field's sub-aperture image (SAI) and macro-pixel image (MPI) representations to extract target features, and the fusion-and-fine-tuning module then fuses the attention feature maps of the two branches, performs depth regression, and refines the result (a schematic sketch of this architecture is given after this abstract). The network is trained on the public HCI and Inria synthetic light field datasets, with various data augmentation methods applied to improve generalization. Extensive experiments demonstrate that the proposed method accurately regresses the target depth map.

2) A light field novel view synthesis algorithm based on conditional generative adversarial networks is proposed, which uses camera pose information as a condition to guide the network in learning the content of novel views. Several modules are proposed that fully exploit the camera pose together with the spatial, angular, and depth information encoded in the light field's macro-pixel image to generate the predicted views (see the second sketch below). In addition, a light field image dataset captured from real-world scenes is introduced. Ablation experiments evaluate the proposed modules and verify the effectiveness of each innovation. Extensive experimental results demonstrate that the proposed algorithm performs the target view synthesis task well.
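The following is a minimal PyTorch sketch of the dual-branch idea in contribution 1), not the authors' actual implementation: the module names, tensor shapes, channel counts, and the use of cross-attention for the fusion step are all illustrative assumptions based on the description above.

import torch
import torch.nn as nn

class BranchEncoder(nn.Module):
    """Embeds one light field representation (SAI stack or MPI) into tokens
    and applies self-attention to capture long-range dependencies."""
    def __init__(self, in_ch, dim=128, heads=4, depth=2):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=3, padding=1)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                      # x: (B, C, H, W)
        f = self.embed(x)                      # (B, dim, H, W)
        b, d, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, dim)
        return self.encoder(tokens), (h, w)

class DualBranchDepthNet(nn.Module):
    """Two branches (SAI views, MPI image) fused by cross-attention,
    followed by a convolutional depth-regression head."""
    def __init__(self, sai_ch, mpi_ch, dim=128, heads=4):
        super().__init__()
        self.sai_branch = BranchEncoder(sai_ch, dim, heads)
        self.mpi_branch = BranchEncoder(mpi_ch, dim, heads)
        self.fuse = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Conv2d(dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1))    # single-channel depth map

    def forward(self, sai, mpi):
        q, (h, w) = self.sai_branch(sai)       # SAI tokens as queries
        kv, _ = self.mpi_branch(mpi)           # MPI tokens as keys/values
        fused, _ = self.fuse(q, kv, kv)        # cross-attention fusion
        fmap = fused.transpose(1, 2).reshape(-1, fused.size(-1), h, w)
        return self.head(fmap)                 # (B, 1, H, W) depth

Cross-attention is one plausible reading of "fusion transformer"; the paper may instead concatenate tokens or use a dedicated fusion block.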
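For contribution 2), the sketch below illustrates the general shape of a pose-conditioned cGAN in PyTorch. The way the pose vector is injected (broadcast to a feature map and concatenated as extra input channels), the 6-dimensional pose encoding, and the patch-style discriminator are assumptions for illustration only.

import torch
import torch.nn as nn

class PoseConditionedGenerator(nn.Module):
    """Generates a novel view from the MPI representation, conditioned
    on a target camera pose vector."""
    def __init__(self, mpi_ch=3, pose_dim=6, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(mpi_ch + pose_dim, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, 3, 3, padding=1), nn.Tanh())   # RGB view

    def forward(self, mpi, pose):              # mpi: (B,C,H,W), pose: (B,P)
        b, _, h, w = mpi.shape
        pose_map = pose.view(b, -1, 1, 1).expand(b, pose.size(1), h, w)
        return self.net(torch.cat([mpi, pose_map], dim=1))

class ConditionalDiscriminator(nn.Module):
    """Judges whether a view is real for the given pose (cGAN setup)."""
    def __init__(self, pose_dim=6, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + pose_dim, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base, 1, 4, stride=2, padding=1))    # patch logits

    def forward(self, view, pose):
        b, _, h, w = view.shape
        pose_map = pose.view(b, -1, 1, 1).expand(b, pose.size(1), h, w)
        return self.net(torch.cat([view, pose_map], dim=1))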