Novel view synthesis is an important problem in computer vision and graphics: it aims to extract useful information from one or more views and transform it into a new view. Humans can easily imagine scenes or objects from different perspectives, but this remains a challenging task for computer vision systems. On the one hand, such a system needs a comprehensive understanding of the image, i.e., of its 3D structure and semantic content. On the other hand, the mapping between views depends on the 3D geometry of the scene and the relative poses of the views. In both respects, information propagation between views causes a loss of detail, and the holes caused by a restricted field of view or by occlusion urgently need to be addressed. Improving the performance of novel view synthesis models is therefore the main research goal of this thesis.

(1) To address the information loss in conventional image encoding and decoding, which blurs the edges of synthesized views, this thesis designs an autoencoder-based novel view synthesis network for monocular images (AE-NVS). First, the model improves the image encoder with a feature extraction module that combines dilated convolution and an attention mechanism. Multiple feature extraction blocks are used jointly to enhance features in a hierarchical, progressive manner, capturing global contextual information while enlarging the receptive field over the input image. The attention mechanism filters and refines the intermediate features extracted by the encoder to retain the most valuable feature information, which helps the network capture precise and detailed structural information and thereby improves encoder performance. Second, the model introduces a spatial attention upsampling module (SAU) into the decoder. By focusing on the important information in the feature maps passed from the encoder, the SAU improves pixel-to-pixel correspondence, better captures distant image content, and effectively handles changes in pixel position between the source and target views. Finally, the model is trained and tested on the ShapeNet and KITTI datasets and compared with baseline methods in the field. The comparison experiments verify that the model performs better, and ablation experiments validate the effectiveness of the feature extraction module.

(2) To address the problem that effective feature information in the source view cannot be fully exploited when the image exhibits self-occlusion or its depth information is mapped incorrectly, leaving content missing from the new view, this thesis designs a two-stage novel view synthesis optimization network based on contextual information (TS-NVS). The first-stage coarse synthesis network makes a preliminary prediction of the image, which stabilizes training and provides prior information about the image structure to the second stage. The second-stage refined synthesis network is a view optimization network based on UNet. First, exploiting the semantic connections between different parts of an image, a semantic information extraction module computes the similarity between neighboring feature blocks and the relative importance of different semantic features, extracting semantic information such as local structure, texture, and color from the feature maps. Second, based on the relationships between different parts of the image, a contextual information fusion module uses residual connections and dilated convolution to extract contextual information and learn the image as a whole. The network is then further optimized with a dual loss combining a multiscale reconstruction loss and a semantic constraint loss. Finally, the model is trained and tested on the ShapeNet and KITTI datasets, and its performance is verified by comparison with baseline approaches in the field.

(3) Because the new views synthesized by the AE-NVS and TS-NVS models can appear blurred under low light, a low-light image enhancement algorithm is proposed. It exposes the RGB grayscale image and fuses the processed exposure values, used as weights, with the source image to obtain a clearer, higher-quality result that enhances image details and textures and is better suited to human observation.

(4) Based on the models and algorithm above, a novel view synthesis visualization system is designed to verify the effectiveness of the proposed models. The system adopts a B/S (browser/server) architecture to provide a demonstration application of novel view synthesis. First, the system synthesizes new views with the proposed models and visualizes the synthesis process. Second, it enhances the views with the proposed low-light image enhancement algorithm and visualizes the enhancement process.
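The exposure-and-fuse idea behind the low-light enhancement step in (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual algorithm: the function name `enhance_low_light`, the gamma-based exposure step, and the illumination-derived per-pixel weights are all hypothetical choices made here for demonstration.

```python
import numpy as np

def enhance_low_light(src, gamma=0.5):
    """Hypothetical sketch of weight-based exposure fusion:
    brighten a low-light RGB image, then blend the brightened
    version with the source using illumination-based weights.

    src   -- uint8 RGB image of shape (H, W, 3)
    gamma -- exponent < 1 brightens dark regions
    """
    img = src.astype(np.float64) / 255.0       # normalize to [0, 1]
    gray = img.mean(axis=2)                    # rough per-pixel illumination
    exposed = np.power(img, gamma)             # exposure adjustment
    # Darker pixels receive more of the brightened image;
    # well-lit pixels keep more of the source.
    w = (1.0 - gray)[..., None]
    fused = w * exposed + (1.0 - w) * img
    return np.clip(fused * 255.0, 0.0, 255.0).astype(np.uint8)
```

Because the weight map is derived from the grayscale illumination, already bright regions stay close to the source while dark regions are lifted, which loosely mirrors the fusion-by-exposure-weights behavior described above.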