Uncooled long-wave infrared (LWIR) thermal imagers are independent of ambient light, can penetrate smoke and fog, and can image under a variety of weather and lighting conditions; they are therefore widely used in night driving, security monitoring, field rescue, and many other fields. True colorization of infrared (IR) images enhances visual perception and scene understanding, and also provides strong support for subsequent IR image processing tasks. However, large-scale datasets suited to the true-colorization task are lacking, and existing methods cannot meaningfully bridge the huge domain gap between infrared and visible images, making high-quality true-color conversion of IR images difficult to achieve. To address these problems, this thesis constructs a large-scale unpaired infrared-visible multi-scene dataset, I2VM, and proposes MDCS-GAN, an infrared image true-colorization algorithm based on a multi-scale discriminator and a cross-attention correlation loss. Experiments show that the algorithm significantly improves the quality of the generated images in infrared colorization tasks. The main research of this thesis covers the following three areas:

(1) A multi-scale discriminator, AMSD-Net, based on a pre-trained ViT is proposed. Multi-scale preprocessing of the generated and source images enhances the discriminator's perception of local information, helps it identify and distinguish visual differences between regions, and further improves the visual quality of the generated images. The feature-encoding stage uses a ViT-based encoder whose self-attention mechanism captures the interrelationships between different locations in the image, effectively improving the discriminator's representational capability. The results of the comparison and ablation experiments
show that this discriminator effectively improves image detail and color reproduction, enhances the stylistic rendering of specific image regions, and brings significant gains in overall image quality.

(2) A cross-attention correlation loss based on a random mask operation is proposed, which enhances the structural correspondence between inputs and outputs by fully exploiting the structural knowledge in nighttime infrared data. Randomly masking input image patches during preprocessing forces the model to focus on specific regions and removes the influence of certain regions on the image's global information. A ViT-based encoder captures the global relationships between different regions of the source and generated images, helping the network learn a deeper representation of content relevance. Minimizing the cross-attention correlation loss between the generated and source images further constrains their structural consistency and eliminates the negative impact of image-specific domain styles. The comparison and ablation experiments show that introducing this loss preserves the spatial structure shared by the source and generated images and effectively improves the quality of the generated images.

(3) A large-scale infrared-visible multi-scene dataset, I2VM, is constructed for studying the conversion from infrared images to daytime visible images. The experimental environment is built on a GPU platform; the network architecture, the loss functions, and the full training and testing pipeline are implemented in the PyTorch framework. Comparison experiments are carried out on the I2VM dataset, and a combination of subjective evaluation and objective metrics is used to assess the results of this algorithm against four other algorithms, verifying the superiority of the proposed
algorithm. Meanwhile, the two key designs in the network are analyzed in ablation experiments to verify their effectiveness.
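The multi-scale, attention-based discrimination described in (1) can be illustrated with a minimal NumPy sketch. The 2x2 average pooling, the patch size, and the single-head self-attention encoder here are illustrative assumptions, not AMSD-Net's actual architecture:

```python
import numpy as np

def avg_pool2(img):
    """Halve spatial resolution by 2x2 average pooling (H and W must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over patch tokens."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

def discriminator_score(img, patch=4):
    """Patchify, encode with self-attention, and pool to one realism score."""
    h, w = img.shape
    tokens = np.stack([img[i:i + patch, j:j + patch].ravel()
                       for i in range(0, h, patch)
                       for j in range(0, w, patch)])
    return float(self_attention(tokens).mean())

def multiscale_score(img, n_scales=3):
    """Average the attention-based score over several resolutions, so the
    discriminator judges both local detail and coarser structure."""
    scores = []
    for _ in range(n_scales):
        scores.append(discriminator_score(img))
        img = avg_pool2(img)
    return float(np.mean(scores))
```

In a GAN setting such a score would feed an adversarial loss; the point of the sketch is only how multi-scale preprocessing and a self-attention encoder combine into one discriminator output.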
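The random-mask cross-attention correlation loss in (2) can likewise be sketched. Comparing a source-to-source attention map against a generated-to-source map under a shared random patch mask is one plausible reading of the abstract; the function names, mask ratio, and L1 comparison are hypothetical choices, not the thesis's exact formulation:

```python
import numpy as np

def patchify(img, patch=4):
    """Split an image into flattened non-overlapping patch tokens."""
    h, w = img.shape
    return np.stack([img[i:i + patch, j:j + patch].ravel()
                     for i in range(0, h, patch)
                     for j in range(0, w, patch)])

def attention_map(queries, keys):
    """Row-softmax scaled dot-product cross-attention map."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def cross_attention_correlation_loss(src, gen, mask_ratio=0.5, seed=0):
    """L1 distance between the source->source and generated->source
    attention maps, computed on identically masked patch tokens."""
    rng = np.random.default_rng(seed)
    src_tok, gen_tok = patchify(src), patchify(gen)
    keep = rng.random(len(src_tok)) >= mask_ratio   # shared random mask
    src_tok = src_tok * keep[:, None]               # zero out masked patches
    gen_tok = gen_tok * keep[:, None]
    return float(np.abs(attention_map(src_tok, src_tok)
                        - attention_map(gen_tok, src_tok)).mean())
```

Because both maps are normalized attention distributions rather than raw pixel values, the comparison constrains where structure lies while remaining insensitive to domain-specific style; a perfectly structure-preserving output drives the loss to zero.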