| The fusion of infrared and visible images is an important topic in multi-modal image fusion. It aims to integrate the most meaningful information captured by infrared and visible sensors, respectively, into a single fused image with richer information and higher quality. Infrared sensors image the thermal radiation of the target; they adapt to harsh conditions and low-illumination environments and highlight heat-source targets prominently, but their image resolution is generally low. Visible-light sensors image the light reflected by objects and render scene texture details clearly, but they easily lose targets under occlusion and bad weather. Combining the two modalities to highlight both the important targets and the detailed information in a scene is therefore of great research significance for improving users’ visual perception and supporting high-level semantic tasks, and it can be widely applied in practical scenarios such as forest fire prevention, military counter-terrorism, and epidemic prevention and control. At present, both traditional fusion algorithms and deep-learning-based fusion algorithms have achieved good results in this field, but three problems remain. First, the images fused by existing algorithms are generally biased towards one of the source images. Second, existing GAN-based fusion algorithms find it hard to shed their dependence on content loss, yet ignore the important secondary information contained in the source images. Finally, some algorithms introduce considerable noise during fusion, which leaves the fused image seriously distorted and inconsistent with human visual perception. To address these problems, this thesis carries out the following research: (1) To solve the problems that the fused image is biased and that important secondary information is ignored, this thesis designs a dual-discriminator GAN
network (RDABGan) under information balance to fuse infrared and visible images. Built on the GAN model, the algorithm uses two discriminators to fit the data distributions of the infrared and visible images, respectively, and keeps the infrared intensity information and the visible-light gradient information as balanced as possible. A multi-scale residual dense attention block is designed as the basic building module of the network; it enhances feature flow and information reuse and improves feature extraction. Furthermore, a main-and-auxiliary content loss function is constructed so that the gradient auxiliary information in the infrared image and the intensity auxiliary information in the visible image are fully used to strengthen the complementarity of the features. Compared with seven algorithms on the TNO and RoadScene public datasets, the proposed algorithm achieves better performance and generates more balanced fused images with richer contrast and texture details. (2) To address the problem that fusion algorithms introduce noise and generalize poorly, this thesis designs an infrared and visible image fusion algorithm based on cross attention and Transformer, called SwinTFuse. The algorithm combines a CNN with a Transformer and designs a local-global feature extraction module for deep feature extraction. Local features of the source images are extracted by dense blocks, and global context information is extracted by a vision Transformer composed of shifted-window multi-head attention and squeeze-and-excitation attention modules. In addition, an attention-guided cross-domain feature fusion module is introduced to mine and integrate the important information within each single-modality domain and the complementary information across modality domains, which not only avoids information loss and noise introduction during fusion but also maintains an appropriate representation strength
of the fused image from a global perspective. Experimental results show that the fused images generated by SwinTFuse closely match human visual perception, with rich details and balanced information extraction. The model also generalizes well and achieves excellent performance in both subjective visual quality and objective evaluation metrics in comparisons on public datasets. |
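
The main-and-auxiliary content loss described for RDABGan in (1) can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything below is an assumption made for illustration: the mean-squared-error terms, the forward-difference gradient operator, the function names, and the `aux_weight` parameter are all hypothetical. The main terms pull the fused image towards infrared intensity and visible gradients; the auxiliary terms add the secondary cues (infrared gradients and visible intensity) with a smaller weight. The adversarial terms from the two discriminators are omitted.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def gradient_magnitude(img: Image) -> Image:
    """Forward-difference gradient magnitude |dx| + |dy| (zero at the far border)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
            dy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
            out[y][x] = abs(dx) + abs(dy)
    return out


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of the same size."""
    h, w = len(a), len(a[0])
    return sum((a[y][x] - b[y][x]) ** 2 for y in range(h) for x in range(w)) / (h * w)


def content_loss(fused: Image, ir: Image, vis: Image, aux_weight: float = 0.2) -> float:
    """Hypothetical main + auxiliary content loss (not the thesis's exact formula)."""
    g_fused = gradient_magnitude(fused)
    g_ir = gradient_magnitude(ir)
    g_vis = gradient_magnitude(vis)
    # Main terms: infrared intensity and visible-light gradient.
    main = mse(fused, ir) + mse(g_fused, g_vis)
    # Auxiliary terms: infrared gradient and visible-light intensity.
    aux = mse(g_fused, g_ir) + mse(fused, vis)
    return main + aux_weight * aux
```

Setting `aux_weight` to zero reduces this sketch to a loss built only from the main terms, which is one way to see what the auxiliary terms contribute.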