Image fusion refers to extracting the most meaningful information from different source images and combining it into a single image that carries richer information and better serves subsequent applications. Because it breaks through the hardware limitations of a single sensor, it has been widely used in military and civilian applications. In past decades, image fusion was mainly performed with multi-scale transforms, sparse representation, and related methods, and satisfactory results were achieved. However, because they rely on handcrafted rules, these fusion methods are limited to specific tasks and generalize poorly. In contrast, deep learning-based image fusion methods can largely avoid the influence of handcrafted rules by optimizing themselves through backpropagation. Compared with traditional methods, they offer strong adaptability, fault tolerance, and noise robustness, and have become a research hotspot.

Common network architectures for deep learning-based image fusion include the Convolutional Neural Network (CNN) and the Generative Adversarial Network (GAN). GAN-based methods improve the quality of the fused image produced by the generator through an adversarial game between the generator and the discriminator; this adversarial mechanism makes the fused image more consistent with the feature distribution of the source images, but the instability of adversarial training makes convergence hard to guarantee. In contrast, CNN-based methods usually rely on a carefully designed loss function and network architecture to extract and fuse features, so the fused image effectively retains feature information from the source images while training remains stable. Given the excellent fusion performance and generalization of deep learning-based methods, this thesis focuses on deep learning-based multi-focus image fusion and multi-modal image fusion. Its contributions can be summarized as follows:

1) For multi-focus image fusion, a method based on a Nest network and dilated convolution is proposed. Existing mainstream deep learning-based fusion methods suffer from two problems: their network designs do not sufficiently consider the correlation between source images, leading to incomplete information extraction, and their loss functions lack strong constraints. To address these issues, a fusion network and a strongly constrained loss function are designed around the characteristics of focused and defocused regions. In this network, a carefully designed encoder extracts detailed focus features from the source images, while the decoder aggregates these features efficiently and outputs accurate focus probabilities used to generate the final fused image. A purpose-built hybrid loss constrains training through fidelity, structural, and pixel terms, so that the network distinguishes focused and defocused regions more accurately and generates high-quality fused images.
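The exact form of the fidelity, structural, and pixel terms is not specified in this summary, so the following PyTorch sketch is only one plausible reading: fidelity as an MSE against a focus-weighted mix of the sources, structure as gradient-map consistency, and the pixel term as an L1 distance. The function names, weights, and the choice of reference are all assumptions, not the thesis's published formulation.

```python
import torch
import torch.nn.functional as F

def gradient_magnitude(img: torch.Tensor) -> torch.Tensor:
    """Approximate per-channel image gradients with fixed Sobel kernels."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device, dtype=img.dtype).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = img.shape[1]
    gx = F.conv2d(img, kx.repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(img, ky.repeat(c, 1, 1, 1), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

def hybrid_loss(fused, src_a, src_b, focus_prob,
                w_fid=1.0, w_struct=1.0, w_pix=1.0):
    """fused, src_a, src_b: (B, C, H, W); focus_prob: (B, 1, H, W) in [0, 1].
    All three terms are stand-ins for the loss components named in the text."""
    # Pseudo-reference assembled from the decoder's focus probabilities.
    reference = focus_prob * src_a + (1.0 - focus_prob) * src_b
    fidelity = F.mse_loss(fused, reference)        # fidelity term (assumed MSE)
    structural = F.l1_loss(                        # structural term (assumed
        gradient_magnitude(fused),                 # gradient consistency)
        torch.max(gradient_magnitude(src_a), gradient_magnitude(src_b)))
    pixel = F.l1_loss(fused, reference)            # pixel term (assumed L1)
    return w_fid * fidelity + w_struct * structural + w_pix * pixel
```

Driving the structural term toward the element-wise maximum of the source gradient maps is one common way to encourage the fused image to keep the sharper of the two regions at every pixel; the actual constraint used in the thesis may differ.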
2) For multi-modal image fusion, a method based on saliency detail constraints is proposed. Because multi-modal image fusion lacks ground truth, unsupervised training usually produces information redundancy, which biases the fused images in luminance and detail. Therefore, saliency constraints are introduced to force the network to learn the salient features of the source images during training. Meanwhile, to reduce information loss during feature transmission through the network, a progressive fusion network is designed, in which a self-reinforcing attention module constructed from the source images effectively refines features and reduces information redundancy; a hedged sketch of one possible form of this module follows at the end of this summary. Under the joint constraints of these designs, the generated fused image more effectively reflects the salient information of the source images.

The proposed methods are tested on open-source datasets in their corresponding domains, and their superiority is demonstrated by comparison with current state-of-the-art methods.
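The self-reinforcing attention module is described only at a high level above, so the sketch below assumes one plausible form: a contrast-based saliency map computed from the (single-channel) source images gates the intermediate fused features. The class name, the saliency definition, and the gating convolution are all hypothetical, not the architecture published in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SourceGuidedAttention(nn.Module):
    """Hypothetical attention block: a saliency map derived from the source
    images reweights intermediate features to suppress redundant activations."""

    def __init__(self, channels: int):
        super().__init__()
        # Two input channels: one saliency map per source image.
        self.gate = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    @staticmethod
    def saliency(src: torch.Tensor) -> torch.Tensor:
        # Simple contrast saliency: absolute deviation from the local mean.
        local_mean = F.avg_pool2d(src, kernel_size=7, stride=1, padding=3)
        return torch.abs(src - local_mean)

    def forward(self, feats, src_a, src_b):
        # feats: (B, C, h, w); src_a, src_b: (B, 1, H, W) grayscale sources.
        sal = torch.cat([self.saliency(src_a), self.saliency(src_b)], dim=1)
        sal = F.interpolate(sal, size=feats.shape[-2:],
                            mode="bilinear", align_corners=False)
        return feats * self.gate(sal)  # attenuate low-saliency activations
```

In a progressive fusion network of the kind described above, such a block could be applied after each fusion stage, e.g. `refined = SourceGuidedAttention(64)(feats, ir, vis)`, so that the source-derived saliency repeatedly re-anchors the features as they propagate.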