Multi-source image fusion exploits the complementary information in multiple source images to produce a fused image with more comprehensive content and better visual quality. An important application of multi-source image fusion is the fusion of infrared and visible images. To address the problems of sparse detail, poor clarity, and low contrast in fused images produced by recent deep learning methods, this thesis applies the ideas of residual connections, dense connections, attention mechanisms, and multiscale features to deep learning methods and proposes three deep-learning-based infrared and visible image fusion methods.

(1) To address the insufficient and unclear details in fused images produced by existing methods, a two-stage autoencoder model for infrared and visible image fusion based on a residual network and an attention mechanism is proposed. Using a residual network in the encoder increases the depth of the model and enables the encoder to extract more source-image features. Adding a CBAM module at the tail of the encoder further increases the discriminability of the features. In the fusion layer, a generalized pooling fusion strategy is adopted that combines the advantages of average pooling and max pooling to adaptively allocate weights in the channel domain of the encoder's output feature maps, as sketched below. Experimental results show that this method improves the richness and clarity of detail in the fused image.
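To make the channel-domain strategy concrete, the following is a minimal PyTorch sketch of a generalized pooling fusion layer. The blending parameter alpha and the softmax normalization over the two sources are illustrative assumptions; the thesis's exact formulation may differ.

```python
import torch

def generalized_pool(feat: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """Per-channel activity score blending global average and max pooling.

    feat:  (B, C, H, W) encoder feature map
    alpha: (C,) mixing weights in [0, 1] (assumed learnable here)
    returns: (B, C) channel activity scores
    """
    avg = feat.mean(dim=(2, 3))          # global average pooling
    mx = feat.amax(dim=(2, 3))           # global max pooling
    a = alpha.clamp(0, 1).unsqueeze(0)   # broadcast over the batch
    return a * mx + (1 - a) * avg

def fuse_channelwise(feat_ir: torch.Tensor, feat_vis: torch.Tensor,
                     alpha: torch.Tensor) -> torch.Tensor:
    """Channel-domain fusion: softmax over the two sources' activities."""
    scores = torch.stack([generalized_pool(feat_ir, alpha),
                          generalized_pool(feat_vis, alpha)])  # (2, B, C)
    w = torch.softmax(scores, dim=0)
    w_ir = w[0].unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
    w_vis = w[1].unsqueeze(-1).unsqueeze(-1)
    return w_ir * feat_ir + w_vis * feat_vis
```

With alpha fixed at 1 this reduces to max-pooling fusion, and at 0 to average-pooling fusion; intermediate values trade off the two, which is the advantage the generalized pooling strategy claims.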
(2) To address the insufficient salient-target features and low contrast in fused images produced by the method above, and to better preserve large salient targets and continuous textures over a wider range of the source images, this thesis builds on that method and proposes a two-stage autoencoder model for infrared and visible image fusion based on ConvNeXt. In the encoder, ConvNeXt blocks replace plain convolutional blocks to capture relationships between features over a larger receptive field, further enriching the details of the fused image (a minimal ConvNeXt block is sketched after method (3) below). In the fusion layer, an L1-norm fusion strategy computes weights in the spatial domain of the encoder's output feature maps, improving the contrast of the fused image and highlighting salient targets. Experimental results show that this method preserves large salient targets and continuous textures and improves the contrast of the fused image.

(3) To address the differing feature-extraction capabilities of the two models above at different scales, to further highlight the salient features of the fused image, and to reduce the interference of high gray values, a two-stage autoencoder model for infrared and visible image fusion based on an improved U-shaped network is proposed on the basis of the two models above. A three-layer U-shaped network is constructed to form a multiscale autoencoder. In line with the characteristics of infrared and visible image fusion, the first layer of the U-shaped network uses a densely connected network with CBAM to extract the low-level texture details of the source images, while the second and third layers use ConvNeXt to extract their high-level semantic features. In the fusion layer, a grayscale control factor is introduced on top of the L1-norm fusion strategy to raise the gray level of salient targets and reduce the interference of high gray values (see the fusion sketch below). Experimental results show that this method preserves source-image textures at different scales, highlights the salient features of the fused image, and reduces the interference of high gray values.
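For reference, the following is the standard ConvNeXt block (Liu et al., 2022) that method (2) uses as its encoder building block: a 7x7 depthwise convolution, LayerNorm, and a pointwise expansion MLP with a residual connection. Layer scale and stochastic depth are omitted, and the thesis's variant may differ in detail.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Standard ConvNeXt block: depthwise 7x7 conv -> LayerNorm ->
    pointwise MLP (4x expansion, GELU) -> residual connection."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)            # normalizes the channel dim
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # pointwise convs as Linear
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # channels-last for norm/MLP
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                # back to channels-first
        return residual + x
```

The large depthwise kernel is what gives the encoder the wider receptive field that method (2) relies on for relating features over a larger range.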
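The spatial-domain fusion of methods (2) and (3) can be sketched as follows. The per-pixel L1-norm activity and weight normalization follow the commonly used L1-norm strategy; `gamma` is a hypothetical stand-in for the grayscale control factor of method (3), not the thesis's exact formulation.

```python
import torch

def l1_activity(feat: torch.Tensor) -> torch.Tensor:
    """Spatial activity map: L1 norm over channels, shape (B, 1, H, W)."""
    return feat.abs().sum(dim=1, keepdim=True)

def fuse_spatial(feat_ir: torch.Tensor, feat_vis: torch.Tensor,
                 gamma: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
    """Spatial-domain fusion with per-pixel L1-norm weights.

    gamma stands in for the grayscale control factor: values below 1
    damp overly bright infrared regions before the weights are
    normalized (a hypothetical formulation).
    """
    a_ir = gamma * l1_activity(feat_ir)
    a_vis = l1_activity(feat_vis)
    w_ir = a_ir / (a_ir + a_vis + eps)   # weights sum to 1 at each pixel
    return w_ir * feat_ir + (1.0 - w_ir) * feat_vis
```

With gamma = 1 this reduces to the plain L1-norm strategy of method (2); method (3) additionally tunes the factor to raise salient-target gray levels while suppressing interference from uniformly high gray values.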