| With advances in artificial intelligence, digital photography has evolved tremendously. The goal of the Multi-Focus Image Fusion (MFIF) task is to generate a fully focused fused image from a set of partially focused images captured by an optical lens. Improved MFIF algorithms extend the depth-of-field limit of optical lenses and increase the maximum distance between objects that can be captured sharply in a single image, so the study of this task matters not only for MFIF itself but also for other fields such as optical microscopy, integral imaging, and image segmentation. Human observers perceive all-in-focus images better than defocused, blurred ones. Because of limited lens depth of field or lens shake, multiple targets in a captured image cannot all be in focus at the same time, and the out-of-focus regions carry little information. If this missing information is not recovered, the final fused image has a poor visual effect, which hinders subsequent image processing tasks. This thesis improves existing multi-focus image fusion algorithms; the specific contributions are as follows: (1) Existing research on MFIF algorithms is summarized, and the applicable settings and shortcomings of existing algorithms are analyzed. The thesis examines what can be learned from existing multi-focus image fusion algorithms, identifies the key ideas that can advance the MFIF field, and briefly describes how previous work influenced this work. (2) To strengthen the reliability of the research, two different deep-learning network architectures are proposed. The first, a multi-scale twin-branch network based on the attention mechanism, alternates pyramid attention modules with multi-scale convolution modules, applies different channel attention, and weights the different channels to obtain rich multi-scale semantic information, which secures the overall visual quality of the result. The second, an improved pseudo-twin network based on the attention mechanism, extracts feature maps in five alternating stages in each branch, preserving as much multi-focus information as possible. Its feature fusion stage uses consecutive residual blocks to fuse the feature maps obtained in the first stage, and ECA channel attention with shared weights is applied throughout to keep the number of channels unchanged while features are extracted from the source images. The practicality and constructive value of the proposed methods are demonstrated from multiple perspectives, laying a foundation for subsequent work. (3) The Transformer mechanism, which has proven effective in downstream tasks, is introduced to attend to long-range features in multi-focus images, balance the local features emphasized by the attention mechanism, and further enhance fine-grained spatial information, enriching the overall information of the fused images. To alleviate the tendency of attention-based MFIF methods to focus excessively on part of the detail in the source images, a method combining Transformer and CNN is proposed: Transformer blocks and ConvNeXt blocks are fused to form the encoder of the proposed framework, which captures the global and local structural information of the features using the long-range attention and hybrid weighting mechanisms of these two components, respectively, achieving effective hierarchical fusion of local and long-range information. To evaluate the proposed method, extensive ablation and comparison experiments are conducted, with various metrics used to measure its accuracy objectively, and its generalization is measured on several test sets. Overall, the method combining Transformer and CNN performs excellently. |
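
The abstract refers to ECA channel attention with shared weights that keeps the channel count unchanged. The sketch below illustrates the general ECA idea only (global average pooling per channel, a small 1D convolution across neighbouring channels, then sigmoid gating); it is not the thesis implementation. The function names `eca_weights` and `apply_eca`, the nested-list tensor layout, and the kernel values are assumptions made for illustration.

```python
import math


def eca_weights(feature_maps, kernel):
    """Return one attention weight per channel.

    feature_maps: list of C channels, each a list of rows (H x W floats).
    kernel: odd-length list of 1D convolution weights (hypothetical values).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    # Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Zero-pad so the 1D convolution keeps the channel count unchanged.
    half = len(kernel) // 2
    padded = [0.0] * half + pooled + [0.0] * half
    weights = []
    for i in range(len(pooled)):
        # Mix each channel descriptor with its neighbours.
        mixed = sum(w * padded[i + j] for j, w in enumerate(kernel))
        # Sigmoid maps the mixed value to a gate in (0, 1).
        weights.append(1.0 / (1.0 + math.exp(-mixed)))
    return weights


def apply_eca(feature_maps, kernel):
    """Rescale every channel by its attention weight."""
    weights = eca_weights(feature_maps, kernel)
    return [
        [[value * w for value in row] for row in channel]
        for channel, w in zip(feature_maps, weights)
    ]
```

In a two-branch setting, "shared weights" would correspond to passing the same `kernel` to `apply_eca` for the feature maps of both source images, so both branches are gated by identical parameters.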