Acquiring images of a target object in multiple modalities generally requires different technical means or equipment. Some modal images are common and easy to obtain, while others are difficult to acquire because of equipment and cost constraints. Converting images between modalities, i.e., cross-modal image generation, is therefore of great practical importance and has attracted considerable attention from researchers. Most cross-modal image generation algorithms rely on paired training data; many of them achieve good conversion results after sufficient training, but in some cases paired data are insufficient for effective training. The Cycle-Consistent Generative Adversarial Network (CycleGAN) currently performs well in cross-modal image generation from unpaired training data, but it still suffers from blurred image details and distorted edge structures. This thesis investigates and improves its network structure to address these problems by strengthening its handling of structural information and details.

A generative network model with an additional gradient branch is proposed to address the defocused region edges and distorted details caused by the lack of modality-specific information in cross-modal image generation. Modal gradient images supply additional structural prior information, helping the generative network focus on preserving structural configurations and improving the cross-modal generation quality of the image. Two Markovian discriminators (PatchGAN) are used for adversarial generation, and two additional gradient discriminators exploit PatchGAN's ability to attend to small patches of the gradient images: they distinguish whether a gradient patch comes from the gradient map of a real input image, and thereby supervise the cross-modal generation results through adversarial learning. Moreover, a pixel-level gradient loss is designed to penalize differences between gradient images and improve the perceptual quality of the model, so that the image space and the gradient space jointly supervise the model and help it capture structural details.

Multi-modal cross-modal generation experiments are conducted on the student face dataset from the public CUHK dataset and the horse2zebra dataset, as well as on collected visible-light and infrared images and color person optical images and person grayscale images. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used as evaluation metrics. On this basis, a cross-modal image generation software tool has been implemented that can train a model on user input and perform cross-modal generation for a given modality. Compared with CycleGAN, the improved model in this thesis produces visually more realistic cross-modal generation results: image details are clearer, and geometric shape changes are closer to the expected target. The improved model also achieves higher PSNR and SSIM scores than the CycleGAN model.
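The pixel-level gradient loss described above can be illustrated with a minimal sketch: take the finite-difference gradient map of each image and penalize the L1 difference between the two maps. This is only an assumed, simplified form (function names, the finite-difference gradient operator, and the L1 penalty are illustrative choices, not the thesis's actual implementation):

```python
import numpy as np

def gradient_map(img):
    # Finite-difference gradient magnitude of a 2-D grayscale image.
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = np.diff(img, axis=1)  # horizontal differences
    gy[:-1, :] = np.diff(img, axis=0)  # vertical differences
    return np.abs(gx) + np.abs(gy)

def gradient_loss(generated, target):
    # Pixel-level L1 penalty between the gradient maps of two images,
    # encouraging the generator to preserve edge structure.
    return np.mean(np.abs(gradient_map(generated) - gradient_map(target)))
```

Identical images yield a loss of zero, while structural (edge) differences between the generated and target images increase the penalty, which is what lets the gradient space supervise the model alongside the image space.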
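For reference, the two evaluation metrics used above can be sketched as follows. PSNR follows its standard definition; for SSIM, a single-window (global) variant is shown for brevity, whereas the standard metric averages the same statistic over local sliding windows. Function names and the `max_val` parameter are illustrative assumptions:

```python
import numpy as np

def psnr(a, b, max_val=255.0):
    # Peak signal-to-noise ratio in dB between two images.
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def global_ssim(a, b, max_val=255.0):
    # Single-window SSIM over the whole image; the standard metric
    # averages this statistic over local sliding windows.
    a, b = a.astype(float), b.astype(float)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )
```

Higher values of both metrics indicate that the generated image is closer to the reference: PSNR measures pixel-wise fidelity, while SSIM compares luminance, contrast, and structure.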