| With the popularity and application of imaging and display devices,such as smartphones,tablet computers,and digital cameras,digital images are rapidly increasing by billions per hour.The digital image has become an indispensable information carrier in people’s daily life.High-definition images not only can satisfy intuitive visual impressions,but also provide abundant information to acquire high accuracy in cognitive tasks,such as object detection,recognition,segmentation,and diagnosis.However,real-world images are not always clean and high-resolution(HR).On the one hand,due to limitations of imaging devices and systems,unstable environments,and other reasons,the captured images may appear to be noisy,blurry,and low-resolution(LR).On the other hand,subsequent storage and transmission can also result in the decline of visual quality.Hence,real-world image restoration,as one classical but still active low-level vision research topic in the field of digital image processing,not only effectively increases imaging performance and improves visual quality,but also has great significance for improving the accuracy of cognitive tasks.Generally,existing real-world image restoration methods directly learn the mapping from real-world/realistic degraded(even LR)images to the corresponding clear/highresolution images via deep convolution neural networks(CNNs).However,real-world low-quality image is degraded in extremely sophisticated and diverse manners.Hence,it should improve the restoration performance of CNN to handle real-world low-quality images.In general,larger receptive field is helpful to restoration performance by taking more spatial context into account,but most CNNs are trapped by the limited receptive field.Moreover,due to the distribution inconsistency between the limited training data and the unknown real-world low-quality testing images,there remains a fundamental challenge for existing CNN restoration methods to restore real-world low-quality images.To address these issues,this dissertation successively increases the local receptive field safely and introduces the global frequency information to improve the performance of real-world image restoration with the linear complexity of input image resolution.Furthermore,based on efficient networks,we deeply study real-world image degradation modeling to solve the inconsistency of degeneration distribution.The main research contents and contributions are summarized as follows:(1)To handle complex real-world low-quality images,CNN needs a large receptive field to improve restoration performance.Generally,CNN can increase depth,enlarge convolutional kernel size in the space domain,or adopt detailed filter and pooling operations to enlarge the receptive field.However,the accompanying dramatic computational cost,checkerboard effects,and information loss are detrimental to further improving their performance.Inspired by this,we propose a novel multi-level wavelet CNN(MWCNN)model.The core idea is to embed wavelet transform into CNN architecture to safely reduce the resolution of feature maps while increasing the receptive field.MWCNN is a multiscale architecture,and both low-frequency bands and high-frequency bands are taken as the input of CNN blocks for better utilization of time-frequency.The proposed MWCNN can also be viewed as an improvement of dilated filter and a generalization of average pooling,which can be applied to not only image restoration tasks,but also any CNNs requiring a pooling operation.The experimental results demonstrate the effectiveness of the proposed MWCNN for tasks such as image denoising,single image super-resolution(SISR),real-world image denoising,and even object classification.(2)Even though MWCNN can improve the performance of real-world image restoration,the global information is ignored,which is difficult to process global degradation information in the real-world low-quality image.Existing methods that can capture global information suffer huge computational complexity or lack local information.To address these issues,we propose a new method for real-world image restoration,named the spectralspatial filtering Transformer method(SSFT),which can learn both global-local information simultaneously.The key insight of our SSFT is learning both spectral and spatial filters for extracting and enhancing features.And then channel multi-head self-attention(CMSA)is utilized to combine global and local information which is obtained by spectralspatial mixing filters.Furthermore,the Fourier correlation is presented to propose a global-local gated feed-forward network.Because of the high efficiency of FFT and CMSA,the complexity of SSFT is linear with its input resolution.The analyses and experiments demonstrate that compared with other methods,this work can effectively improve the performance of real-world image restoration.(3)Real-world image super-resolution(SR)methods need massive real-world LRHR image pairs for training deep neural networks.However,the ground-truth HR image of real-world LR images is generally unavailable.Hence,this dissertation learns blind SR networks from a realistic,parametric degradation model by considering blurring,noise,downsampling,and even JPEG compression.In contrast to directly reconstructing the HR image in a blind manner,the proposed model adopts a cascaded architecture for noise estimation,blurring estimation,and non-blind SR,which can be jointly end-to-end learned from training data and benefit generalization ability.Experimental results show that the proposed method performs favorably on synthetic and real-world LR images.(4)Collecting specific real-world noise-clean image data pairs or synthesizing noiseclean data pairs can help to achieve promising performance.However,when the distribution of real-world noisy images is unknown,the denoising performance is still limited due to the domain gap between the training set and the testing set.Nonetheless,the unknown noise distribution usually can be modeled as a proper combination of existing noise distributions.In this dissertation,we propose a simple yet effective maximum posteriori estimation deep ensemble(MDE)method for real-world image denoising,where several representative deep denoisers pre-trained with various training data settings can be fused to improve robustness by considering input-dependent aleatoric and epistemic uncertainty.To overcome highly signal-dependent and heterogeneous real-world noises,improved MWCNN is adopted to predict pixel-wise weighting maps to fuse these denoisers.Extensive experiments have shown that real-world noises can be better removed by fusing existing denoisers instead of training a big denoiser with expensive cost.Furthermore,our BDE can be extended to other image restoration tasks,i.e.real-world image deblurring,image deraining,and single image super-resolution. |