Video and images contain all the information perceived by visual imaging,whose sharpness and fidelity are of great significance for the subsequent representation,identification,and detection tasks.However,limited by factors such as environmental illumination,quantization error,and equipment constraints,the noisy signals are inevitably introduced to digital images or video in the processes of acquisition,transmission,and storage,which highly limit the accuracy of the following tasks.To remove the noise presented in the images or video,existing works have proposed a variety of denoising methods based on image priors or deep learning.However,these methods mainly focus on obtaining an image or video with higher quality,neglecting the flexibility,efficiency,robustness,and applicability of the denoising model in practical application tasks.Toward practical application scenarios,image and video denoising tasks are still required to address the following difficulties and challenges.First,different scenarios have different requirements for the computational complexity,parameter number,and denoising quality,but a denoising model cannot simultaneously achieve the best performances on all these metrics.Therefore,it has to play a trade-off game.Second,videos contain rich temporal redundancy.Although this redundancy brings much selfsimilarity,it also causes a large amount of data volume.Thus,the denoising model needs to achieve a higher denoising efficiency.Third,videos contain different levels of unknown scene motions,which greatly hinder the utilization of the interframe similarity.As a result,with the increase in motion level,the denoising performance declines sharply.Finally,the differences in the noise distributions cause the pretrained denoising model being unable to be directly applied to practical application scenarios,and the denoising model needs to be retrained or fine-tuned with the data captured from practical scenarios.However,it is extremely difficult to capture the ideal noisy-clear pairs in practical scenarios,and thus causes great difficulty in promising a satisfactory denoising performance.Centering on the above difficulties and challenges,the main research contents and innovations of this dissertation are listed as follows.1.A demand-oriented framework(DOF)is proposed for image denoising.The proposed framework can achieve a trade-off for the parameter number,computational complexity,and denoising quality.Specifically,we design a scale encoder,a split-flow module,and a scale decoder,and introduce a scale factor,a network branch factor,and an information capacity factor,which help the denoising model achieve controllable performance.In the designed framework,the scale encoder helps the denoising model extract fewer but more representative shallow features;the split-flow module extracts deep features and makes the contained information flow across multiple network branches;the scale decoder transforms the estimation of the noise map into the estimation of multiple subnoise maps,through which the proposed model achieves image denoising.Experimental results demonstrate that the proposed framework can be extended to many existing denoising models,and can help the denoising model achieve more competitive performances in terms of the parameter number,computational complexity,and denoising quality.2.A multiframe-to-multiframe network(MMNet)is proposed for video denoising.The proposed network can achieve parallel denoising of multiple frames,and simultaneously optimize the denoised result in both the spatial and temporal dimensions,which helps achieve promising temporal consistency and dramatically improve the denoising efficiency.Specifically,by designing a multiframe-tomultiframe denoising scheme,the denoising model can simultaneously recover multiple frames.Then,a spatiotemporal loss is designed to optimize the denoised result from the spatial and temporal dimensions.Based on this,a multiframe-tomultiframe denoising network is established,which utilizes the spatiotemporal convolutions to fully consider the spatiotemporal redundancy,helping mine the interframe and intraframe similarity,and thus improve the denoising quality.As demonstrated in the experimental results,benefiting from the efficient utilization of the spatiotemporal redundancy and the underlying parallel mechanism of the multiframe-to-multiframe denoising scheme,the proposed method can improve the temporal consistency and denoising efficiency.3.An explicit motion estimation-embedded progressive denoising method is proposed for video denoising.The proposed method couples the video denoising model and motion estimation model,enabling the denoising model to effectively utilize the temporal redundancy and alleviate the influence of unknown motion.Specifically,by establishing a noisy video database involving different motion levels,we provide a comprehensive analysis of the influence of explicit motion estimation and implicit motion learning to the denoising model,through which we observe that explicit motion estimation can help the denoising model achieve more robust denoising performance.Based on this observation,we decouple the denoising model into the processes of spatial denoising,explicit motion estimation-assisted frame reconstruction,and temporal aggregation,removing the video noises in a coarse-to-fine manner.Extensive experimental results demonstrate that the proposed method can achieve more robust performance than existing methods.In particular,the proposed method substantially improves the denoising performances in the case of videos containing large-scale motions.4.A series of denoising application studies toward practical scenarios are conducted.Specifically,we propose a training strategy for the denoising model based on non-ideal paired images,which achieves the supervised training in the case of noisy-clean pairs with inconsistent brightness and misaligned pixels.The proposed training strategy has been applied to the high-speed camera(>10000 fps)developed by our team,which substantially improves the imaging quality.To help the deployment of the denoising model,a model quantization technique is introduced to quantize the values of parameters and activations,which helps transform the 32-bit floating-point number to an 8-bit fixed-point number,and thus greatly reduces the model size and resource consumption.Additionally,the applicability of the proposed denoising model in high-level tasks(e.g.,detection and segmentation)is further explored.By comparing the performances of the high-level task models with and without the denoising model,we demonstrated the effectiveness and necessity of the denoising model in high-level tasks under noisy conditions.To summarize,this work takes the practical application demands as traction,establishing denoising models with higher flexibility,efficiency,and robustness based on convolutional neural networks,which tackle a series of challenges that exist in practical applications.Finally,the denoising models are applied to the high-speed camera imaging noise removal task and noisy image detection,segmentation,and identification tasks,which demonstrate the effectiveness and necessity of the denoising model. |