As an important problem in computer vision, video super-resolution has broad application prospects and research value in many fields, such as video inpainting, video surveillance, and multimedia entertainment. In recent years, with the rapid development of deep learning, a large number of deep networks and algorithms have been applied to video super-resolution. Compared with traditional methods, deep learning-based methods can extract inter-frame details more accurately and reconstruct higher-quality video sequences. In this thesis, we investigate deep learning-based video super-resolution and propose a Multi-Offset-Flow-based Network for Video Restoration and Enhancement (MOFN). We then introduce the Transformer into our work and further propose Space-Time Video Super-Resolution with Transformer (STVSRT). The main research contents of this thesis are as follows.

First, a network model based on multi-offset optical flow is proposed to make more efficient use of inter-frame information by exploiting optical flows with offset diversity. In recent years, many deep learning methods using optical flow or deformable convolution have been applied to video super-resolution. However, motion estimation based on a single optical flow struggles to capture accurate inter-frame information, while methods using deformable convolution lack explicit motion constraints, which limits their ability to handle fast motion. In our network, an alignment and compensation module based on multi-offset optical flow estimates flows with multiple offsets for adjacent frames and performs frame alignment. The aligned frames are then fed into the fusion and reconstruction modules to obtain high-resolution video frames. The experimental results demonstrate that MOFN handles motion well: compared with traditional bicubic interpolation, PSNR improves by more than 3.75 dB and SSIM by more than 0.0805 on average over the test set. Compared with several state-of-the-art methods, our method also achieves comparable or better results.

Second, exploiting the space-time correlation of video sequences, we further introduce the task of video frame interpolation on top of video super-resolution and design an end-to-end network for space-time video super-resolution. A two-stage pipeline that chains a video super-resolution algorithm with a video frame interpolation algorithm cannot make full use of the spatial and temporal information of the video sequence, and it also incurs redundant computation. We therefore adopt an intermediate feature-synthesis module based on multi-offset optical flow to synthesize the features of intermediate frames. Further inter-frame feature fusion is achieved by a local feature fusion module and a Transformer-based global feature fusion module. The synthesized and fused feature sequences are passed through the reconstruction module to obtain high-resolution, high-frame-rate video sequences. The experiments verify that STVSRT is more competitive in processing video sequences with large-scale motion. Compared with the best-performing two-stage method on the test set, our method improves PSNR by more than 0.31 dB and SSIM by more than 0.0055 on average. Compared with other space-time super-resolution methods, STVSRT also shows improvements in PSNR and SSIM.
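To make the alignment idea concrete, the core of multi-offset alignment, warping a neighboring frame once per candidate flow offset and fusing the warped candidates, can be sketched as below. This is an illustrative NumPy version, not the thesis implementation: in MOFN the offsets and the fusion are learned by the network, whereas here the offsets are fixed inputs and the fusion is a plain average.

```python
import numpy as np

def warp_bilinear(frame, flow):
    """Backward-warp a grayscale frame (H, W) by a dense flow field (H, W, 2)
    using bilinear interpolation; flow[..., 0] is dx, flow[..., 1] is dy."""
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling coordinates displaced by the flow, clamped to the image border.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    # Bilinear blend of the four neighboring pixels.
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def multi_offset_align(frame, base_flow, offsets):
    """Warp the frame once per flow offset around a base flow and average the
    candidates. A learned network would instead predict the offsets and fuse
    the candidates with learned weights."""
    candidates = [warp_bilinear(frame, base_flow + off) for off in offsets]
    return np.mean(candidates, axis=0)
```

The benefit of offset diversity is that when the base flow is slightly wrong, one of the perturbed offsets may still sample the correct source pixel, so the fused result degrades more gracefully than a single-flow warp.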
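For reference, the PSNR figures quoted above are derived from the mean squared error between reconstructed and ground-truth frames. A minimal version, assuming 8-bit frames with peak value 255, is:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the reconstruction
    is closer to the reference frame."""
    diff = np.asarray(ref, dtype=np.float64) - np.asarray(test, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak * peak / mse)
```

Because PSNR is logarithmic in the MSE, an average gain of 3.75 dB corresponds to roughly a 2.4x reduction in mean squared error.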