| Video frame interpolation is an important technology in computer vision, with applications in video enhancement, video compression, and video restoration. It increases the frame rate of a video sequence by synthesizing intermediate frames between consecutive original frames, providing smooth and clear motion while reducing motion blur in the generated video. With the development of deep learning, a variety of deep-learning-based video frame interpolation methods have been proposed. However, they still face great challenges in handling large-scale motion, complex texture details, and large-scale occlusion in video scenes.

This thesis systematically studies video frame interpolation methods based on deep learning. For video scenes that are difficult to handle during interpolation, such as complex backgrounds, fine textures, large-scale motion, complex small motions, and large-scale occlusion, two video frame interpolation methods are proposed. First, a video frame interpolation method based on deep over-parameterized recurrent residual convolution is proposed. In addition, to address the excessive parameter counts of current deep-learning-based interpolation models, a lightweight method based on multiple lightweight convolutional units and a three-scale codec is proposed. Finally, building on these methods, a video frame interpolation system is designed and implemented, laying a foundation for the practical application of frame interpolation algorithms. The specific research contents are as follows:

(1) A video frame interpolation method based on deep over-parameterized recurrent residual convolution is proposed. A U-Net feature extractor is designed from recurrent convolutional layers and residual blocks, and the interpolation model is built on this basis. First, the U-Net feature extractor extracts features from the input frames; then, the input frames are warped by deformable convolution to obtain warped frames; finally, an improved frame synthesis network fuses the features of the warped frames to generate high-quality intermediate frames. Meanwhile, the model is optimized with deep over-parameterized convolutions during training, which further improves the quality of the generated frames. Experimental results show that the method outperforms existing methods in both objective and subjective evaluations on the Middlebury, DAVIS, and UCF101 test sets.

(2) A lightweight video frame interpolation method based on multiple lightweight convolutional units and a three-scale codec structure is proposed to address the excessive parameter counts of existing deep-learning-based models. First, a three-scale encoding structure with a two-level attention cascade is designed to learn non-uniform motion in videos. Recurrent convolutional layers and residual blocks are then combined into recurrent residual convolutional units, optimizing the three-scale structure of the interpolation model. Furthermore, the quality of the generated frames is improved by fusing the three-scale warped frames through the synthesis network. Finally, by combining depthwise separable convolution with the recurrent residual convolutional units, a lightweight convolutional unit is designed, and a local lightweighting strategy significantly reduces the number of model parameters. Experimental results show that the proposed model significantly reduces the parameter count while clearly improving both the objective and subjective quality of the generated frames.

(3) Based on the two proposed video frame interpolation methods, a video frame interpolation system is designed and implemented. The system consists of three
modules. The first two modules are built on the two proposed methods and display and save the subjective and objective results of the different network variants; the third module performs frame-rate enhancement using both methods, displaying the original and interpolated videos side by side on the interface and saving them. |
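The abstract notes that the deep over-parameterized convolutions in method (1) are used only during training and are folded away afterwards. The core idea, that a composition of linear convolutions collapses into a single equivalent kernel at inference, can be illustrated with a minimal 1-D sketch; the function names, kernel values, and two-stage factorization below are illustrative assumptions, not the thesis's actual formulation:

```python
def conv_full(x, k):
    """Full 1-D convolution: output length len(x) + len(k) - 1."""
    out = [0.0] * (len(x) + len(k) - 1)
    for i, xi in enumerate(x):
        for j, kj in enumerate(k):
            out[i + j] += xi * kj
    return out

# Hypothetical input signal and a kernel factored into two trainable stages.
x = [1.0, 2.0, 3.0, 4.0]
w1 = [1.0, -1.0]   # first linear stage (illustrative values)
w2 = [0.5, 0.5]    # second linear stage (illustrative values)

# Training-time view: two convolutions applied in sequence (extra parameters).
two_stage = conv_full(conv_full(x, w1), w2)

# Inference-time view: collapse the stages into one kernel first,
# so the over-parameterization adds no runtime cost.
collapsed_kernel = conv_full(w1, w2)
one_stage = conv_full(x, collapsed_kernel)

assert two_stage == one_stage
```

Because convolution is associative, the two views are exactly equivalent; the extra degrees of freedom only change the optimization landscape during training.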
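For the lightweight design in method (2), the parameter saving from replacing a standard convolution with a depthwise separable one can be quantified with a short back-of-the-envelope calculation; the layer sizes below are hypothetical examples, not figures from the thesis:

```python
def conv2d_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k (one filter per channel) plus a 1 x 1 pointwise projection."""
    return c_in * k * k + c_in * c_out

# Hypothetical layer: 64 -> 128 channels with a 3 x 3 kernel.
standard = conv2d_params(64, 128, 3)                   # 73,728 weights
lightweight = depthwise_separable_params(64, 128, 3)   # 8,768 weights
reduction = standard / lightweight                     # roughly 8.4x fewer
```

The saving grows with channel count, which is why applying the substitution locally, inside the heaviest units, can cut the overall parameter budget substantially without redesigning the whole network.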