Research On Recursive Models And Deep Learning Methods For Video Frame Rate Up-Conversion

Posted on: 2021-12-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W B Bao
GTID: 1488306503982359
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the development of high-quality display devices, the demand for high-quality video sources, including spatially and temporally high-resolution data, has become more urgent than ever. However, because video acquisition, compression, and transmission are limited by high computational cost and bandwidth consumption, a practical solution is to transform existing low-quality videos into high-quality ones through digital signal processing. Among the research directions in video quality enhancement, super-resolving videos in the temporal domain, namely video frame rate up-conversion, is the most challenging task and also the fundamental approach to delivering immersive visual experiences to users. Specifically, video frame rate up-conversion interpolates additional transitional frames between the frames of an original low-frame-rate video (such as 30 Hz) to obtain a high-frame-rate one (such as 60 Hz or even 120 Hz). The interpolated frames make object movements in the video appear finer and the transitions between frame contents smoother, thus significantly improving the visual quality for users.

The two fundamental stages of video frame rate up-conversion are motion estimation and motion compensation. Motion estimation, also referred to as optical flow estimation, aims to determine the two-dimensional motion vectors (flow vectors) of objects between video frames. Since motion estimation is a preliminary step of motion-compensated frame interpolation, the flow vectors it generates strongly affect the quality of the final interpolated frames. To this end, this dissertation proposes a Kalman filter based method for video optical flow estimation. Within the Kalman filtering system, the method takes advantage of the temporal correlations and spatial contexts of moving objects to perform optimal estimation over consecutive video frames. In this system, each pixel’s motion flow is first formulated as a second-order time-variant state vector and then optimally estimated, under the maximum a posteriori criterion, according to the measurement and system noise levels. Experimental results and analyses on the MPI Sintel, Monkaa, and Driving video datasets demonstrate that the proposed method performs favorably against state-of-the-art approaches. The generated optical flow fields exhibit better temporal consistency, which lays a solid foundation for the subsequent video frame rate up-conversion task.

Next, this dissertation proposes a new video frame rate up-conversion method based on a high-order model and dynamic filtering. The new model avoids the dependence on the brightness constancy and motion linearity assumptions of traditional methods, which makes the frame interpolation algorithm adaptable to challenging scenarios with complex lighting conditions and non-linear motion. The intensity and position of video pixels are both modeled as high-order polynomials in time, which reduces the key problem to estimating the polynomial coefficients that represent each pixel’s intensity variation, velocity, and acceleration. We propose to solve for these coefficients with two energy objectives: one minimizes the auto-regressive prediction error of the intensity variation from its past samples, and the other minimizes the frame reconstruction error along the motion trajectory. To address this optimization problem efficiently, we propose a dynamic filtering solution inspired by the temporal coherence of videos. The optimal estimation of the coefficients is reformulated as a dynamic fusion of the prior estimate from the pixel’s temporal predecessor and the maximum likelihood estimate from the current observation. Extensive experiments on natural and synthesized videos demonstrate the superiority of our method over state-of-the-art methods in both subjective and objective comparisons.

Then, given the extraordinary performance of neural network based deep learning tools at automatically extracting useful features, this dissertation integrates deep learning networks into the classical motion estimation and motion compensation framework. We, for the first time, build a motion estimation and motion compensation driven neural network, which we call MEMC-Net. A novel adaptive warping layer is developed to integrate both optical flow and interpolation kernels to synthesize target frame pixels. This layer is fully differentiable, so that the flow and kernel estimation networks can be optimized jointly. The proposed model benefits from the advantages of motion estimation and compensation methods without using hand-crafted features. Compared to existing methods, our approach is computationally efficient and able to generate more visually appealing results. Furthermore, the proposed MEMC-Net architecture can be seamlessly adapted to several video enhancement tasks, e.g., super-resolution, denoising, and deblocking. Extensive quantitative and qualitative evaluations demonstrate that the proposed method performs favorably against state-of-the-art video frame interpolation and enhancement algorithms on a wide range of datasets.

Finally, building on MEMC-Net, we propose, for the first time, a scene depth aware video frame rate up-conversion algorithm. This work explores the use of depth information in video frame interpolation and sheds light on a new research direction. We propose a video frame interpolation method that explicitly detects occlusion by exploiting depth information. Specifically, we develop a depth-aware flow projection layer to synthesize intermediate flows that preferentially sample closer objects over farther ones. In addition, we learn hierarchical features to gather contextual information from neighboring pixels. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels to synthesize the output frame. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets.

In summary, this dissertation focuses on the video frame rate up-conversion problem. Addressing its two core sub-problems, motion estimation and motion compensation, I propose a Kalman filtering model for video optical flow estimation and a high-order model with dynamic filtering for frame interpolation. Building on recent developments in deep learning, this dissertation, for the first time, builds a MEMC model-driven neural network framework. Furthermore, I introduce scene depth awareness to video frame interpolation, which is also the first such work in the existing literature. The new algorithms above have been validated on a wide range of datasets and perform favorably against conventional and state-of-the-art methods.
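
The abstract describes estimating each coefficient as a dynamic fusion of a prior estimate, carried over from the pixel's temporal predecessor, and a maximum likelihood estimate from the current observation. The Python sketch below illustrates that idea in its simplest form, as a scalar Kalman-style update. It is only an illustration under strong assumptions (one scalar coefficient, Gaussian noise, known variances). The names `fuse_estimates` and `track_coefficient` and every parameter are hypothetical and are not taken from the dissertation.

```python
def fuse_estimates(prior_mean, prior_var, obs_mean, obs_var):
    """Fuse a prior estimate with a new observation (scalar Gaussian case).

    Assumption: both estimates are Gaussian with known variances, so the
    fusion reduces to a variance-weighted average (a Kalman-style update).
    """
    if prior_var < 0 or obs_var < 0 or prior_var + obs_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    # The gain approaches 1 when the prior is uncertain (trust the observation)
    # and approaches 0 when the observation is noisy (trust the prior).
    gain = prior_var / (prior_var + obs_var)
    fused_mean = prior_mean + gain * (obs_mean - prior_mean)
    fused_var = (1.0 - gain) * prior_var
    return fused_mean, fused_var


def track_coefficient(observations, obs_var, init_mean, init_var, process_var):
    """Recursively estimate one coefficient along a pixel's temporal trajectory.

    Assumption: the coefficient follows a random walk, so the prediction step
    keeps the predecessor's mean and inflates its variance by `process_var`.
    """
    mean, var = init_mean, init_var
    history = []
    for z in observations:
        # Prediction: inherit the predecessor's estimate with added uncertainty.
        var = var + process_var
        # Correction: fuse the predicted prior with the current observation.
        mean, var = fuse_estimates(mean, var, z, obs_var)
        history.append(mean)
    return history
```

For example, calling `track_coefficient([1.0, 1.0, 1.0], obs_var=1.0, init_mean=0.0, init_var=1.0, process_var=0.0)` moves the estimate step by step from the initial guess toward the repeated observation, with each step smaller than the last as the variance shrinks. A full system along the lines of the abstract would operate on vector-valued states per pixel rather than on a single scalar.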
Keywords/Search Tags: Frame Rate Up-Conversion, Recursive Modeling, Deep Learning, Optical Flow Estimation, Scene Depth Estimation