With the rapid development of the Internet, the amount of video data has increased dramatically, putting a heavy strain on storage and transmission. Efficient video compression is therefore urgently needed. The redundancy in video data is mainly temporal, and inter prediction is the core tool for removing it, so performing inter prediction efficiently is the key to improving video coding efficiency. Over the past years, traditional inter prediction technology has made significant progress, but it is now gradually reaching performance bottlenecks, for two main reasons. First, the traditional inter prediction module is hand-crafted and hand-optimized, so its predictive ability is limited. Second, it is usually optimized separately from the other coding modules, which makes it difficult to achieve globally optimal performance. Breaking through these bottlenecks requires new tools and new ideas.

In recent years, deep learning has achieved great success in image and video processing. Deep networks can not only fit complex mappings but also be optimized jointly, so deep learning can compensate for these limitations of traditional inter prediction. This dissertation studies how to use deep learning to solve the problem of inter prediction and follows a three-step research roadmap: use deep learning first to enhance the prediction, then to generate it, and finally to combine it with residual coding through joint optimization. First, for motion regions that traditional inter prediction can handle, such as translational motion, this dissertation proposes to train deep networks that enhance the traditional prediction signals and further improve their accuracy. Second, for motion regions that traditional inter prediction handles poorly, such as complex motion, it proposes to train deep networks that characterize the complex motion and then directly generate the prediction signals. Finally, this
dissertation proposes to use deep networks to combine inter prediction with residual coding, so that the whole framework can be optimized jointly and achieve better performance. In addition to this technical research, the dissertation summarizes and analyzes inter-prediction coding technologies from the perspective of optimization and discusses their development potential.

The main work and contributions of this dissertation are as follows.

(1) For inter-prediction enhancement, this dissertation proposes a deep-network-based motion compensation enhancement method. First, it proposes a simple motion compensation refinement scheme that mainly exploits temporal correlation: a trained convolutional neural network directly enhances the motion-compensated prediction so that the prediction signal is closer to the original. Second, it proposes an advanced motion compensation refinement scheme that exploits spatial as well as temporal correlation by using the neighboring reconstructed region to further enhance the prediction. Both schemes are integrated into the video coding framework, and experimental results demonstrate their effectiveness.

(2) For inter-prediction generation, this dissertation proposes a deep-network-based frame-extrapolation prediction method with reference frame alignment. The reference frames are first aligned, e.g., using motion estimation and motion compensation, and a trained deep network then extrapolates from the aligned frames. The alignment effectively removes the translational motion between reference frames, allowing the network to focus on characterizing the remaining higher-order motion. The method is integrated into the coding framework, and experimental results demonstrate that it handles complex motion effectively and improves coding performance.

(3) For inter prediction combined with residual coding, this
dissertation proposes a hybrid-optimized video coding method based on deep networks and mode decision. The entire framework, including inter prediction and residual coding, is built on deep networks and adopts multiple inter-prediction modes. In the offline stage, the networks are jointly trained on a large number of samples to obtain a pre-trained model. In the online stage, for each video being coded, the method adaptively searches for the best mode and optimizes that mode's parameters with a numerical algorithm. Experimental results show that the proposed framework achieves performance comparable to HEVC (HM) and outperforms purely deep learning-based end-to-end video coding methods.

(4) This dissertation also rethinks inter-prediction coding. Video coding is essentially a mathematical optimization of rate and distortion. Analyzing coding from this optimization perspective, the dissertation finds that inter-prediction coding in the traditional framework corresponds to a discrete optimization solution, whereas inter-prediction coding in the learned framework corresponds to a continuous optimization solution. Based on this analysis, it proposes an advanced hybrid-optimization coding strategy that treats inter-prediction coding as a hybrid of discrete and continuous optimization problems, solved with both search and numerical algorithms. This analysis not only provides a theoretical explanation of existing inter-prediction coding methods but also offers a new direction for future deep learning-based inter-prediction coding.
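Contributions (1) and (2) both start from conventional motion estimation and motion compensation: (1) refines the motion-compensated prediction, and (2) uses it to align reference frames before extrapolation. The following is a minimal sketch of that conventional step, assuming grayscale frames stored as NumPy arrays, integer-pixel motion, fixed 8×8 blocks, and a full search that minimizes the sum of absolute differences (SAD). The helper names `block_match` and `motion_compensate` are hypothetical; codecs such as HEVC additionally use fractional-pixel interpolation, variable block sizes, and rate-constrained search, none of which is modeled here.

```python
import numpy as np

def block_match(cur, ref, block=8, search=4):
    """Full-search integer-pixel block matching with a SAD criterion.

    Returns one motion vector (dy, dx) per block. Textbook sketch only,
    not the dissertation's motion estimation.
    """
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y0, x0 = by * block, bx * block
            # Cast to int64 so the subtraction cannot wrap around for uint8 frames.
            target = cur[y0:y0 + block, x0:x0 + block].astype(np.int64)
            best_sad, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block lies outside the reference frame
                    cand = ref[y:y + block, x:x + block].astype(np.int64)
                    sad = np.abs(target - cand).sum()
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            mvs[by, bx] = best_mv
    return mvs

def motion_compensate(ref, mvs, block=8):
    """Build the motion-compensated prediction by copying each matched block."""
    pred = np.zeros_like(ref)
    for by in range(mvs.shape[0]):
        for bx in range(mvs.shape[1]):
            dy, dx = mvs[by, bx]
            y0, x0 = by * block, bx * block
            pred[y0:y0 + block, x0:x0 + block] = ref[y0 + dy:y0 + dy + block,
                                                     x0 + dx:x0 + dx + block]
    return pred

# Usage: the current frame is the reference shifted down by 2 and left by 1.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
cur = np.roll(ref, shift=(2, -1), axis=(0, 1))
mvs = block_match(cur, ref)
pred = motion_compensate(ref, mvs)
print(mvs[1, 1], np.abs(cur.astype(int) - pred.astype(int)).sum())
```

Because the toy frame is a pure translation, interior blocks recover the same motion vector (-2, 1) and are predicted exactly; blocks touching the top row or right column are not, since `np.roll` wraps pixels around the frame edge. Whatever error remains after this step is what a refinement or extrapolation network would then have to model.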
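Point (4) can be made concrete with the standard Lagrangian rate-distortion cost J = D + λR. One possible way to write the discrete, continuous, and hybrid views is given below. The notation is assumed for illustration and is not taken from the dissertation: M is a finite set of coding modes, V a finite grid of candidate motion vectors (e.g., quarter-pixel positions), φ the network weights of a learned codec, z a mode's continuous parameters, and λ the Lagrange multiplier.

```latex
% Generic rate-distortion optimization over coding decisions \theta
\min_{\theta} \; J(\theta) = D(\theta) + \lambda R(\theta)

% Traditional framework: discrete variables, solved by search
(m^{*}, \mathbf{v}^{*}) = \arg\min_{m \in \mathcal{M},\; \mathbf{v} \in \mathcal{V}}
    D(m, \mathbf{v}) + \lambda R(m, \mathbf{v})

% Learned framework: continuous parameters, solved numerically (e.g., gradient descent)
\boldsymbol{\phi}^{*} = \arg\min_{\boldsymbol{\phi} \in \mathbb{R}^{n}}
    \mathbb{E}_{x}\big[ D(x, \hat{x}_{\boldsymbol{\phi}}) + \lambda R_{\boldsymbol{\phi}}(x) \big]

% Hybrid: outer discrete search over modes, inner continuous optimization
\min_{m \in \mathcal{M}} \; \min_{\mathbf{z} \in \mathbb{R}^{k_m}}
    D(m, \mathbf{z}) + \lambda R(m, \mathbf{z})
```

In the hybrid form, the outer minimization is handled by a search algorithm and the inner one by a numerical algorithm, matching the split between mode search and parameter optimization described in (3).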
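The hybrid strategy in (3) and (4), an outer search over discrete modes with an inner numerical optimization of each mode's continuous parameters, can be illustrated on a toy problem. This is a sketch under stated assumptions, not the dissertation's method. The two "modes" are hypothetical linear predictors of a current signal `x` from a reference `r`, distortion is mean squared error, and rate is an invented proxy: a fixed per-mode signaling cost plus `beta * ||z||^2`. In a real codec, rate would come from entropy coding, and the continuous parameters would be network weights or latents rather than two scalars.

```python
import numpy as np

# Hypothetical toy modes (assumption): each maps continuous parameters z to a
# prediction through a design matrix A(r), i.e. pred = A(r) @ z.
MODES = {
    "scale": lambda r: r[:, None],                                     # pred = a * r
    "scale_offset": lambda r: np.stack([r, np.ones_like(r)], axis=1),  # pred = a * r + b
}
MODE_BITS = {"scale": 1.0, "scale_offset": 2.0}  # assumed signaling cost per mode

def rd_cost(A, z, x, lam, beta, mode_bits):
    """Toy RD cost J = D + lam * R, with an assumed rate proxy R."""
    d = np.mean((x - A @ z) ** 2)
    return d + lam * (mode_bits + beta * np.sum(z ** 2))

def optimize_params(A, x, lam, beta, lr=0.1, steps=2000):
    """Inner continuous problem, solved numerically by plain gradient descent."""
    n = len(x)
    z = np.zeros(A.shape[1])
    for _ in range(steps):
        # Gradient of mean((x - A z)^2) + lam * beta * ||z||^2 with respect to z.
        grad = -(2.0 / n) * A.T @ (x - A @ z) + 2.0 * lam * beta * z
        z -= lr * grad
    return z

def hybrid_optimize(x, r, lam=0.01, beta=0.1):
    """Outer discrete problem: exhaustive search over the (tiny) mode set."""
    best = None
    for name, design in MODES.items():
        A = design(r)
        z = optimize_params(A, x, lam, beta)
        j = rd_cost(A, z, x, lam, beta, MODE_BITS[name])
        if best is None or j < best[2]:
            best = (name, z, j)
    return best

# Usage: the current samples contain an offset that only "scale_offset" can model.
rng = np.random.default_rng(0)
r = rng.uniform(0.0, 1.0, size=256)                  # toy reference samples
x = 0.8 * r + 0.3 + rng.normal(0.0, 0.01, size=256)  # toy current samples
mode, z, j = hybrid_optimize(x, r)
print(mode, np.round(z, 2), round(j, 4))
```

The outer loop is exhaustive only because the mode set is tiny. Gradient descent is used for the inner problem to mirror the "numerical algorithm" step, even though this quadratic toy also has a closed-form least-squares solution. When the offset is removed from `x`, the cheaper "scale" mode wins, because the extra signaling cost is no longer paid back by lower distortion.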