| With the development of multimedia technology in recent years, the data volume of high-definition and ultra-high-definition video has grown exponentially. Data transmission and storage are under enormous pressure, and video coding technology faces greater challenges. As the two latest generations of international coding standards, High Efficiency Video Coding (HEVC/H.265) and Versatile Video Coding (VVC/H.266) were released in January 2013 and July 2020, respectively. HEVC and VVC improve compression performance by about 50% and 30%, respectively, over their predecessors. This improvement in coding performance is accompanied by a dramatic increase in coding complexity. Block partitioning, which determines the partition structure of the coding tree units of a frame by recursive traversal search, is the most time-consuming stage of the encoding process in the reference encoder. The HEVC standard adopts an adaptive quadtree partition structure, replacing the uniform block partitioning of its predecessor, AVC. The VVC standard adopts a quadtree plus multi-type tree (QTMTT) structure, which supports binary-tree and ternary-tree splits in two directions in addition to the quadtree. To make the HEVC and VVC standards applicable to practical coding scenarios, block partitioning must be accelerated to reduce encoding complexity. For the design of a low-complexity block partitioning method, this dissertation identifies two key issues: one is a concise and complete data form to characterize the partition; the other is an efficient partition prediction method. In response to these two key issues, this dissertation proposes to use two-dimensional and three-dimensional matrices to characterize the quadtree structure and the QTMTT structure, named the depth map and the partition map, respectively. In addition, this dissertation uses deep learning-based methods to predict the partition structure, completely or partially replacing the partition search of the encoder to accelerate the encoding process. HEVC and VVC have different
partition structures and characteristics, so this dissertation consists of two parts. The first part is the quadtree-based partitioning method for HEVC, and the second part is the QTMTT-based partitioning method for VVC. Because the QTMTT structure includes the quadtree structure, the work of the first part is also the basis of the second part and verifies the feasibility of the method for it. The main contributions of this dissertation are as follows: 1. This dissertation proposes a depth map prediction-based HEVC partitioning method. It uses a depth map to represent the quadtree partition structure adopted by the HEVC standard. The depth map is a two-dimensional matrix, each element of which represents the partition depth at the corresponding position of the coding tree unit. Furthermore, this dissertation designs a convolutional neural network that takes the pixel values of an image block as input and predicts the corresponding depth map. The multi-scale pooling layers of the network and the multi-scale L1 loss function used for training are designed to match the inherent properties of the depth map. Depth map prediction can decide the entire quadtree partition structure of the coding tree unit, skipping the recursive partition search of the encoder. The depth map prediction method transforms a series of classification problems into a regression problem of extracting texture characteristics, and achieves significant encoding speedup on the standard test sequences with little compression performance loss. 2. This dissertation proposes a partition map prediction-based VVC partitioning method. The partition structure of VVC is upgraded to the QTMTT structure, which is far more complicated than the quadtree structure of HEVC. Superficially, the partition structure becomes more irregular. Fundamentally, the mapping from pixel values to partition structure becomes very complicated. To address the superficial problem, this
dissertation proposes to use the partition map, which builds on the depth map, to represent the QTMTT structure. The partition map is a three-dimensional matrix that uses different types of depth maps and direction maps to constitute a complete and regular representation of the QTMTT structure. Physically, it reflects the texture characteristics of the image at different scales. To address the fundamental problem, this dissertation designs a convolutional neural network to predict the partition map, which emulates the partition search process of the encoder, and designs a top-down post-processing algorithm to refine the network output and extract partition decisions. Partition map prediction and post-processing can determine part or all of the QTMTT partition structure of a coding tree unit, enabling an adjustable trade-off between encoding speedup and encoding performance loss. |
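
To make the depth map idea concrete, the following is a minimal Python sketch of how a two-dimensional depth map can encode a quadtree partition and how the partition can be read back from it. It is an illustration under stated assumptions, not the dissertation's implementation: the 64x64 coding tree unit, the 8x8 minimum block size (giving an 8x8 depth map with depths 0 to 3), the function names `depth_map_to_blocks` and `blocks_to_depth_map`, and the "split if any cell is deeper" decoding rule are all hypothetical choices made for this example.

```python
from typing import List, Tuple

CTU_SIZE = 64              # assumed coding tree unit size in luma samples
MIN_CU = 8                 # assumed smallest block size
GRID = CTU_SIZE // MIN_CU  # depth map is GRID x GRID (8 x 8 here)

DepthMap = List[List[int]]
Block = Tuple[int, int, int]  # (x, y, size) of a leaf block


def depth_map_to_blocks(depth_map: DepthMap, x: int = 0, y: int = 0,
                        size: int = CTU_SIZE, depth: int = 0) -> List[Block]:
    """Read the quadtree leaves out of a depth map, top-down."""
    cells = [depth_map[r][c]
             for r in range(y // MIN_CU, (y + size) // MIN_CU)
             for c in range(x // MIN_CU, (x + size) // MIN_CU)]
    # Hypothetical rule: split while any covered cell asks for a deeper level.
    if size > MIN_CU and max(cells) > depth:
        half = size // 2
        blocks: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += depth_map_to_blocks(depth_map, x + dx, y + dy,
                                              half, depth + 1)
        return blocks
    return [(x, y, size)]


def blocks_to_depth_map(blocks: List[Block]) -> DepthMap:
    """Write each leaf block's quadtree depth into the cells it covers."""
    depth_map = [[0] * GRID for _ in range(GRID)]
    for x, y, size in blocks:
        depth = (CTU_SIZE // size).bit_length() - 1  # log2(CTU_SIZE / size)
        for r in range(y // MIN_CU, (y + size) // MIN_CU):
            for c in range(x // MIN_CU, (x + size) // MIN_CU):
                depth_map[r][c] = depth
    return depth_map


# Example: the top-left 32x32 quadrant is split into 16x16 blocks (depth 2),
# while the other three quadrants stay at 32x32 (depth 1).
example = [[2 if r < 4 and c < 4 else 1 for c in range(GRID)]
           for r in range(GRID)]
blocks = depth_map_to_blocks(example)
```

Because every element stores only a depth, one fixed-size matrix describes the whole quadtree of a coding tree unit, which is what allows a single network output to stand in for the encoder's recursive search. A predicted depth map need not be a valid quadtree, so some consistency rule such as the one assumed above is needed when decoding it.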