| In recent years, with the rise of short video and live streaming, the volume of video data on the Internet has grown explosively. Highly efficient video codecs can relieve the resulting storage and transmission pressure. Existing methods focus on improving objective metrics such as Peak Signal-to-Noise Ratio (PSNR). However, a higher PSNR does not always mean better perceptual quality. This thesis optimizes existing video compression methods, covering both traditional video coding and deep video compression, using saliency, following the attention mechanism of the human visual system. For traditional video coding, this thesis improves current optimization methods from the perspective of practical application, balancing computational complexity against compression performance. To the best of our knowledge, no saliency-based perceptual optimization method exists for deep video compression, and this thesis explores that research area. Details are as follows: 1. This thesis predicts the spatial saliency and the temporal saliency of the input video separately and combines them to obtain the final saliency map. In addition, inter prediction is applied in the saliency space to save computational resources. To optimize traditional video coding, this thesis adopts the approach of selecting the quantization parameter according to the saliency map. Experimental results show that the performance of the optimized encoder improves when the decoded videos are measured by EWPSNR (Eye-tracking Weighted PSNR) and decreases when they are measured by PSNR, indicating that the encoder achieves better perceptual quality at the same bitrate by raising the quality of salient areas and lowering the quality of non-salient areas. Moreover, the extra encoding time is acceptable. 2. Deep video compression lacks effective rate control methods. This thesis proposes a new method for allocating the bits of the prediction residual. Before the residual is compressed, it is filtered with the saliency map to adjust its amplitude, which reallocates bits among areas of different saliency. With the trade-off between computational complexity and performance in mind, this thesis designs a multi-task saliency prediction model that shares the shallow convolutional layers of the image compressor and the video compressor. The two tasks are trained jointly so that information is shared between them. Tested on the HEVC test sequences (Class B, Class C, Class D, and Class E), the optimized method achieves an average bit reduction of 1.04% in terms of EWPSNR with no decrease in PSNR. Meanwhile, the number of parameters and the computational complexity increase by only 1.4% and 1.6%, respectively. When the residual is filtered with the ground-truth saliency map, the average bit reduction reaches 2.81%. |
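
The observation that a saliency-weighted metric can improve while plain PSNR drops can be illustrated with a toy example. The sketch below is not the EWPSNR implementation used in this thesis: it assumes a simplified per-pixel weighted PSNR in which the weights come directly from a saliency map, whereas EWPSNR derives its weights from eye-tracking fixations. The function name `weighted_psnr` and the 2x4 frames are hypothetical, and bitrate is not modelled.

```python
import math


def weighted_psnr(ref, dec, weights, peak=255.0):
    """PSNR over a weighted mean squared error.

    With uniform weights this reduces to ordinary PSNR. The weighting
    scheme is an illustrative assumption, not the thesis's EWPSNR.
    """
    num = 0.0  # weighted sum of squared errors
    den = 0.0  # sum of weights
    for ref_row, dec_row, w_row in zip(ref, dec, weights):
        for r, d, w in zip(ref_row, dec_row, w_row):
            num += w * (r - d) ** 2
            den += w
    wmse = num / den
    if wmse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / wmse)


# Toy 2x4 frame: the left half is salient, the right half is not.
ref = [[100, 100, 100, 100], [100, 100, 100, 100]]
saliency = [[0.9, 0.9, 0.1, 0.1], [0.9, 0.9, 0.1, 0.1]]
uniform = [[1.0] * 4 for _ in range(2)]

# Baseline decoder output: the same error (4) at every pixel.
baseline = [[104, 104, 104, 104], [104, 104, 104, 104]]
# Saliency-aware output: smaller error (2) in the salient half,
# larger error (6) in the non-salient half.
perceptual = [[102, 102, 106, 106], [102, 102, 106, 106]]

for name, dec in (("baseline", baseline), ("perceptual", perceptual)):
    print(name,
          "PSNR:", round(weighted_psnr(ref, dec, uniform), 2),
          "weighted PSNR:", round(weighted_psnr(ref, dec, saliency), 2))
```

In this toy case the saliency-aware frame has a higher plain mean squared error (20 versus 16), so its PSNR is lower, but its weighted error is smaller (7.2 versus 16), so its weighted score is higher. This is the same direction of change that the abstract reports for the optimized encoder.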