
Research On Rate Control Algorithm Based On Deep Reinforcement Learning

Posted on: 2024-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Zeng
Full Text: PDF
GTID: 2568307109455304
Subject: Computer technology
Abstract/Summary:
VVC (Versatile Video Coding) is the latest coding standard after HEVC (High Efficiency Video Coding) and integrates the most advanced video compression technology. VVC still uses a hybrid coding framework and can improve compression efficiency by nearly 50% under the same coding conditions. However, VVC also introduces more complex block partitioning and many inter-frame prediction methods, which not only increase overall coding complexity but also pose a great challenge for inter-frame rate control. Since rate control has not changed significantly in the VVC standard, the dynamic programming method based on the R-λ model proposed for HEVC is still used, and it struggles to solve the complex, dependency-laden rate-distortion optimization problem. Reinforcement learning solves decision optimization problems by learning a policy through interaction, and it adapts better to complex environments. Therefore, this thesis proposes replacing the traditional model-based approach with a reinforcement learning approach for inter-frame rate control. The main research contents are as follows:

(1) This thesis reviews the state of rate control research in China and abroad and describes the new technologies introduced by the VVC standard, including coding unit block partitioning, intra-frame prediction, inter-frame prediction, transform and quantization, entropy coding, and in-loop filtering. It then summarizes how the classic R-λ model evolved from the HEVC standard to the VVC standard.

(2) A method for deciding the initial-frame quantization parameter (QP) based on Q-learning is proposed. The initial-frame QP decision is an intra-frame optimization problem. The low-delay configuration is strongly affected by the I-frame QP, so a reasonable initial QP is key to balancing quality across the whole video sequence. The multi-pass Fixed-QP encoding approach has high complexity and cannot be applied in practice, so this thesis converts the rate-distortion optimization problem into a decision optimization problem and solves it with Q-learning. Experiments show that the average BD-rate gap between the proposed method and the high-performing Fixed-QP method is only about 1.46%, while the high computational complexity of Fixed-QP is avoided.

(3) A DQN-based approach is proposed to replace the R-λ model for inter-frame QP decisions. Inter-frame prediction strengthens the temporal dependency between frames, so this thesis balances the rate-distortion performance of the predicted frames with a deep reinforcement learning approach. Experiments show that, compared with the rate control algorithm in the VVC reference software (VTM 13.0), the proposed DQN-based algorithm achieves an average BD-rate (Bjøntegaard Delta rate) gain of -0.445% and an average BD-PSNR (Bjøntegaard Delta PSNR) gain of 0.021 dB in the low-delay P configuration.
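To make the contrast between the R-λ baseline and a Q-learning QP decision concrete, the following is a minimal Python sketch, not the thesis's implementation. The abstract does not specify the state, action set, or reward, so everything below is an assumption for illustration: the state is a target-bitrate bucket, the actions are a small hypothetical set of candidate QPs, the reward penalizes the relative gap to the target bits per pixel, and `toy_bpp` is a made-up rate model in which rate roughly halves every 6 QP steps. The R-λ constants are the commonly cited initial values from HEVC-style rate control.

```python
import math
import random

# Hypothetical action set (assumption): candidate QPs 22, 27, 32, 37, 42.
QP_CANDIDATES = list(range(22, 43, 5))


def r_lambda_qp(bpp, alpha=3.2003, beta=-1.367):
    """Baseline R-lambda mapping: lambda = alpha * bpp^beta, then QP from lambda."""
    lam = alpha * bpp ** beta
    qp = 4.2005 * math.log(lam) + 13.7122
    return max(0, min(63, round(qp)))  # clamp to the VVC QP range 0..63


def toy_bpp(qp, complexity):
    """Toy rate model (assumption): rate halves roughly every 6 QP steps."""
    return complexity * 2.0 ** (-(qp - 22) / 6.0)


def q_update(q, state, action, reward, next_state, done, lr=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    best_next = 0.0 if done else max(q[next_state])
    q[state][action] += lr * (reward + gamma * best_next - q[state][action])


def train(targets, complexity=0.5, episodes=5000, epsilon=0.1, seed=0):
    """Learn one QP per target-bpp bucket using one-step episodes."""
    rng = random.Random(seed)
    q = [[0.0] * len(QP_CANDIDATES) for _ in targets]
    for _ in range(episodes):
        s = rng.randrange(len(targets))
        if rng.random() < epsilon:  # explore
            a = rng.randrange(len(QP_CANDIDATES))
        else:  # exploit the current estimate
            a = max(range(len(QP_CANDIDATES)), key=lambda i: q[s][i])
        bpp = toy_bpp(QP_CANDIDATES[a], complexity)
        reward = -abs(bpp - targets[s]) / targets[s]
        q_update(q, s, a, reward, s, done=True)  # episode ends after one decision
    return q


def greedy_qp(q, state):
    """Return the QP with the highest learned value for this state."""
    best = max(range(len(QP_CANDIDATES)), key=lambda i: q[state][i])
    return QP_CANDIDATES[best]


if __name__ == "__main__":
    targets = [0.5, 0.15]  # hypothetical target bits per pixel
    q = train(targets)
    for s, t in enumerate(targets):
        print(f"target bpp {t}: learned QP {greedy_qp(q, s)}, R-lambda QP {r_lambda_qp(t)}")
```

In a real encoder the reward would come from actual encoding results (bits and distortion) rather than a closed-form rate model, and a DQN would replace the table `q` with a neural network over a richer state.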
Keywords/Search Tags: Versatile Video Coding, Reinforcement Learning, Rate Control, Deep Reinforcement Learning, Rate-Distortion Optimization