| In recent years,with the vigorous development of smart terminals and wireless Internet technology,mobile video traffic will account for the vast majority of global mobile data traffic.Users' requirements for the quality of experience(QoE)of streaming media services are getting higher.However,video transmission with high QoE will consume more bandwidth resources,making wireless bandwidth resources more scarce.With the rise of edge computing,it has become feasible to jointly consider bandwidth resource allocation algorithms and Adaptive Bitrate(ABR)algorithms in a wireless video transmission system to optimize the overall QoE of multiple users from a global perspective.However,the joint decision-making algorithm of wireless bandwidth resource allocation and adaptive bitrate still faces many challenges.There are a large number of users in the actual scenario,and the joint algorithm needs to perform resource allocation and ABR decision for each user,which will make the decision space very large.Moreover,the joint decision-making problem is a complex non-convex optimization problem,which is difficult to find the optimal solution using the optimization method.In addition,traditional methods based on fixed strategies rely on modeling and prior knowledge of the environment and are difficult to be generalized to different network environments.Therefore,this paper takes the overall QoE optimization of multi-users in a wireless video transmission system as the research point.Besides,taking the design of a bandwidth allocation and ABR joint algorithm with high-performance and high-versality as the starting point,this paper proposes a general algorithm framework based on machine learning to implement the joint decision-making algorithm.Specifically,this paper proposes a joint algorithm of Quality of Service(QoS)control and ABR,which is based on single-agent Actor-Critic hierarchical deep reinforcement learning(QOS & ABR).Besides,a joint algorithm of bandwidth allocation and ABR,which is based on multi-agent Actor-Critic hierarchical deep reinforcement learning(Bandwidth Allocation and ABR,MAC-BA & ABR)is proposed in this paper.The main innovations of this paper are as follows:1.Due to the asynchronous characteristic of bandwidth resource allocation and ABR decision,this paper first proposes a hierarchical decision-making algorithm based on single-agent Actor-Critic deep reinforcement learning.The hierarchical decision-making algorithm is composed of a bandwidth allocation decision-making network and a bitrate decision-making network.Each decision-making network is composed of an Actor-Critic network.The Actor network uses the information available in the online environment such as channel quality,client player status,etc.to make decisions.The Critic network evaluates the Actor network's decisions by utilizing the QoE fed back by the environment as a reward and achieves online update of the strategy without the prior knowledge of the environment model.Experimental results prove that the proposed algorithm achieves a significant performance improvement on different QoE standards compared to existing methods.In addition,the proposed algorithm has good versality,which can be applied to strange environments with different QoE standards and channel quality dynamics by only adjusting the network structure and the reward function.2.To adapt to the scenario with a large number of users,this paper proposes the MAC-BA&ABR algorithm by improving the bandwidth resource allocation network based on the traditional single-agent Actor-Critic structure into a multi-agent Actor-Critic structure.The Actor networks in MAC-BA&ABR only need to observe the user's local environmental status to make the bandwidth allocation decisions.And the Critic network evaluates the performance of each actor network by observing the global environmental information,thereby ensuring the resource allocation from the perspective of optimizing the overall QoE.The design of the multi-agent network reduces the increase in network parameters and changes in the network structure when the number of users increases,further improving the scalability of the algorithm.3.To improve the adaptability and predictability of the joint decision-making algorithm to the environment with severe channel quality fluctuations,this paper uses Long Short-Term Memory(LSTM)to predict the channel quality.LSTM can predict the change of channel quality accurately by learning the historical value of channel quality.Experimental results show that the joint decision-making algorithm combined with LSTM has better performances in optimizing the overall QoE of multi-users than the algorithm without LSTM. |