
Research On Multi-beam Satellite Dynamic Beam Hopping Algorithm Based On Multi-Objective Deep Reinforcement Learning

Posted on: 2021-03-06 | Degree: Master | Type: Thesis
Country: China | Candidate: Y C Zhang | Full Text: PDF
GTID: 2518306308975739 | Subject: Electronic Science and Technology
Abstract/Summary:
Satellite communication systems offer wide coverage and strong communication capability, and have received widespread attention as a complement to terrestrial mobile networks where ground coverage is lacking. Given the inherent uncertainty of differentiated services and the non-uniform spatial distribution of capacity requests, satellite resources must be adjusted flexibly to meet different needs. Multi-beam satellites cover a large area by scheduling multiple high-gain narrow beams, which can effectively reduce the satellite's payload and improve system performance. Because this approach plays a vital role in improving communication quality and data transmission rate, it has attracted considerable research interest. How to match system capacity requirements with effective use of the beams is a new challenge. The conventional beam hopping method ignores the inherent correlation between successive decisions and does not consider long-term returns, so it can only find the solution that is optimal at the current time. In addition, as the number of differentiated service classes and the number of beams increase, the computational complexity of the system grows significantly. To meet these challenges, this thesis studies the optimal beam hopping strategy for DVB-S2X multi-beam satellites and investigates the following two issues in depth.

First, the traditional dynamic beam hopping algorithm only considers the optimal solution at the current moment and ignores the inherent correlation between beam hopping decisions. Reinforcement learning is designed for sequential decision making and suits the dynamic, unknown environment of the satellite scenario. Our research laboratory has previously proposed a dynamic beam hopping algorithm based on deep reinforcement learning, but it only optimizes the delay of a single service. Taking wireless channel conditions into account, this thesis proposes a dynamic beam hopping algorithm based on multi-objective deep reinforcement learning to address the non-globally-optimal solutions caused by the randomness of differentiated service demands in high-throughput communication satellites. The algorithm jointly optimizes multiple objectives in scenarios with different QoS service types. Evaluation results from a DVB-S2X satellite simulation show that the proposed algorithm can intelligently allocate resources according to user demand and channel conditions. Compared with existing algorithms, it achieves fairness of delay between cells even when service arrival rates are extremely unbalanced, while minimizing the average transmission delay of real-time services and maximizing the throughput of non-real-time services.

Second, traditional deep Q-learning networks usually face the problem of a large action space. In the algorithms previously proposed by our research laboratory, the action selection method can only select a suboptimal solution. To address the curse of dimensionality in beam selection during satellite beam hopping, a time-division multi-action selection method based on double-loop learning is proposed, which dynamically selects multiple beams to meet user needs. Theoretical analysis shows that this method greatly reduces computational complexity in large-scale beam scenarios.
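
The dimensionality problem mentioned in the second contribution can be illustrated with a small sketch. The abstract does not describe how the time-division selection or the double-loop learning actually works, so the code below is not the thesis's method. It only shows the general idea of replacing one joint choice of several beams with a sequence of single-beam choices. The names `joint_action_count`, `sequential_evaluation_count`, `select_beams_sequentially`, and `score_fn` are hypothetical, and `score_fn` stands in for whatever learned value estimate a real agent would use.

```python
import math
import numpy as np


def joint_action_count(num_cells: int, num_beams: int) -> int:
    """Number of joint actions when all beams are assigned in a single step."""
    return math.comb(num_cells, num_beams)


def sequential_evaluation_count(num_cells: int, num_beams: int) -> int:
    """Upper bound on scored candidates when one beam is chosen per sub-step."""
    return num_cells * num_beams


def select_beams_sequentially(score_fn, num_cells: int, num_beams: int) -> list:
    """Choose cells one at a time, masking cells that are already served.

    score_fn(chosen) is a hypothetical stand-in for a learned value estimate:
    it receives the cells chosen so far and returns one score per cell.
    """
    chosen = []
    taken = np.zeros(num_cells, dtype=bool)
    for _ in range(num_beams):
        scores = np.asarray(score_fn(chosen), dtype=float).copy()
        scores[taken] = -np.inf  # a cell cannot be illuminated twice in one slot
        cell = int(np.argmax(scores))
        chosen.append(cell)
        taken[cell] = True
    return chosen


if __name__ == "__main__":
    # Assumed sizes for illustration only; they do not come from the thesis.
    cells, beams = 37, 10
    print("joint actions:", joint_action_count(cells, beams))
    print("sequential evaluations:", sequential_evaluation_count(cells, beams))

    # Toy scores: a fixed demand vector in place of a trained network.
    demand = [0.1, 0.9, 0.5, 0.7]
    print(select_beams_sequentially(lambda chosen: demand, 4, 2))
```

With a joint formulation the number of candidate actions grows combinatorially with the number of cells and beams, whereas the sequential form scores at most one candidate per cell in each sub-step. The trade-off is that a purely greedy sequence is not guaranteed to find the jointly optimal beam set.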
Keywords/Search Tags: multi-beam satellite, dynamic beam hopping, differentiated services, multi-objective deep reinforcement learning, time-division action selection