
Application of Deep Reinforcement Learning to Path Planning of the UAV Lift-Off Base Station Platform

Posted on: 2022-10-03    Degree: Master    Type: Thesis
Country: China    Candidate: S M Yang    Full Text: PDF
GTID: 2492306731498004    Subject: Computer technology
Abstract/Summary:
With the completion of 5G standardization and the rapid development of UAV technology, it has become practical to use UAV lift-off platforms equipped with base stations to provide wireless communications to ground users. Traditional fixed ground base stations suffer from high deployment cost, inflexible service provision, and poor disaster resilience; with its cost effectiveness and high maneuverability, the UAV lift-off platform compensates for these shortcomings. Because the position and speed of the UAV strongly affect the channel environment and signal quality, path planning for the UAV lift-off platform is one of the key technical problems that urgently needs to be solved. This thesis focuses on using deep reinforcement learning algorithms to let the UAV lift-off platform plan its path in real time from environmental information and the state of mobile ground users, and to provide stable, high-quality communication links to multiple ground users. The main work and innovations of the thesis are as follows:

1. We designed a simulation environment for the communication support task of the UAV lift-off platform based on the OpenAI Gym architecture, with wide compatibility across different geographical types. This environment addresses two difficulties: the path loss of the air-to-ground channel is hard to measure, and path planning algorithms based on reinforcement learning lack an interactive environment to train in. In addition, we flew a UAV carrying a 5G base station in field tests to verify the accuracy of the air-to-ground channel model's path-loss prediction. The experimental results showed that the environment can interact effectively with various reinforcement learning algorithms to generate experience data for the agent to learn from, and can also serve as a platform for testing algorithm performance. Taking the emergency communication mission of the UAV lift-off platform as a prototype, we modelled the path loss of the air-to-ground channel as a function of the elevation angle and environmental parameters and, by casting the task as a sequential decision optimization problem through mathematical derivation, constructed the communication support task as a simulation environment; a minimal environment skeleton in this style is sketched below.
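The following is an illustrative sketch only, not the thesis code: a Gym-style environment skeleton in which the air-to-ground path loss follows the widely used elevation-angle-based model (a logistic line-of-sight probability plus LoS/NLoS excess losses over free-space path loss). All class names, parameter values, and the reward definition are assumptions made for illustration.

```python
# Sketch: Gym-style environment for a UAV-mounted base station serving ground users.
# Names and constants are illustrative, not taken from the thesis.
import numpy as np
import gym
from gym import spaces


class UavBasePlanningEnv(gym.Env):
    """UAV lift-off platform path planning; reward follows air-to-ground link quality."""

    def __init__(self, n_users=4, area=1000.0, height=100.0,
                 a=9.61, b=0.16, eta_los=1.0, eta_nlos=20.0, fc_hz=2e9):
        super().__init__()
        self.n_users, self.area, self.height = n_users, area, height
        self.a, self.b = a, b                      # environment-type parameters (e.g. urban)
        self.eta_los, self.eta_nlos = eta_los, eta_nlos
        self.fc_hz = fc_hz
        # action: horizontal velocity command; observation: UAV position + user positions
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(2 + 2 * n_users,), dtype=np.float32)

    def _path_loss_db(self, uav_xy, user_xy):
        # Elevation-angle based A2G model: P(LoS) is a logistic function of the
        # elevation angle; mean path loss mixes LoS/NLoS excess losses over FSPL.
        d_2d = np.linalg.norm(uav_xy - user_xy)
        d_3d = np.hypot(d_2d, self.height)
        theta_deg = np.degrees(np.arctan2(self.height, d_2d + 1e-9))
        p_los = 1.0 / (1.0 + self.a * np.exp(-self.b * (theta_deg - self.a)))
        fspl = 20 * np.log10(d_3d) + 20 * np.log10(self.fc_hz) - 147.55
        return fspl + p_los * self.eta_los + (1.0 - p_los) * self.eta_nlos

    def reset(self):
        self.uav = np.array([self.area / 2, self.area / 2])
        self.users = np.random.uniform(0, self.area, size=(self.n_users, 2))
        return self._obs()

    def step(self, action):
        self.uav = np.clip(self.uav + 10.0 * np.asarray(action), 0, self.area)
        losses = [self._path_loss_db(self.uav, u) for u in self.users]
        reward = -float(np.mean(losses))           # better links -> higher reward
        return self._obs(), reward, False, {}

    def _obs(self):
        return np.concatenate([self.uav, self.users.ravel()]).astype(np.float32)
```

An environment of this shape can be stepped by any standard agent loop (reset, act, step, learn), which is what allows it to double as a performance test bed for different reinforcement learning algorithms.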
2. We proposed a reinforcement learning algorithm based on intrinsic rewards, in which intrinsic rewards guide the agent to explore the environment efficiently. Combined with adaptive parameter adjustment, the agent steadily improves its performance within a trust region. This addresses the tendency of reinforcement learning algorithms to converge to locally optimal policies and to train unstably in the communication support task of the UAV lift-off platform. Experimental results showed that, compared with mainstream algorithms such as DDPG, PPO, SAC, and TD3, the proposed algorithm makes considerable progress in policy learning speed and agent performance.

3. We proposed an intrinsic-reward reinforcement learning algorithm based on the IMPALA architecture. It completely decouples experience data collection from policy updates, which solves the problems of low data collection efficiency during agent training, slow policy learning, and poor distributed scalability. Experiments showed that, compared with other parallel architectures, the throughput of experience data collection per unit time slot is significantly higher; in addition, the algorithm speeds up policy learning while preserving learning performance.

4. We proposed a model-based reinforcement learning algorithm in which the agent learns a dynamics model of the environment. By combining planning with prediction, the algorithm improves the prediction accuracy of the value function and addresses the low sample efficiency of reinforcement learning in the communication support task. Experiments showed that, at the same level of algorithm performance, the proposed algorithm improves the agent's sample efficiency by an order of magnitude over model-free algorithms; compared with other model-based algorithms such as MVE, it also substantially improves performance and robustness. A sketch of the kind of model-based value target this point refers to is given after this summary.

Overall, the experimental results showed that the algorithms proposed in this thesis improve performance, learning speed, and sample efficiency compared with current mainstream UAV platform path planning methods, and they provide support for UAV platform applications such as emergency rescue and auxiliary communication in hotspot areas.
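The sketch below illustrates the model-based component mentioned in point 4: an H-step model-based value expansion target in the spirit of MVE, where a learned dynamics model rolls the current policy forward and the imagined rewards plus a bootstrapped terminal value form the target for the value function. It assumes a PyTorch setting with user-supplied policy, dynamics, reward, and value networks; all names and the horizon are illustrative, and this is not the thesis implementation.

```python
# Sketch: H-step model-based value expansion target (MVE-style), assumed PyTorch setting.
import torch


def mve_value_target(states, policy, dynamics_model, reward_model, value_fn,
                     horizon=5, gamma=0.99):
    """Imagined H-step return plus a bootstrapped value, used as a target for V(states)."""
    with torch.no_grad():
        target = torch.zeros(states.shape[0])
        discount = 1.0
        s = states
        for _ in range(horizon):
            a = policy(s)                  # action from the current policy
            r = reward_model(s, a)         # predicted one-step reward, shape [batch]
            s = dynamics_model(s, a)       # predicted next state
            target = target + discount * r
            discount *= gamma
        # bootstrap with the learned value function at the end of the imagined rollout
        target = target + discount * value_fn(s).squeeze(-1)
    return target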
Keywords/Search Tags: UAV lift-off platform, reinforcement learning, deep reinforcement learning, UAV path planning algorithm, air-to-ground channel modeling