Self-learning Method Of UAV Trajectory Planning Strategy In Multi-constrained Complex Environment

Posted on:2021-04-18

Degree:Master

Type:Thesis

Country:China

Candidate:Y Qiu

Full Text:PDF

GTID:2492306104494404

Subject:Control Science and Engineering

Abstract/Summary:

Although traditional search algorithms for UAV trajectory planning have strong path search capability, they cannot obtain prior knowledge from historical experience. Reinforcement learning gains experience through an iterative process of trial and evaluation, and thereby obtains a state-action mapping strategy that maximizes return. A method based on reinforcement learning can therefore use the learned strategy as prior knowledge in unknown environments or new tasks, which improves the efficiency of trajectory planning. Deep reinforcement learning uses the strong perception and representation capabilities of deep neural networks to obtain optimized strategies, so that the strategy learning model for trajectory planning can generalize to dynamic tasks and to complex, changing environments.

This thesis proposes a strategy self-learning method based on deep reinforcement learning for trajectory planning in a multi-constrained complex environment. Based on the characteristics of the input information (planning tasks, constraints, flight environment, and optimization objectives), the key components of the deep reinforcement learning system are designed: the state, the action, the reward function, and the strategy-value deep network. For the state and action spaces, a layered encoding of the planning task, the global environment, and the local environment of the aircraft gives a graphical representation of the aircraft's turning state and matching state. The complex constraints between two matching navigation points are used to construct the feasible interval of the turning point and the feasible region of the next matching navigation point, which reduces the action space. This not only ensures that the trajectories obtained through exploration and decision-making satisfy the complex constraints, but also reduces the difficulty of decision-making and speeds up trajectory planning. For the reward function, the reward is designed from the optimization objectives of existing traditional trajectory planning systems, and reward shaping is used to introduce heuristic information into it, which improves the learning efficiency of the system. For strategy learning and representation, a turning point planning strategy network and a matching point planning strategy network are designed by combining deep convolutional neural networks with the Actor-Critic method.

The planning strategy network learns iteratively in two steps: 1) Monte Carlo tree search, guided by the planning strategy network, directs the unmanned aerial vehicle to explore the environment and generate sample data. 2) The planning strategy network learns from the sample data and updates the strategy. Monte Carlo tree search has strong strategy improvement capability and generates higher-quality trajectory samples, which improves the learning efficiency of the planning strategy networks. The experimental results show that the reinforcement learning system designed in this thesis has self-learning ability and accomplishes the trajectory planning task well. The learned planning strategies generalize to unknown environments and new tasks.
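
The reward shaping idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which shaping scheme the thesis uses, so the example below assumes the common potential-based form, where a heuristic potential is added to the base reward as gamma * phi(s') - phi(s). The function names, the straight-line-distance heuristic, and the discount value are all hypothetical choices for illustration and are not taken from the thesis.

```python
import math

GAMMA = 0.99  # discount factor (assumed value, not from the thesis)


def potential(position, goal):
    """Heuristic potential: negative straight-line distance to the goal.

    Assumption: distance to the goal is only one plausible heuristic.
    """
    return -math.dist(position, goal)


def shaped_reward(base_reward, position, next_position, goal, gamma=GAMMA):
    """Return the base reward plus the shaping term gamma * phi(s') - phi(s)."""
    shaping = gamma * potential(next_position, goal) - potential(position, goal)
    return base_reward + shaping
```

With this sketch, a step that moves the aircraft closer to the goal receives a larger shaped reward than a step that moves it away, even when the base reward of both steps is the same.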

Keywords/Search Tags:

Trajectory Planning, Strategy Learning, Deep Reinforcement Learning, Monte Carlo Tree Search, Multiple Constraints, Complex Environment

Related items

1	Research On UAV Trajectory Planning Method Based On Actor-Critic Network Architecture
2	Research On Longitudinal Trajectory Planning Algorithm For Connected And Automated Vehicle In Mixed Traffic Flow Based On Reinforcement Learning Theory
3	Generative Design Of Trusses Based On Reinforcement Learning
4	Research On Complex Constrained Trajectory Planning For Autonomous Vehicles With Multiple Scenarios
5	Research On UAV Trajectory Planning Based On Deep Reinforcement Learning
6	Research On Real-time Trajectory Planning Of Parafoil Based On Deep Reinforcement Learning Algorithm Under Complex Constraints
7	Research On Trajectory Planning Method Of 6-DOF Parallel Platform Based On Deep Reinforcement Learning
8	Research On Three-dimensional Trajectory Design And Resource Scheduling Optimization Algorithm For Complex UAV Network
9	Research On Hopping Trajectory Planning For Asteroid Probe Via Deep Reinforcement Learning
10	Research On Reinforcement Learning Algorithm For Mobile Vehicle Path Planning In A Special Traffic Environment