Deep Reinforcement Learning Based Reentry Guidance Method For Aircraft

Posted on: 2024-05-09

Degree: Master

Type: Thesis

Country: China

Candidate: Y Q Li

Full Text: PDF

GTID: 2542307067493484

Subject: Software Engineering

Abstract/Summary:

A hypersonic vehicle passes through multiple stages during its mission, and the reentry phase accounts for approximately 85% of the total flight time. The performance of the reentry phase therefore strongly affects the success of the entire mission. Reentry guidance provides steering commands to the vehicle during reentry, controlling it to pass through the atmosphere and reach the reentry terminal point with enough energy to land or attack successfully. Reinforcement learning is one way to solve the reentry guidance problem, and it computes commands faster than predictor-corrector guidance. However, applying reinforcement learning to compute guidance commands in a variable-period guidance environment with sparse rewards poses several challenges. Designing an effective reward function is difficult: a poorly designed reward function can make the policy network hard to train or lead to suboptimal solutions. Directly using simulation data to aid policy learning can cause distribution shift. In addition, an agent trained in a single environment may overfit to that environment's physical parameters, which degrades performance at deployment, so the model's generalization ability must be improved. To address these challenges, this thesis makes the following contributions:

· Implementing a reentry guidance simulation environment that extends the standard Gym interface. The reentry process of the vehicle is first modeled, including the dynamic equations, the atmospheric model, and the process constraints. Next, the state, action, and reward elements are abstracted from the reentry guidance problem to construct the simulation environment. Finally, simulation data is generated with the predictor-corrector algorithm.

· Proposing CVED, a reinforcement learning algorithm based on a conditional variational autoencoder. In a reentry guidance environment with sparse rewards, it is hard for the agent to explore feasible trajectories and learn a policy. This thesis uses simulation data to aid policy learning, which avoids the cumbersome design of process rewards. To address the distribution shift caused by exploiting simulation data directly, a conditional variational autoencoder encodes the simulation data into a latent space, and the policy is then learned in that latent space. To estimate state-action values more accurately, the thesis introduces a stochastic ensemble value network and a diversification loss. Experimental results show that CVED can use simulation data to find feasible solutions in the reentry guidance environment without process rewards, and that it outperforms other methods that use simulation data.

· Proposing DDRAB, a domain randomization algorithm based on knowledge distillation. To improve the policy's generalization ability, the algorithm first trains several teacher models in randomized environments with similar physical parameters. The teachers' value and policy networks then guide the student's policy learning. The algorithm also dynamically adjusts the boundaries of the physical parameters while the student model is trained. Results indicate that DDRAB generalizes better than other domain randomization algorithms.

· Designing and implementing a reentry guidance decision-making system that supports model training and visualization. The system addresses the need to redesign reentry guidance laws for different vehicle types and reentry terminal points. It consists of several functional modules: data management, model training, model usage, data visualization, and a security center. For model training, a middleware component monitors training progress and automatically resumes training after an interruption.

In summary, this thesis focuses on aircraft reentry guidance methods that make use of simulation data. It constructs a reentry guidance simulation environment and proposes two algorithms. The experimental results demonstrate that the proposed approach can effectively learn a policy in a variable-period reentry guidance environment with sparse rewards. Finally, the thesis integrates the proposed algorithms into a reentry guidance decision-making system.
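To make the setting concrete, a Gym-style environment with a sparse terminal reward and a variable guidance period can be sketched as below. This is only a minimal illustration under stated assumptions, not the environment built in the thesis: the class name `ToyReentryEnv`, the three-element state, the toy dynamics, and every constant are hypothetical, and the class mirrors the classic Gym `reset`/`step` signature (4-tuple return) without importing Gym.

```python
import math


class ToyReentryEnv:
    """Hypothetical Gym-style toy environment (not the thesis's model).

    State: (altitude_km, speed_km_s, range_to_go_km).
    Action: bank angle in radians, clipped to [0, pi/2].
    """

    def __init__(self, tolerance_km=10.0, max_steps=2000):
        self.tolerance_km = tolerance_km  # assumed terminal accuracy
        self.max_steps = max_steps        # safety cap on episode length
        self.reset()

    def reset(self):
        # Assumed initial conditions, chosen only for illustration.
        self.altitude_km = 60.0
        self.speed_km_s = 6.0
        self.range_to_go_km = 1500.0
        self.steps = 0
        return self._observation()

    def _observation(self):
        return (self.altitude_km, self.speed_km_s, self.range_to_go_km)

    def _guidance_period(self):
        # Variable period: longer steps at high altitude, shorter below 40 km.
        return 2.0 if self.altitude_km > 40.0 else 0.5

    def step(self, action):
        bank = max(0.0, min(math.pi / 2, float(action)))
        dt = self._guidance_period()
        # Toy dynamics: more vertical lift (small bank) slows the descent.
        sink_rate = 0.2 * (1.0 - 0.5 * math.cos(bank))
        self.altitude_km -= sink_rate * dt
        self.range_to_go_km -= self.speed_km_s * dt
        self.speed_km_s *= math.exp(-0.005 * dt)  # crude drag model
        self.steps += 1

        reached_terminal = self.altitude_km <= 20.0
        done = reached_terminal or self.steps >= self.max_steps
        on_target = abs(self.range_to_go_km) <= self.tolerance_km
        # Sparse reward: zero on every step except a successful terminal step.
        reward = 1.0 if (reached_terminal and on_target) else 0.0
        return self._observation(), reward, done, {"dt": dt}


if __name__ == "__main__":
    env = ToyReentryEnv()
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, info = env.step(0.5)  # constant bank angle
        total += reward
    print("final state:", obs, "return:", total)
```

Because the only nonzero reward arrives at the terminal step, an agent that explores at random rarely sees any learning signal, which is the difficulty the abstract attributes to the sparse-reward setting.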

Keywords/Search Tags:

Reentry Guidance, Simulation Data, Reinforcement Learning, Knowledge Distillation

Related items

1	Reentry Trajectory Optimization And Guidance Method Based On Convex Optimization And Reinforcement Learning
2	Research On Reentry Dynamics And Guidance Of Manned Deep Space Exploration
3	Research On Reinforcement Learning Aided By Expert Knowledge And Its Application In Path Planning Of UAV
4	Research On Reentry Guidance Method For Reusable Launch Vehicle
5	Theory Of Predictive Guidance With Application To Reentry And Exoatmospheric Interception
6	Research On Traffic Flow Prediction Algorithm Based On Knowledge Distillation
7	Remote Sensing Image Classification Network Based On Knowledge Distillation And Multi-instances Learning
8	Research On Obstacles Collision Avoidance Based On Enhanced Federated Learning In Intelligent Connected Vehicles
9	Bolt Defect Image Classification Of Transmission Line Based On Knowledge Distillation
10	Research On Reentry Trajectory Optimization And Guidance Method For Lifting Vehicle