Deep Reinforcement Learning Based Reentry Guidance Method For Aircraft

Posted on: 2024-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Li
GTID: 2542307067493484
Subject: Software Engineering
Abstract/Summary:
A hypersonic vehicle passes through multiple stages during its mission, and the reentry phase accounts for approximately 85% of the total flight time. The performance of the reentry phase therefore significantly affects the success of the entire flight mission. Reentry guidance provides steering commands for the aircraft during the reentry phase, controlling it to pass through the atmosphere and reach the reentry terminal point while retaining enough energy to land or attack successfully.

Reinforcement learning is one way to solve the reentry guidance problem, and it computes guidance commands faster than predictor-corrector guidance. However, applying reinforcement learning to compute guidance commands in a variable-period guidance environment with sparse rewards poses the following challenges. Designing an effective reward function is difficult, and a poorly designed reward function can make the policy network hard to train or lead to suboptimal solutions. Directly using simulation data to aid policy learning can cause distribution shift. In addition, an agent trained in a single environment may over-adapt to the physical parameters of that environment, so that model performance declines during deployment. Improving the generalization ability of the model is therefore necessary.

To address these challenges, the specific work of this thesis is as follows:

· Implementing a reentry guidance simulation environment that extends the Gym standard interface. The reentry process of the aircraft is first modeled, including the dynamic equations, atmospheric model, and process constraints. Next, the state, action, and reward elements are abstracted from the reentry guidance problem to construct the reentry guidance simulation environment. Finally, simulation data is generated using the predictor-corrector algorithm.

· Proposing CVED, a reinforcement learning algorithm based on a conditional variational autoencoder. In a reentry guidance environment with sparse rewards, it is difficult for the agent to explore feasible trajectories and learn a policy. This thesis addresses the problem by using simulation data to aid policy learning, thereby avoiding the cumbersome design of process rewards. To address the distribution shift caused by directly exploiting simulation data, a conditional variational autoencoder encodes the simulation data into a latent space, and a policy is then learned on that latent space. To estimate the value of a state-action pair more accurately, this thesis introduces a stochastic ensemble value network and a diversification loss. Experimental results demonstrate that CVED can use simulation data to explore feasible solutions in the reentry guidance environment without process rewards, outperforming other methods that use simulation data.

· Proposing DDRAB, a domain randomization algorithm based on knowledge distillation. To enhance the policy's generalization ability, the algorithm first trains several teacher models in various randomized environments with similar physical parameters. The teacher models' value and policy networks are then used to guide the student's policy learning. The algorithm also dynamically adjusts the boundaries of the physical parameters while the student model is being trained. Results indicate that DDRAB outperforms other domain randomization algorithms in generalization performance.

· Designing and implementing a reentry guidance decision-making system that supports model training and visualization. The system addresses the need to redesign reentry guidance laws for different vehicle types and reentry terminal points. It consists of several functional modules, including data management, model training, model usage, data visualization, and a security center. For model training, a middleware component monitors training progress and automatically resumes training after an interruption.

In summary, this thesis focuses on simulation-data-oriented reentry guidance for aircraft. It constructs a reentry guidance simulation environment and proposes two algorithms. The experimental results demonstrate that the proposed approach can effectively learn a policy in a variable-period reentry guidance environment with sparse rewards. Finally, the thesis integrates the proposed algorithms into a reentry guidance decision-making system.
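To make the environment setting described in the abstract more concrete, the following is a minimal sketch of a Gym-style environment with a sparse terminal reward, a variable guidance period, and one physical parameter randomized per episode. It is not the thesis's simulator: the class name `ToyReentryEnv`, the simplified dynamics, every numeric constant, and the rule for choosing the guidance period are assumptions made purely for illustration. The sketch mirrors the `reset`/`step` return shape used by recent Gym and Gymnasium versions without importing either library.

```python
import math
import random


class ToyReentryEnv:
    """Gym-style toy environment; all dynamics and constants are hypothetical."""

    TERMINAL_ALTITUDE_KM = 20.0
    RANGE_TOLERANCE_KM = 10.0

    def __init__(self, density_scale_bounds=(0.9, 1.1), max_steps=500, seed=None):
        self.density_scale_bounds = density_scale_bounds
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Domain randomization: draw one physical parameter per episode.
        self.density_scale = self.rng.uniform(*self.density_scale_bounds)
        self.altitude = 60.0       # km (assumed initial state)
        self.velocity = 6.0        # km/s
        self.range_to_go = 1000.0  # km
        self.steps = 0
        return self._observation(), {}

    def _observation(self):
        return (self.altitude, self.velocity, self.range_to_go)

    def _guidance_period(self):
        # Variable period (assumed rule): coarse steps far out, fine steps near the target.
        return 5.0 if self.range_to_go > 200.0 else 1.0

    def step(self, bank_angle):
        dt = self._guidance_period()
        # Exponential atmosphere scaled by the randomized parameter (toy model).
        rho = self.density_scale * math.exp(-self.altitude / 7.0)
        drag = 0.05 * rho * self.velocity ** 2
        self.velocity = max(self.velocity - drag * dt, 0.0)
        # The bank angle (radians) modulates the descent rate in this toy model.
        self.altitude += (-0.2 + 0.15 * math.cos(bank_angle)) * dt
        self.range_to_go -= self.velocity * dt
        self.steps += 1

        reached_terminal = self.altitude <= self.TERMINAL_ALTITUDE_KM
        overshot = self.range_to_go < -self.RANGE_TOLERANCE_KM
        terminated = reached_terminal or overshot
        truncated = (not terminated) and self.steps >= self.max_steps
        # Sparse reward: nonzero only if the terminal altitude is reached
        # within the range tolerance; every other step returns 0.
        success = reached_terminal and abs(self.range_to_go) <= self.RANGE_TOLERANCE_KM
        reward = 1.0 if success else 0.0
        return self._observation(), reward, terminated, truncated, {"dt": dt}


def rollout(env, policy):
    """Run one episode and return the list of per-step rewards."""
    obs, _ = env.reset()
    rewards, done = [], False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        rewards.append(reward)
        done = terminated or truncated
    return rewards
```

Calling `rollout(ToyReentryEnv(seed=0), lambda obs: 0.0)` runs one episode with a constant zero bank angle. Because the reward is zero on every non-terminal step, a policy receives no learning signal until it happens to reach the terminal condition, which is the exploration difficulty the abstract attributes to sparse-reward settings.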
Keywords/Search Tags: Reentry Guidance, Simulation Data, Reinforcement Learning, Knowledge Distillation