
Design of Terminal Guidance Law Based on Reinforcement Learning

Posted on: 2020-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: G L Han
GTID: 2392330590973214
Subject: Computer technology
Abstract/Summary:
Guidance design draws on both computer technology and control engineering. It is a very active branch of the guidance field and attracts much attention from academia and industry. In recent years, guidance law design has been applied in many areas of artificial-intelligence-based guidance, and how to guide missiles accurately in real situations has become an important research topic. However, current guidance methods usually have the following problems: they must be trained in a simulation environment, and designing that environment requires strong professional background knowledge; they perform poorly when the target executes multiple maneuvers; and interception performance is poor when the flight state of the missile is restricted. In practical applications there are also differences between simulators and the real environment, so in pursuit of timeliness and accuracy, algorithms trained in a simulator are often required to be highly robust.

To solve these problems, this thesis proposes a new guidance law design method that does not have the shortcomings of the optimal control method. Given a missile model and the environment dynamics, the method uses reinforcement learning (RL) to learn an optimal guidance law. The system model does not need to be a mathematical model; it can be a statistical model learned from flight test telemetry and wind tunnel data using machine learning techniques such as weighted linear regression, neural networks, Gaussian mixture models and probabilistic graphical models. For complex dynamic systems, these stochastic models are more effective than simplified mathematical models. Unlike existing guidance law design algorithms based on control engineering, this thesis uses reinforcement learning to accurately capture dynamic changes in the environment, which greatly improves the robustness of the algorithm. At the same time, the three problems mentioned above are solved and a real-time terminal guidance model is realized. Against multi-maneuvering targets the hit performance is better, and the model is learned end to end.
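
The random search named in the keywords can be illustrated with a minimal sketch. It is not the implementation from the thesis: the one-dimensional engagement model, the linear feedback policy, and every name and number below (`miss_distance`, `random_search`, the initial offset, the acceleration limit) are assumptions chosen only to show the idea of tuning a guidance policy by perturbing its parameters and keeping the perturbations that reduce the miss distance.

```python
import random


def miss_distance(gains, target_accel=3.0, dt=0.01, steps=500, a_max=30.0):
    """Simulate a toy 1-D engagement and return the final absolute miss.

    Hypothetical model: y is the lateral offset (target minus missile),
    v is its rate, and the target pulls a constant lateral acceleration.
    """
    k_pos, k_vel = gains
    y, v = 100.0, 0.0  # assumed initial offset (m) and rate (m/s)
    for _ in range(steps):
        u = k_pos * y + k_vel * v        # linear feedback policy (assumed form)
        u = max(-a_max, min(a_max, u))   # saturate the commanded acceleration
        v += (target_accel - u) * dt     # relative acceleration = target - missile
        y += v * dt
    return abs(y)


def random_search(cost, start, iterations=200, sigma=0.5, seed=0):
    """Basic random search: perturb the parameters, keep only improvements."""
    rng = random.Random(seed)
    best, best_cost = list(start), cost(start)
    for _ in range(iterations):
        candidate = [g + rng.gauss(0.0, sigma) for g in best]
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


if __name__ == "__main__":
    start = [0.0, 0.0]
    gains, final_miss = random_search(miss_distance, start)
    print("miss with zero gains:", miss_distance(start))
    print("miss after random search:", final_miss, "gains:", gains)
```

Because a candidate is accepted only when it lowers the cost, the returned miss distance can never exceed that of the starting gains. A policy-gradient method would instead estimate a gradient of the expected return with respect to the policy parameters; that is not shown here.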
Keywords/Search Tags:terminal guidance, reinforcement learning, policy gradient, random search