
Unmanned Surface Vehicle Motion Control Based On Reinforcement Learning

Posted on: 2024-06-12  Degree: Master  Type: Thesis
Country: China  Candidate: S Wu  Full Text: PDF
GTID: 2542307154999359  Subject: Electronic information
Abstract/Summary:
In recent years, unmanned surface vehicles (USVs) have been widely used in both military and commercial applications. In dangerous sea areas, USVs can replace humans in carrying out tasks in extreme and hazardous environments, and they offer high intelligence, strong environmental adaptability, flexible maneuverability, and high work efficiency. However, USVs are typical underactuated systems subject to strong disturbances, model uncertainties, and strong couplings, which makes their motion control difficult and complex. Traditional algorithms for USV attitude and position control in complex conditions rely heavily on an accurate model of the controlled object, which is difficult to obtain in practice. Reinforcement learning continuously optimizes its policy through interaction between the agent and the environment: it can resist unmodeled system uncertainties and unknown external disturbances, continuously improve the prescribed comprehensive performance indicators, and provide intelligent, autonomous, high-performance control that lets USVs adapt to unfamiliar environments.

This project takes USVs as the control object and focuses on maintaining USV speed and heading in a multitasking scenario. The proximal policy optimization (PPO) algorithm is improved to increase control accuracy and convergence speed. The main work is as follows:

(1) Because USVs have a continuous action space and a multi-dimensional state space and operate in a complex environment, the training algorithm may encounter zero-gradient problems and become trapped in local optima, failing to achieve the speed and heading control objectives. This thesis proposes an improved reinforcement learning training policy that combines the Jensen-Shannon divergence (JS divergence) with the clipped objective function to address the clipping bias caused by the high-dimensional actions of USVs. To address the low stability and slow convergence of reinforcement learning in continuous spaces, a diversified reward function replaces the sparse reward: a multi-dimensional reward combining a target reward with a boundary-protection reward. This improves the exploration efficiency of USVs during early training and keeps them stable in the target state once the policy has formed, thereby achieving more precise and rapid control of USV speed and heading.

(2) We propose a reinforcement learning control policy based on the Smith predictor for USV systems with input delay. First, the Smith predictor is introduced in the observation stage and optimized jointly with the reinforcement learning policy. In addition, a second experience pool is added to the controller to store delayed states, ensuring that the USV's action and state spaces remain matched under delay. Second, a reward function combining driving and delay rewards is designed, improving the USV's exploration of unknown environments. Finally, simulation results show that the improved PPO control strategy resolves the imprecise control and large fluctuations caused by system delay, achieving speed and heading control of USVs under input delay.

Illustrative sketches of the JS-regularized objective, the diversified reward, and the delay-compensation scheme follow this abstract.
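The abstract does not give the exact form of the JS-regularized objective, so the following is a minimal sketch only: a standard PPO clipped surrogate with an added Monte-Carlo JS-divergence penalty between the old and new factorized Gaussian policies. The penalty weight js_coef, the sample count n_samples, and the estimator itself are illustrative assumptions, not the thesis' formulation.

import math
import torch
import torch.distributions as D

def js_divergence(p: D.Normal, q: D.Normal, n_samples: int = 32) -> torch.Tensor:
    # Monte-Carlo estimate of JS(p || q) for factorized Gaussian policies.
    def kl_to_mixture(a: D.Normal, b: D.Normal) -> torch.Tensor:
        x = a.rsample((n_samples,))                    # samples from component a
        log_a = a.log_prob(x).sum(-1)                  # joint log-prob over action dims
        log_b = b.log_prob(x).sum(-1)
        log_m = torch.logaddexp(log_a, log_b) - math.log(2.0)  # log of the 50/50 mixture
        return (log_a - log_m).mean()
    return 0.5 * kl_to_mixture(p, q) + 0.5 * kl_to_mixture(q, p)

def ppo_js_loss(new_pi: D.Normal, old_pi: D.Normal,
                actions: torch.Tensor, advantages: torch.Tensor,
                clip_eps: float = 0.2, js_coef: float = 0.5) -> torch.Tensor:
    # Standard PPO clipped surrogate ...
    ratio = torch.exp(new_pi.log_prob(actions).sum(-1)
                      - old_pi.log_prob(actions).sum(-1).detach())
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = torch.min(ratio * advantages, clipped * advantages).mean()
    # ... minus a JS penalty on the overall policy shift, which the one-sided
    # clipping operation alone would handle with a biased cut-off.
    return -(surrogate - js_coef * js_divergence(new_pi, old_pi))

Compared with clipping alone, the JS term penalizes large policy shifts symmetrically in both directions, which is one plausible way to counteract the clipping bias the abstract describes.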
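Likewise, the diversified reward of contribution (1) can be sketched as a dense target reward plus a boundary-protection term. The weights, the square workspace, and the margin below are assumptions for illustration, not values from the thesis.

import math

def diversified_reward(u, psi, u_ref, psi_ref, pos, area_half_width,
                       w_u=1.0, w_psi=1.0,
                       boundary_margin=5.0, boundary_penalty=-10.0):
    # Target reward: dense shaping toward the commanded speed and heading,
    # replacing a sparse reached/not-reached signal.
    speed_err = abs(u - u_ref)
    heading_err = abs(math.atan2(math.sin(psi - psi_ref),
                                 math.cos(psi - psi_ref)))   # wrapped to [-pi, pi]
    reward = -(w_u * speed_err + w_psi * heading_err)

    # Boundary-protection reward: penalize approaching the edge of the
    # (assumed square) training area so early exploration stays inside it.
    x, y = pos
    dist_to_edge = area_half_width - max(abs(x), abs(y))
    if dist_to_edge < boundary_margin:
        reward += boundary_penalty * (1.0 - dist_to_edge / boundary_margin)
    return reward

The dense term gives a nonzero gradient signal from the first training episodes, while the boundary term only activates near the workspace edge, so it does not distort the target once the policy has formed.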
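Finally, a minimal sketch of the delay-compensation scheme of contribution (2), assuming a known constant input delay of delay_steps control periods and a nominal one-step model predict(state, action). The class layout and all names are illustrative.

from collections import deque

class DelayCompensatedAgent:
    def __init__(self, policy, predict, delay_steps):
        self.policy = policy             # trained PPO policy: state -> action
        self.predict = predict           # nominal USV model: (state, action) -> next state
        self.pending = deque(maxlen=delay_steps)   # actions issued but not yet applied
        self.delay_buffer = []           # second experience pool for delayed transitions

    def act(self, observed_state):
        # Smith predictor: roll the nominal model forward through the actions
        # still in flight, so the policy sees an estimate of the undelayed state.
        s = observed_state
        for a in self.pending:
            s = self.predict(s, a)
        action = self.policy(s)
        self.pending.append(action)
        return action

    def store(self, delayed_state, action, reward, next_delayed_state):
        # Transitions are indexed by the delayed state actually observed,
        # keeping the action and state spaces matched under the delay.
        self.delay_buffer.append((delayed_state, action, reward, next_delayed_state))

Rolling the nominal model through the in-flight actions is the Smith-predictor idea: the policy acts on an estimate of the undelayed state, while the second buffer keeps training data consistent with what was actually observed.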
Keywords/Search Tags:Reinforcement learning, Proximal policy optimization algorithms, USVs, Reward functions, Smith predictor