| Observation path planning for marine mobile observation platforms is a key technology in marine environment observation systems, and an effective way to improve both the efficiency of regional marine environment observation and the analysis and forecasting ability of coupled environmental numerical prediction systems. The complexity, time variability, and multi-scale nature of the marine environment are major factors affecting the observation efficiency of such platforms. Traditional optimization methods have drawbacks: the cost function is set largely subjectively, and previous learning experience cannot be fully exploited. Reinforcement learning is a class of sequential decision-making algorithms with strong optimization ability. This paper therefore applies reinforcement learning to adaptive observation path planning for mobile observation platforms, in order to improve their observation efficiency and thereby the analysis and prediction of marine environmental factors. The main research contents are as follows. First, observation path planning for a mobile observation platform in a static environmental field, based on a set of initial forecast fields, is the foundation of adaptive observation path planning. The problem is formulated as a Markov decision process (MDP), and observation paths in the static environment are planned with reinforcement learning algorithms based on discrete actions and on continuous actions. Simulation results show that the discrete-action algorithm is suitable for observation path planning in a static environment. Assimilating the sampled results verifies that the path planning method improves the analysis and forecasting ability of the coupled environmental numerical forecast system. Second, because the marine environment varies over time, a static environmental field cannot accurately describe its changes. The background field data are therefore updated in real time during path planning, so that the mobile observation platform observes adaptively. To address the low sample efficiency and insufficient environmental exploration of reinforcement learning during adaptive observation, an adaptive observation path planning method based on evolutionary reinforcement learning is designed, combining the advantages of evolutionary algorithms and reinforcement learning algorithms. Simulation results show that, compared with a traditional reinforcement learning algorithm, the evolutionary reinforcement learning algorithm effectively improves the adaptive observation efficiency of the marine mobile observation platform and the analysis and prediction ability of the coupled environmental numerical prediction system. Finally, for networked observation with multiple mobile observation platforms, an adaptive observation path planning method based on multi-agent reinforcement learning is designed. Because the way the value function is decomposed in cooperative multi-agent reinforcement learning affects algorithm performance, three multi-agent reinforcement learning algorithms with different value function decomposition schemes are adopted to design the adaptive path planning method for multiple platforms. Experiments show that this method effectively improves the observation efficiency of multiple mobile observation platforms and the analysis and prediction ability of the coupled environmental numerical prediction system. |