With the continued progress of industrialization and the large-scale consumption of fossil fuels, the energy crisis has become a major challenge facing countries around the world. To address this challenge, countries are actively optimizing their energy structures and advancing the green, low-carbon transformation of energy. In recent years, renewable energy, being clean, low-carbon, and pollution-free, has increasingly penetrated the distribution network and has become an important means of achieving energy conservation and emission reduction and of reaching the carbon peaking and carbon neutrality goals. However, distributed energy output is uncertain and intermittent, which can significantly affect the stability of new power systems. Energy management based on reinforcement learning is widely applied because it can improve the reliability, safety, and economy of the power system, but traditional reinforcement learning methods suffer from slow learning and sparse rewards when applied to the energy management of complex power systems. This thesis therefore improves on traditional reinforcement learning methods to enhance the effectiveness of energy management strategies. The main research contents are as follows.

First, an energy optimization management strategy based on the deep deterministic policy gradient (DDPG) algorithm is studied. An electric energy router system model is established with energy balance and minimum electricity cost as the optimization objectives, the optimization problem is formulated as a Markov decision process, and the state space, action space, and reward function are defined. Because sparse rewards leave most samples in the experience pool without a reward signal, sample utilization efficiency is low. To solve this problem, this thesis proposes a DDPG algorithm based on prioritized experience replay, which improves the experience replay mechanism to alleviate sparse rewards and raise sample utilization efficiency. Simulation comparisons with other algorithms verify that the proposed algorithm is more stable and more effective in solving sparse-reward problems.

The above research addresses sparse rewards mainly by improving the experience replay mechanism. However, in high-dimensional spaces and complex environments, reinforcement learning methods based on an improved experience replay mechanism still face the difficulty of reward function design, and an unreasonable reward function leads to slow learning and poor robustness. To address this issue, this thesis proposes a generative adversarial imitation learning method, which combines imitation learning with generative adversarial networks to avoid designing complex reward functions and to alleviate sparse rewards. To improve the agent's exploration ability and learning efficiency, a generative adversarial network structure based on expert policies is adopted, and the imitation learning optimization process is realized through the continual adversarial game between the discriminator network and the generator network. Simulation analysis and comparison show that the generative adversarial imitation learning method learns faster and is more robust, and verify the effectiveness of the method for energy management.
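As a concrete illustration of the Markov decision process formulation described above, the following is a minimal sketch of an electric-energy-router environment with energy balance and electricity cost in the reward. The state variables (time of day, photovoltaic output, load demand, tariff, battery state of charge), the single battery-dispatch action, and all numerical parameters are assumptions introduced for demonstration; the thesis's actual system model may define these quantities differently.

```python
# Illustrative sketch only: state/action/reward definitions and parameters are assumed,
# not taken from the thesis.
import numpy as np

class EnergyRouterEnv:
    """Toy electric-energy-router environment posed as a Markov decision process."""

    def __init__(self, horizon=24):
        self.horizon = horizon           # one decision per hour (assumed)
        self.soc_max = 10.0              # battery capacity in kWh (assumed)
        self.p_batt_max = 3.0            # max charge/discharge power in kW (assumed)
        self.reset()

    def reset(self):
        self.t = 0
        self.soc = 0.5 * self.soc_max
        return self._state()

    def _state(self):
        pv = max(0.0, 4.0 * np.sin(np.pi * self.t / self.horizon))      # assumed PV profile
        load = 2.0 + np.cos(2 * np.pi * self.t / self.horizon)          # assumed load profile
        price = 0.6 if 8 <= self.t <= 20 else 0.3                       # assumed tariff
        return np.array([self.t / self.horizon, pv, load, price, self.soc / self.soc_max])

    def step(self, action):
        """action in [-1, 1]: battery charge (+) / discharge (-) as a fraction of p_batt_max."""
        _, pv, load, price, _ = self._state()
        p_batt = float(np.clip(action, -1.0, 1.0)) * self.p_batt_max
        p_batt = float(np.clip(p_batt, -self.soc, self.soc_max - self.soc))  # respect SOC limits
        self.soc += p_batt
        # Grid import balances the router: load plus charging minus PV generation.
        p_grid = load + p_batt - pv
        cost = max(p_grid, 0.0) * price           # electricity purchase cost
        imbalance = abs(min(p_grid, 0.0))         # surplus that cannot be absorbed
        reward = -(cost + 0.1 * imbalance)        # minimize cost while keeping energy balance
        self.t += 1
        return self._state(), reward, self.t >= self.horizon
```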
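The prioritized experience replay mechanism referred to above can be sketched as follows. This reflects the standard proportional-priority scheme, in which priorities are derived from TD errors and non-uniform sampling is corrected with importance-sampling weights; the exponents `alpha` and `beta` and the buffer interface are illustrative assumptions rather than the thesis's exact design.

```python
# Minimal sketch of proportional prioritized experience replay (assumed hyperparameters).
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                               # how strongly priorities skew sampling
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.data)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Larger TD error (e.g. the rare rewarded transitions) means more frequent replay.
        self.priorities[idx] = np.abs(td_errors) + eps
```

Within a DDPG training loop, the critic's TD errors for each sampled batch would be fed back through `update_priorities`, so the comparatively rare rewarded transitions are replayed more often and sample utilization efficiency improves.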
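The adversarial game between the discriminator network and the generator network in generative adversarial imitation learning can be sketched as below, here using PyTorch. The network sizes, the surrogate-reward form, and the state/action dimensions (matching the toy environment above) are assumptions for illustration only and are not the thesis's actual architecture.

```python
# Sketch of the GAIL idea: the discriminator separates expert (state, action) pairs from
# policy-generated ones, and its output replaces a hand-designed reward. All sizes assumed.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 5, 1   # assumed, matching the toy environment sketch above

discriminator = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_sa, policy_sa):
    """Train D to output 1 for expert (state, action) pairs and 0 for policy-generated pairs."""
    logits_e = discriminator(expert_sa)
    logits_p = discriminator(policy_sa)
    loss = bce(logits_e, torch.ones_like(logits_e)) + bce(logits_p, torch.zeros_like(logits_p))
    d_opt.zero_grad(); loss.backward(); d_opt.step()

def imitation_reward(state, action):
    """Surrogate reward from D: high when the policy's behaviour resembles the expert's."""
    with torch.no_grad():
        d = torch.sigmoid(discriminator(torch.cat([state, action], dim=-1)))
    return -torch.log(1.0 - d + 1e-8)   # no hand-crafted reward function is needed
```

The generator side closes the loop: the policy is updated by an ordinary reinforcement learning algorithm (for example, a DDPG-style actor-critic) that maximizes `imitation_reward`, so the discriminator and generator are trained in alternation and the design of a complex reward function is avoided.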