| To address environmental pollution effectively, power systems must integrate new energy on a large scale to promote the clean, low-carbon transformation of the power grid. New energy sources and loads such as wind power, photovoltaics, and electric vehicles are random, intermittent, and difficult to predict accurately. Their large-scale integration causes frequency fluctuations in the grid, which poses great challenges to the secure, stable, and economic operation of the power system. Therefore, from the perspective of automatic generation control (AGC), this paper explores a class of reinforcement learning algorithms for distributed energy, based on multi-agent stochastic consensus game theory and a multi-level power allocation strategy, to obtain optimal collaborative control of multi-area power grids. These algorithms address the strong stochastic disturbances caused by the large-scale grid connection of new energy and improve the compatibility between the power system and new energy. First, the research status of AGC is reviewed, and the multi-agent control framework and its theoretical basis are described in detail. Two kinds of AGC model are also introduced and analyzed. Second, a novel MSGP-CQ algorithm is proposed to achieve optimal control of the power system. The MSGP algorithm accelerates convergence and improves learning efficiency through predictive multi-step iterative updating, while the CQ algorithm, with its collaborative consensus and self-learning characteristics, enhances adaptability under strong stochastic disturbances, so as to obtain the total power commands for the grid and the dynamic optimal allocation of power among units. Simulations on an improved IEEE two-area load frequency control (LFC) model and an intelligent distribution network model show that the proposed algorithm achieves optimal power allocation in the grid. The
MSGP-CQ algorithm is more robust, performs dynamic optimization faster, and reduces generation costs. Finally, an efficient GQ(σ,λ) algorithm is proposed to achieve optimal control of the distributed multi-area power system. Linear function approximation and a mix-sampling parameter are used to unify the full-sampling and pure-expectation algorithms, which reduces the storage space that the control algorithm requires for state-action pairs and helps address the strong stochastic disturbances caused by large-scale new energy access to the grid. The IEEE two-area LFC model and an integrated energy system model incorporating distributed energy are used for simulation verification. The results show that the GQ(σ,λ) algorithm offers faster dynamic optimization, better cooperative control performance, and lower carbon emissions. |
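
The idea of unifying full-sampling and pure-expectation updates under linear function approximation can be illustrated with a minimal one-step sketch. It is not the GQ(σ,λ) algorithm of this work, which the abstract does not specify in detail; it only shows a σ-mixed backup target of the kind used in the Q(σ) family, followed by a plain semi-gradient weight update. The symbol σ for the mix-sampling parameter, the function names, and the per-action weight layout are all assumptions made for illustration.

```python
import numpy as np


def q_sigma_target(reward, gamma, q_next, policy_next, next_action, sigma):
    """One-step sigma-mixed backup target (illustrative, hypothetical helper).

    sigma = 1 gives the full-sampling (Sarsa-style) target,
    sigma = 0 gives the pure-expectation (Expected-Sarsa-style) target.
    """
    sampled = q_next[next_action]                   # value of the sampled next action
    expected = float(np.dot(policy_next, q_next))   # expectation under the policy
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def semi_gradient_update(weights, features, action, target, alpha):
    """Semi-gradient step for a linear model Q(s, a) = weights[a] @ features.

    Assumed layout: weights has shape (n_actions, n_features), so storage
    grows with the number of features rather than with the number of
    state-action pairs.
    """
    td_error = target - weights[action] @ features
    new_weights = weights.copy()
    new_weights[action] += alpha * td_error * features
    return new_weights, td_error


if __name__ == "__main__":
    q_next = np.array([1.0, 3.0])        # hypothetical next-state action values
    policy_next = np.array([0.5, 0.5])   # hypothetical target policy
    for sigma in (0.0, 0.5, 1.0):
        print(sigma, q_sigma_target(1.0, 0.5, q_next, policy_next, 1, sigma))
```

Intermediate values of σ interpolate between the two extremes, which is the sense in which a single parameter unifies the two kinds of update.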