| With the continuing development of intelligent technology and the growing diversity of power consumption modes, demand elasticity and dispatchability are becoming increasingly apparent. In this context, the structure of the power distribution system has changed, and active distribution systems have gradually developed. How to handle the uncertainties that the diversification of the active distribution system brings to the power grid, and how to tap the potential benefits of dispatching flexible loads in a flexible environment, have become important topics in research on power system dispatching automation. This study therefore carried out the following work. First, we studied the dispatching management mode of the active distribution system and the output characteristics of each submodule, including flexible loads, energy storage devices, and photovoltaics. On this basis, we established an active distribution system model built on the output constraints of each module, together with a power balance model for the peak regulation task allocation process. Second, load elasticity was described in terms of elastic margin, incentive level setting, and regional elastic range, and its role in dispatch optimization was analyzed. Third, a discrete-time Markov decision process (DTMDP) model of active distribution system dispatch optimization that accounts for peak regulation demand and load elasticity was established, and a learning optimization algorithm was given. Single-region scheduling optimization uses centralized-control Q-learning, and multi-region scheduling optimization uses multi-agent hierarchical Q-learning. The learning optimization model was formulated in terms of the state and state space, the action and action set, the state transition process, the reward function, and the optimization objective. The specific steps of multi-agent hierarchical Q-learning for solving the problem are given. Finally, specific methods for the problem were designed and simulations were conducted. The simulation results for the example show that the strategy obtained by the learning optimization method can ensure stable and economic operation of the system while largely satisfying the peak regulation demand, which verifies the effectiveness of the optimization method. At the same time, further dispatch information can be obtained by considering power elasticity, which promotes the full participation of flexible power resources in peak regulation. |
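
To illustrate the single-region case, here is a minimal sketch of tabular Q-learning on a toy peak-regulation problem. It is not the model used in the study. The demand profile, elastic margin, penalty and incentive values, the deterministic transition, and every function name are assumptions made for this example. The state is the pair (period, remaining elastic margin), the action is the amount of flexible load dispatched in that period, and the reward penalizes both the mismatch with the peak-regulation demand and the incentive paid.

```python
import itertools
import random

# All numbers below are illustrative assumptions, not values from the study.
DEMAND = [2, 1, 0, 1]   # peak-regulation demand per period (arbitrary units)
MARGIN = 3              # total elastic margin the flexible load can offer
MAX_STEP = 2            # largest dispatch allowed in a single period
PENALTY = 1.0           # cost per unit of mismatch with the demand
INCENTIVE = 0.1         # incentive paid per unit of dispatched flexible load


def actions(state):
    """Feasible dispatch amounts in a state (period, remaining margin)."""
    _, margin = state
    return range(min(margin, MAX_STEP) + 1)


def step(state, a):
    """Deterministic transition and reward for dispatching `a` units."""
    t, margin = state
    reward = -PENALTY * abs(DEMAND[t] - a) - INCENTIVE * a
    return (t + 1, margin - a), reward


def train(episodes=5000, alpha=0.5, gamma=1.0, epsilon=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {}  # q[state][action] -> estimated return

    def values(s):
        return q.setdefault(s, [0.0] * len(actions(s)))

    for _ in range(episodes):
        state = (0, MARGIN)
        while state[0] < len(DEMAND):
            qs = values(state)
            if rng.random() < epsilon:
                a = rng.randrange(len(qs))                    # explore
            else:
                a = max(range(len(qs)), key=qs.__getitem__)   # exploit
            nxt, r = step(state, a)
            # The value after the last period is zero (terminal state).
            future = 0.0 if nxt[0] == len(DEMAND) else max(values(nxt))
            qs[a] += alpha * (r + gamma * future - qs[a])
            state = nxt
    return q


def greedy_rollout(q):
    """Follow the greedy policy of a learned table; return (plan, return)."""
    state, plan, total = (0, MARGIN), [], 0.0
    while state[0] < len(DEMAND):
        qs = q[state]
        a = max(range(len(qs)), key=qs.__getitem__)
        state, r = step(state, a)
        plan.append(a)
        total += r
    return plan, total


def best_return_brute_force():
    """Enumerate every feasible plan to obtain a reference optimum."""
    best = float("-inf")
    for plan in itertools.product(range(MAX_STEP + 1), repeat=len(DEMAND)):
        if sum(plan) > MARGIN:
            continue
        total = sum(-PENALTY * abs(d - a) - INCENTIVE * a
                    for d, a in zip(DEMAND, plan))
        best = max(best, total)
    return best


if __name__ == "__main__":
    plan, total = greedy_rollout(train())
    print("greedy plan:", plan, "return:", total)
    print("brute-force optimum:", best_return_brute_force())
```

Because every reward in this toy problem is non-positive, initializing the table at zero is optimistic, which pushes the greedy policy to try untested actions in addition to the epsilon-greedy exploration. The multi-region, multi-agent hierarchical variant described in the abstract is not sketched here.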