Driven by the requirements of energy conservation and emission reduction, the penetration of renewable energy sources, represented by photovoltaic generation, in the distribution network keeps increasing, and so does the pressure of handling uncertainty in power system operation. Relying on the upstream grid to provide power margin makes it difficult to improve the renewable energy accommodation rate, so the distribution network needs the capability of self-governance. In this regard, the concept of the active distribution network has emerged. An active distribution network can use advanced measurement, communication and power electronics technologies to actively manage and coordinate the control of distributed generation, energy storage and other controllable resources. In this context, the key to coping with the uncertainty of renewable generation is to fully exploit the flexible regulation potential of the adjustable resources during active distribution network operation, which is of great theoretical and practical significance for enhancing the renewable energy accommodation capacity of the active distribution network, reducing the regulation burden on the upstream grid, and improving the reliability and economy of active distribution network operation.

While the gradually improving measurement and communication infrastructure of the distribution network creates the prerequisites for further exploiting the flexible regulation potential of the existing controllable resources within the active distribution network, it also places higher requirements on optimal operation methods. Current research on active distribution network optimal operation is mainly based on a series of methods derived from the concept of stochastic optimization. For example, scenario-based optimization and robust optimization make decisions offline with the help of day-ahead forecast information; they cannot take full advantage of the increasingly complete measurement and communication conditions of the active distribution network, and a fixed decision scheme covering all possible forecast scenarios can hardly avoid being conservative.

Deep reinforcement learning, as a representative of the new generation of artificial intelligence, has made great progress in eliminating the need for prior knowledge, reducing resource consumption, and increasing training speed. On the one hand, its model-free solution process avoids mechanistic modeling of complex power electronic devices and power flow, realizing a shift from model-driven to data-driven optimal operation of the active distribution network. On the other hand, assuming complete communication and measurement, deep reinforcement learning models the process in which uncertain information is revealed over time and provides multiple sets of alternative decisions, satisfying the inter-temporal coupling constraints, for different realizations of the uncertainty. Optimal operation can thus dynamically adjust its decisions to the realized uncertainty using online measurement information; the resulting decisions are more accurate and targeted, and the flexible regulation potential of the various adjustable resources can be fully utilized.

In this regard, this thesis presents a theoretical study on the optimal operation and control of active distribution networks containing distributed renewable generation, energy storage and other controllable resources, assuming complete measurement and communication and using deep reinforcement learning as the implementation tool to fully exploit the flexible regulation potential of the existing controllable resources.
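The sequential decision structure described above can be made concrete as a multi-stage stochastic program in which each stage's decision may depend only on the uncertainty revealed so far. The following formulation is a schematic sketch; the state $s_t$, decision $a_t$, random realization $\xi_t$ and stage cost $c_t$ are generic placeholders rather than the thesis's exact notation:

    \min_{a_1}\, c_1(s_1,a_1) + \mathbb{E}_{\xi_2}\Big[\min_{a_2}\, c_2(s_2,a_2) + \cdots + \mathbb{E}_{\xi_T}\big[\min_{a_T}\, c_T(s_T,a_T)\big]\Big],
    \qquad \text{s.t. } s_{t+1} = f_t(s_t,a_t,\xi_{t+1}),\quad a_t \in \mathcal{A}_t(s_t).

Each inner minimization is taken after the corresponding $\xi_t$ has been observed; this non-anticipativity is exactly the property that allows online measurements to refine the decision stage by stage, in contrast to a single decision scheme fixed day-ahead.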
The thesis first explains deep reinforcement learning algorithms by combining the relevant theories and algorithms of three fields: stochastic programming, adaptive dynamic programming and deep reinforcement learning. On this basis, models and computational methods for deep-reinforcement-learning-based day-ahead optimal operation and real-time optimal operation of the active distribution network are proposed. The former covers both base-point scheduling and power margin scheduling to cope with uncertainty, pursuing economical operation of the active distribution network while accommodating renewable generation locally as far as possible to relieve the regulation burden on the upstream grid. The latter responds to power fluctuations in real time by releasing the power margin around the operating base point given by the day-ahead schedule, which enhances the robustness of system operation. The thesis thereby provides a solution and theoretical support for data-driven optimal operation of active distribution networks; this is important for making full use of the available adjustable resources to accommodate renewable generation, improving the reliability and economy of active distribution network operation, and realizing the transformation of active distribution network optimal operation from model-driven to data-driven.

The main work of this thesis is as follows:

1) Deep reinforcement learning algorithms are described by combining knowledge from three fields: stochastic programming, adaptive dynamic programming and deep reinforcement learning. Two-stage and multi-stage stochastic programming are compared, and a simplified multi-stage stochastic programming formulation applicable to the active distribution network optimal operation problem is given. The principle of inter-stage decoupling in multi-stage stochastic programming is analyzed, and the recursive equations of the value function are divided into two categories, the Bellman optimality equation and the Bellman expectation equation, as the basis for the subsequent algorithms. Correspondingly, deep reinforcement learning algorithms are divided into value iteration and policy iteration; their principles for overcoming the curse of dimensionality and for model-free implementation are described, and their properties are analyzed, as the foundation for the specific modeling and solution work in the remainder of the thesis.
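The two categories of value-function recursions in 1) can be stated compactly. As a reference sketch with generic notation (state $s$, action $a$, reward $r$, successor state $s'$, discount factor $\gamma$, policy $\pi$; none of these is the thesis's own symbol set):

    V^{*}(s) = \max_{a}\, \mathbb{E}\big[r(s,a) + \gamma\, V^{*}(s')\big] \quad \text{(Bellman optimality equation, the basis of value iteration)}
    V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot\mid s)}\big[r(s,a) + \gamma\, V^{\pi}(s')\big] \quad \text{(Bellman expectation equation, the basis of policy iteration)}

Value iteration methods (e.g. Q-learning and its deep variants) fit the optimality equation directly, while policy iteration methods (e.g. actor-critic algorithms such as PPO) alternate between evaluating $V^{\pi}$ through the expectation equation and improving $\pi$; this division maps onto the algorithms proposed in 2) to 4) below.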
2) A multi-stage stochastic programming model and an improved value iteration algorithm are proposed for the day-ahead active-reactive power coordination optimization problem of the active distribution network. The model represents the process in which the actual renewable output and the actual load are revealed over time as a scenario tree, and provides the energy storage devices and the group-switched capacitor banks with multiple alternative strategies that satisfy the storage power constraints and the daily regulation limit of the capacitor banks. Day-ahead optimal operation can thus be dynamically adjusted to the power fluctuations of renewable generation and load with the help of online measurements, better exploiting the flexible regulation potential of the energy storage devices and the group-switched capacitor banks to cope with the uncertainty of renewable energy and load. In view of the characteristics of the day-ahead optimization problem, the system model is decoupled from the uncertainty realization process, and the 'trial-and-error' process is carried out on a data-driven model of the active distribution network, which fundamentally resolves the high trial-and-error cost of deep reinforcement learning. The improved value iteration algorithm introduces several mature engineering techniques to improve training stability and handles continuous and discrete decision variables simultaneously (a sketch of such a mixed-variable value update follows this list).

3) A two-time-scale multi-stage stochastic programming model and an improved policy iteration algorithm are proposed for the day-ahead active-reactive power coordination optimization problem of the active distribution network. To reduce the pressure on the upstream grid to reserve regulation capacity, the model accounts for the cost of the upward and downward power margins provided by the upstream grid and defines the operational risk of the system using conditional value-at-risk. In contrast to the scenario-based optimization method, the power margin in the proposed model is not fixed day-ahead; instead, the intra-day operation of the storage devices is simulated within the multi-stage stochastic programming model, which transfers the power margin of the energy storage devices between time periods, effectively improves their utilization, and further improves the reliability and economy of the power margin optimization. To handle the conditional value-at-risk constraint, an improved PPO algorithm is proposed: the two-time-scale multi-stage stochastic programming model is recast as a constrained Markov decision process, and a Lagrangian function is constructed to relax the conditional value-at-risk constraint (see the second sketch after this list).

4) A real-time active-reactive power coordination optimization model for active distribution networks is proposed, together with an improved multi-agent deep reinforcement learning algorithm that solves the model in a distributed manner. The real-time optimization model interfaces cleanly with the aforementioned day-ahead optimization by minimizing the deviation of the energy storage power at the end of the optimization cycle from the day-ahead schedule; it can track the storage operating points given by the day-ahead schedule while taking full advantage of the fast regulation capability of energy storage to respond to real-time power fluctuations. The multi-agent deep reinforcement learning algorithm solves the real-time optimization problem of the active distribution network in a distributed, model-free manner. Its centralized-training, decentralized-execution architecture allows the controllers to coordinate the global control effect while executing in a decentralized fashion, immune to inter-regional communication interference (see the final sketch after this list). Without the strong assumption of value decomposition and without assuming shared parameters, the proposed algorithm remains insensitive to hyperparameters and can handle system operational constraints that are difficult for traditional reinforcement learning algorithms.
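For 2), the abstract does not spell out the algorithm's architecture, so the following Python sketch illustrates only one plausible reading: a value iteration (DQN-style) update that uses two mature engineering techniques, an experience replay buffer and a slowly updated target network, and handles the mixed decision variables by giving the discretized storage power and the capacitor-bank step separate Q-value heads. All names, dimensions and the branching-head design are illustrative assumptions, not the thesis's exact method.

    from collections import deque
    import torch
    import torch.nn as nn

    class BranchingQNet(nn.Module):
        # One shared body, one Q head per device type; the continuous storage
        # power is assumed to be discretized into n_storage_levels setpoints.
        def __init__(self, state_dim, n_storage_levels, n_cap_steps):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                      nn.Linear(128, 128), nn.ReLU())
            self.storage_head = nn.Linear(128, n_storage_levels)
            self.cap_head = nn.Linear(128, n_cap_steps)
        def forward(self, s):
            h = self.body(s)
            return self.storage_head(h), self.cap_head(h)

    replay = deque(maxlen=100_000)  # experience replay buffer (stability trick 1)

    def soft_update(online, target, tau=0.005):
        # Polyak-averaged target network (stability trick 2)
        for p, tp in zip(online.parameters(), target.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)

    def td_step(q, q_target, optimizer, batch, gamma=0.99):
        s, a_sto, a_cap, r, s_next, done = batch  # actions as (B, 1) long indices
        q_sto, q_cap = q(s)
        # Joint action value approximated as the sum of the two branch values.
        q_sa = q_sto.gather(1, a_sto) + q_cap.gather(1, a_cap)
        with torch.no_grad():
            nq_sto, nq_cap = q_target(s_next)
            best = nq_sto.max(1, keepdim=True).values + nq_cap.max(1, keepdim=True).values
            target = r + gamma * (1.0 - done) * best
        loss = nn.functional.smooth_l1_loss(q_sa, target)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        return loss.item()

Because the 'environment' here is the data-driven distribution network model described in 2), every transition that fills the replay buffer is a simulated rather than a physical trial, which is what removes the trial-and-error cost.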
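For 3), a conditional value-at-risk constraint can be relaxed with a Lagrange multiplier via the Rockafellar-Uryasev representation of CVaR. The sketch below shows only this constraint-handling layer, to be added on top of an ordinary PPO clipped-surrogate loss; the confidence level, the risk budget and all variable names are assumptions for illustration, not the thesis's exact formulation.

    import torch

    alpha = 0.95        # CVaR confidence level (assumed)
    risk_budget = 1.0   # allowed CVaR of the episode risk measure (assumed)

    eta = torch.zeros(1, requires_grad=True)      # auxiliary VaR variable of the R-U formula
    log_lam = torch.zeros(1, requires_grad=True)  # log Lagrange multiplier, keeps lambda >= 0
    dual_opt = torch.optim.Adam([eta, log_lam], lr=1e-3)

    def cvar(episode_risk):
        # Rockafellar-Uryasev: CVaR_alpha(X) = min_eta { eta + E[(X - eta)^+] / (1 - alpha) }
        return eta + torch.relu(episode_risk - eta).mean() / (1.0 - alpha)

    def constraint_penalty(episode_risk):
        # Lagrangian term added to the PPO loss:  lambda * (CVaR - budget)
        return log_lam.exp() * (cvar(episode_risk) - risk_budget)

    def dual_step(episode_risk):
        # Primal descent on eta, dual ascent on lambda (hence the sign flip).
        penalty = constraint_penalty(episode_risk.detach())
        dual_opt.zero_grad()
        penalty.backward()
        log_lam.grad.neg_()
        dual_opt.step()

    # Inside each PPO update one would use, schematically:
    #   total_loss = ppo_clip_loss + constraint_penalty(episode_risk)
    # so the policy is pushed to keep the CVaR of operational risk within budget.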
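For 4), the final sketch shows only the generic centralized-training, decentralized-execution pattern: each regional controller's actor consumes local measurements, while a centralized critic that sees the joint observations and actions exists during training only. The thesis's algorithm additionally avoids value decomposition and parameter sharing; the region count, dimensions and layer sizes below are illustrative assumptions, not its exact design.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        # Decentralized execution: consumes only the agent's local observation.
        def __init__(self, local_obs_dim, act_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(local_obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, act_dim), nn.Tanh())  # normalized P/Q setpoints
        def forward(self, local_obs):
            return self.net(local_obs)

    class CentralCritic(nn.Module):
        # Centralized training: consumes all observations and actions; unused online.
        def __init__(self, joint_obs_dim, joint_act_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(joint_obs_dim + joint_act_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 1))
        def forward(self, joint_obs, joint_act):
            return self.net(torch.cat([joint_obs, joint_act], dim=-1))

    n_regions = 3                                      # assumed
    actors = [Actor(10, 2) for _ in range(n_regions)]  # separate networks: no shared parameters
    critic = CentralCritic(joint_obs_dim=30, joint_act_dim=6)

    obs = [torch.randn(1, 10) for _ in range(n_regions)]
    acts = [actor(o) for actor, o in zip(actors, obs)]         # local decisions only
    q_joint = critic(torch.cat(obs, -1), torch.cat(acts, -1))  # training-time signal

Because each actor needs only local inputs at execution time, online control does not depend on inter-regional communication; this is the property the abstract refers to as immunity to inter-regional communication interference.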