Thermal Power Plant Operation Optimization Based On Deep Reinforcement Learning

Posted on: 2022-07-01    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Shao    Full Text: PDF
GTID: 1522306833466024    Subject: Power Machinery and Engineering

Abstract/Summary:
Energy saving and emission reduction is one of the most important means for China to achieve its "carbon peak and carbon neutrality" goals in the energy field, and it is also an important direction for Chinese power companies to improve their core competitiveness. Operation optimization is an effective way to realize energy saving and emission reduction in power units, and the related research has important theoretical significance and engineering application value for the safety and economy of the entire energy system. This dissertation studies the theory and methods of thermal power plant operation optimization based on deep reinforcement learning. The main research content and conclusions are as follows.

(1) The mathematical models of various operation optimization problems in thermal power plants are established, and a reinforcement learning framework for dynamic operation optimization problems is constructed. The designs of existing algorithms are summarized, and the applicability of deep reinforcement learning methods to policy optimization in discrete-time dynamic operation optimization is analyzed. The operation optimization algorithms are classified from different perspectives, including the type of decision variables and the application scenario.

(2) For discrete dynamic optimization problems based on characteristic models, a scheduling-graph model is proposed. By establishing a cumulative objective function that includes a mode-switching penalty, frequent switching of operating modes under continuously varying conditions is effectively avoided. Pruning and sub-graph reuse techniques are proposed to improve the efficiency of constructing the scheduling graph, and a Dijkstra-based predictive planning algorithm is proposed to solve it. For complex discrete optimization problems, a conditional-prediction SARSA(λ) algorithm based on on-policy discrete-action reinforcement learning is proposed. The algorithm reduces the uncertainty of the decision-making process by constructing a generalized state vector that contains the prediction sequence, and improves the utilization of interaction samples with multi-step weighted temporal-difference targets. Taking the operation optimization of the cold-end system of a thermal power unit as an example, the effectiveness of the proposed algorithms is verified. Further analysis of optimality shows that the Dijkstra-based predictive planning algorithm obtains the theoretical optimal solution of the dynamic optimization problem, and that the conditional-prediction SARSA(λ) algorithm effectively approximates the theoretical optimal solution of complex optimization problems.

(3) For data-driven discrete dynamic optimization problems, the main cause of error accumulation in discrete dynamic optimization algorithms based on approximate models is revealed through a sensitivity analysis of the dynamic optimization problem with respect to modeling error. A predictive scheduling framework based on a dueling deep batch Q-network algorithm and a differential-entropy homogenization algorithm is proposed. A target Q network and an auxiliary Q network are used to avoid overestimation of action values, and differential entropy is used as a monitoring indicator to realize selective updating of the data buffer, which resolves the training instability caused by data imbalance. Taking the operation optimization of a wet flue gas desulfurization system as an example, the effectiveness of the proposed predictive scheduling framework is verified. The results show that the framework meets the emission and energy-consumption optimization goals simultaneously, effectively avoids frequent starts and stops of the slurry circulating pump, and achieves good optimization results when the system characteristics are time-varying.

(4) For continuous dynamic optimization problems based on characteristic models, the consistency between the policy optimization problem of conditionally combined single-objective continuous dynamic operation optimization and the multi-objective continuous dynamic operation optimization problem is demonstrated, and a multi-objective proximal policy optimization algorithm is proposed. Given samples of random weight coefficients, the algorithm uses state trajectories to estimate the conditional state value, advantage value, and probability ratio, and iteratively updates the policy network according to the conditional policy gradient. Taking the dynamic economic-emission dispatch problem as an example, the effectiveness of the algorithm is verified. The results show that the multi-objective proximal policy optimization algorithm achieves good accuracy and generalization on both the 5-unit and 10-unit benchmark task sets, significantly improves the efficiency of online dispatching, and obtains evenly distributed Pareto-optimal solutions.

(5) For data-driven continuous dynamic optimization problems, a performance-optimal control framework based on reinforcement learning is proposed. The diversity of learning samples is increased by superimposing control noise on the operating data, and a homogenization-grid algorithm is proposed to solve the data imbalance problem. A continuous batch Q-learning algorithm based on particle swarm optimization is proposed, which uses particle swarm optimization to carry out the maximization operation over continuous actions accurately and quickly in the iterative calculation of action values, improving the efficiency of continuous-action reinforcement learning. Taking linear water-level tracking control, nonlinear water-level tracking control, and performance-optimal control of a high-pressure feedwater heater as examples, the effectiveness of the proposed framework is verified. The results show that the control policy obtained with the proposed framework not only improves control quality in the dynamic process but also keeps the system in high-performance states at steady state.
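The Dijkstra-based predictive planning of item (2) can be pictured as a shortest-path search over a layered scheduling graph whose nodes are (time step, operating mode) pairs and whose edge weights add a mode-switching penalty to the next step's operating cost. The following is a minimal sketch under those assumptions; the `cost` table, penalty value, and function name are illustrative and not taken from the dissertation:

```python
import heapq

def plan_schedule(cost, switch_penalty, T, modes):
    """Shortest path over a layered scheduling graph.

    Nodes are (time step, mode); an edge from (t, m) to (t+1, m2) costs
    cost[t+1][m2], plus switch_penalty when m2 != m. Modes are assumed to
    index the rows of the (hypothetical) per-step cost table.
    """
    dist = {(0, m): cost[0][m] for m in modes}
    prev = {}  # back-pointers for path recovery
    pq = [(cost[0][m], 0, m) for m in modes]
    heapq.heapify(pq)
    while pq:
        d, t, m = heapq.heappop(pq)
        if d > dist.get((t, m), float("inf")):
            continue  # stale queue entry
        if t == T - 1:
            continue  # last layer, no outgoing edges
        for m2 in modes:
            nd = d + cost[t + 1][m2] + (switch_penalty if m2 != m else 0.0)
            if nd < dist.get((t + 1, m2), float("inf")):
                dist[(t + 1, m2)] = nd
                prev[(t + 1, m2)] = m
                heapq.heappush(pq, (nd, t + 1, m2))
    # recover the cheapest mode sequence from the back-pointers
    m_end = min(modes, key=lambda m: dist[(T - 1, m)])
    path = [m_end]
    for t in range(T - 1, 0, -1):
        path.append(prev[(t, path[-1])])
    return list(reversed(path)), dist[(T - 1, m_end)]
```

With a sufficiently large penalty, a briefly cheaper mode is not worth switching into, which is exactly the anti-chattering effect the cumulative objective is designed to produce.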
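The differential-entropy-guided selective buffer update of item (3) can be illustrated with a naive sketch: fit a Gaussian to the buffered states, use its differential entropy as the monitoring indicator, and admit a new sample only when swapping it in raises the entropy, so the buffer stays spread out over the state space. The Gaussian estimator, the swap rule, and all names here are assumptions for illustration; the dissertation's exact estimator and update policy may differ:

```python
import numpy as np

def gaussian_entropy(x):
    """Differential entropy (nats) of a diagonal Gaussian fit to rows of x."""
    var = np.var(x, axis=0) + 1e-8  # small floor avoids log(0)
    return 0.5 * np.sum(np.log(2 * np.pi * np.e * var))

def selective_update(buffer, candidate, capacity):
    """Admit `candidate` only if it raises the buffer's entropy.

    `buffer` is a list of state vectors. When full, the candidate replaces
    the stored sample whose removal yields the largest entropy gain; if no
    swap increases entropy, the candidate is rejected.
    """
    if len(buffer) < capacity:
        buffer.append(candidate)
        return True
    base = gaussian_entropy(np.asarray(buffer))
    best_gain, best_i = 1e-9, None  # threshold guards against float noise
    for i in range(len(buffer)):
        trial = buffer[:i] + buffer[i + 1:] + [candidate]
        gain = gaussian_entropy(np.asarray(trial)) - base
        if gain > best_gain:
            best_gain, best_i = gain, i
    if best_i is not None:
        buffer[best_i] = candidate
        return True
    return False
```

This brute-force swap search is O(n) entropy evaluations per candidate, so it is only a conceptual sketch; an incremental variance update would be needed at realistic buffer sizes.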
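The weight-conditioned objective of the multi-objective proximal policy optimization in item (4) rests on scalarizing the per-step objective vector (for example cost and emission) with a sampled weight vector before computing returns. A minimal sketch of that scalarization step, with illustrative names and a uniform discount, is:

```python
import numpy as np

def scalarized_returns(rewards, w, gamma=0.99):
    """Discounted returns of weight-scalarized multi-objective rewards.

    rewards: (T, k) array of per-step objective vectors; w: (k,) weight
    sample (e.g. drawn from a Dirichlet over the k-simplex). Returns the
    discounted return G_t of the scalarized reward at every step t.
    """
    r = np.asarray(rewards, dtype=float) @ np.asarray(w, dtype=float)
    G = np.zeros_like(r)
    running = 0.0
    for t in range(len(r) - 1, -1, -1):  # backward accumulation
        running = r[t] + gamma * running
        G[t] = running
    return G
```

Conditioning the value and policy networks on `w` and resampling it per trajectory is what lets a single trained policy sweep out the Pareto front at dispatch time.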
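The particle-swarm maximization inside the continuous batch Q-learning of item (5) replaces the discrete argmax over actions with a swarm search over the continuous action box. The sketch below uses textbook PSO constants and an illustrative quadratic stand-in for the learned Q function; none of the parameter values are taken from the dissertation:

```python
import numpy as np

def pso_argmax(q_fn, low, high, n_particles=30, iters=50, seed=0):
    """Approximate argmax_a Q(s, a) over box bounds with particle swarm.

    q_fn maps an (n, d) batch of actions to (n,) values; low/high are
    per-dimension action bounds. Standard PSO with inertia plus cognitive
    and social pulls; clipping keeps particles inside the bounds.
    """
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, float), np.asarray(high, float)
    d = low.size
    x = rng.uniform(low, high, size=(n_particles, d))  # positions
    v = np.zeros_like(x)                               # velocities
    pbest, pbest_val = x.copy(), q_fn(x)
    g = pbest[np.argmax(pbest_val)].copy()             # global best
    w, c1, c2 = 0.7, 1.5, 1.5                          # textbook constants
    for _ in range(iters):
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, low, high)
        val = q_fn(x)
        improved = val > pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        g = pbest[np.argmax(pbest_val)].copy()
    return g, float(np.max(pbest_val))
```

Because `q_fn` is evaluated on the whole particle batch at once, the search maps directly onto a batched forward pass of a Q network, which is what makes the maximization fast enough to sit inside the value-iteration loop.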
Keywords/Search Tags: Thermal power plant, operation optimization, deep reinforcement learning, dynamic system, artificial intelligence