| With industrialization and smart buildings advancing steadily in China, it was suggested at the Two Sessions concluded in 2021 that, in addition to northern China, pilot heating might also be introduced gradually in southern areas that need it. Although central heating is widespread in China, it started late, and the problem of distributing heat among multiple heat stations still needs to be solved. Experts and scholars have applied various control methods, such as traditional control and classical PID control, but the considerable lag, coupling, and time-varying nature of the system make good control performance difficult to achieve. This paper therefore proposes a reinforcement-learning-based heat optimization regulator and demonstrates its control effect through simulation experiments. Because some current heat load prediction methods lack effective actual data, a heating company in Baotou City is selected, and prediction and simulation experiments are conducted with its many years of historical data. Analysis of the time series identifies outdoor temperature and the social activities of heat users as the main factors affecting heat load prediction, and a black-box model is established to model the heat exchange stations. A Long Short-Term Memory (LSTM) artificial neural network, implemented in the TensorFlow deep learning framework, is used for primary-side heat load prediction and heat station modeling of the district heating system; the outstanding advantage of the model is that it can effectively handle the multivariate and long-term dependence problems of large volumes of time-series data. The results show that the proposed LSTM model performs well in both heat load prediction and heat station modeling. Heat load is predicted with and without wind speed as an input factor, and the experiments demonstrate that wind speed affects the heat load: the higher the wind 
speed, the higher the heat load, and the average relative error is smaller than when the wind factor is omitted. Richer data are selected for modeling the heat stations, and the resulting black-box model serves as the environment model for reinforcement learning. The heat load predictions are used as the target values for reinforcement learning in the optimization control process, and the primary-side objective function is established according to the actual operation of the centralized heating system. The control strategy is optimized through repeated iteration against the heat station environment model, and the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm is used to build the neural networks that produce the optimized primary-side flow control sequence for each heat station. The heat supply calculated from this control sequence is compared with the predicted heat load, and the optimization effect is observed to guide further improvement. This paper investigates the optimal control of heat in multiple heat stations based on reinforcement learning algorithms and introduces artificial intelligence methods into the field, which can serve as a reference for other complex continuous control problems. |
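
The abstract names an average relative error for the predictions and a comparison between the supplied heat and the predicted heat load, but gives no formulas for either. The Python sketch below shows one plausible reading under stated assumptions: the error is the mean of the per-sample relative deviations, the primary-side heat supply follows the standard relation Q = c_p · ṁ · (T_supply − T_return), and the reward is the negative relative deviation from the predicted load. All function names, the water heat capacity constant, and the reward shape are illustrative assumptions, not the paper's definitions.

```python
def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual| over paired samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return total / len(actual)


def heat_supply_kw(flow_kg_s, supply_temp_c, return_temp_c,
                   cp_kj_per_kg_k=4.186):
    """Primary-side heat supply in kW: Q = cp * m_dot * (T_supply - T_return).

    cp defaults to the approximate specific heat of water (assumption).
    """
    return cp_kj_per_kg_k * flow_kg_s * (supply_temp_c - return_temp_c)


def tracking_reward(supplied_kw, target_kw):
    """Hypothetical reward: negative relative deviation from the target load."""
    return -abs(supplied_kw - target_kw) / abs(target_kw)


# Example: score one flow setting against a predicted load (made-up numbers).
supplied = heat_supply_kw(flow_kg_s=10.0, supply_temp_c=90.0, return_temp_c=60.0)
reward = tracking_reward(supplied, target_kw=1300.0)
```

In a DDPG setup of the kind the abstract describes, a reward like `tracking_reward` would be evaluated at each step on the flow chosen by the actor, with the LSTM heat station model supplying the resulting temperatures. The actual objective function used in the paper may differ.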