| With ongoing urbanization, people’s living standards are steadily improving, and keeping indoor temperatures stable and comfortable in winter is very important. The central heating system, an important part of modern urban public infrastructure, provides the city with a stable, reliable, high-quality heat source in the cold winter and raises people’s standard of living. It is environmentally friendly, safe, inexpensive, and convenient, with significant social and economic benefits. The heat exchange station is the intermediate node connecting the hot water from the heat source plant to the radiators and other indoor heating facilities of heat users. It adjusts the temperature of the water supplied by the heat source plant to a level appropriate for heat users. It is therefore important to control the heating water temperature under different weather conditions, as well as how much the temperature is raised or lowered at each adjustment. Moreover, the control strategy should be suitable for communities with different building structures. However, the traditional climate-compensator control strategy has several shortcomings: a simple model, coarse adjustment, a fixed climate compensation curve, no timely indoor temperature feedback, and no ability to control in advance through prediction. These make it difficult to ensure stable and comfortable indoor temperatures as the weather changes. This thesis therefore investigates methods for optimizing heating control strategies based on deep learning and reinforcement learning. The main work includes the following aspects: 1. Real-time temperature sensors are installed in hundreds of homes across multiple communities in Tianjin, and on average indoor temperature data are collected from one-third of the homes in each residential area throughout the heating season. Meteorological data and outdoor temperature data for the same heating
season, monitored by the outdoor meteorological monitoring station in the community, are collected. The heating temperature data of the secondary pipe network for the same heating season are obtained from the heat exchange station. These data are integrated, cleaned, and otherwise processed; the relationships among them are analyzed based on thermodynamic laws, and the basic data sets required for modeling are generated. 2. A deep multi-time differential network (MTDN) is proposed, which can fully mine the hidden information in the data set when the amount of original data is small. In addition, the model loss can be specially designed according to actual needs, giving the model high prediction accuracy and generalization ability. The network takes the ground-truth data set as input and the multi-time difference data set as labels. Based on the first law of thermodynamics, the network aims to learn the thermodynamic characteristics of indoor temperature, so that the model maintains prediction accuracy while its results conform to physical laws. Experiments verify that the network predicts stably and simulates the thermodynamic characteristics of the house well. It can therefore serve as a simulator of the real house environment with which the reinforcement learning model interacts. 3. For the practical central heating scenario, the Soft Actor-Critic (SAC) algorithm, which is based on maximum entropy reinforcement learning, is packaged as a strategy optimizer and used to optimize the heating control strategy; the maximum entropy term gives it strong exploration ability and allows it to learn a stable heating strategy. For the real-time comfort of indoor temperature, PMV (Predicted Mean Vote), an evaluation index of human thermal comfort, is introduced as one reward term of the simulator. For the long-term stability of indoor temperature, the variance of indoor temperature at multiple moments is taken as another reward term of the
simulator. The weighted sum of these two rewards forms the overall reward of the simulator, so that the strategy learned by the strategy optimizer better fits the practical scenario. Finally, experiments are designed to show that the strategy learned by the strategy optimizer can ensure the long-term stability and comfort of indoor temperature. |
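
The two data-side ideas in the abstract, multi-time difference labels and a weighted two-term reward, can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the function names `multi_time_differences` and `heating_reward`, the horizon values, the equal weights, and the sign conventions (comfort is taken as the negative absolute PMV, stability as the negative variance). The PMV value is treated as an input that has already been computed elsewhere.

```python
from statistics import pvariance


def multi_time_differences(temps, horizons=(1, 2, 3)):
    """Build difference labels temps[t + k] - temps[t] for each horizon k.

    Hypothetical reading of a "multi-time difference data set": one row per
    time step t, one column per look-ahead horizon k.
    """
    max_h = max(horizons)
    return [
        [temps[t + k] - temps[t] for k in horizons]
        for t in range(len(temps) - max_h)
    ]


def heating_reward(pmv, recent_temps, w_comfort=0.5, w_stability=0.5):
    """Weighted sum of a comfort term and a stability term (assumed form).

    pmv          -- precomputed PMV value; 0 means thermally neutral
    recent_temps -- indoor temperatures at several recent moments
    """
    comfort = -abs(pmv)                   # best (0) when PMV is neutral
    stability = -pvariance(recent_temps)  # best (0) when temperature is flat
    return w_comfort * comfort + w_stability * stability


if __name__ == "__main__":
    series = [20.0, 20.5, 21.0, 21.0, 20.0]
    print(multi_time_differences(series, horizons=(1, 2)))
    print(heating_reward(pmv=0.3, recent_temps=[20.8, 21.0, 21.1]))
```

Under these assumptions the reward is highest (zero) when the PMV is neutral and the indoor temperature does not fluctuate, and the weights control the trade-off between instantaneous comfort and long-term stability.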