| With the rapid, high-quality development of China’s economy, problems such as building energy shortages and environmental pollution have emerged one after another, making building energy conservation an inevitable trend. Building energy consumption is characterized by high electricity use and a complex system structure, and reducing it has long been regarded as a key direction in the field of energy conservation. At the same time, the continuing growth in the number of buildings and people’s high expectations for comfortable living make building energy conservation urgent. Efficient and accurate energy consumption prediction is an important prerequisite for subsequent energy-saving measures. This thesis analyzes the current situation and existing problems of building energy consumption, and identifies the problems of traditional reinforcement learning algorithms when applied to building energy consumption prediction: slow convergence, low learning efficiency, and suitability only for small-scale state spaces. On this basis, a method based on meta-reinforcement learning is discussed. According to the Markov property of building energy consumption data, the idea of meta-learning is introduced into reinforcement learning, and the model parameters are then updated by gradient descent to train a high-precision system model that improves the efficiency and accuracy of building energy consumption forecasting. Two improved deep reinforcement learning algorithms are proposed and then applied to energy consumption prediction; the main contents are as follows: (1) To address the high computational time cost and low learning efficiency of traditional reinforcement learning algorithms, a DQN (Deep Q-Network) algorithm based on meta-learning is proposed, which uses the MAML (Model-Agnostic Meta-Learning) framework. The idea of
meta-learning is introduced into the learning process of the reinforcement learning algorithm, which addresses the low learning efficiency and over-fitting caused by insufficient training samples. At the same time, the proposed algorithm is trained with an intrinsic reward function to obtain appropriate model parameters, so that the agent can converge well during learning with only a small amount of training data, improving the learning speed and accuracy of the overall algorithm. The proposed algorithm is applied to N-way K-shot and grid world problems, and experiments show that it converges better than the original DQN algorithm. (2) To further improve the learning efficiency of the algorithm on the training samples, the proximal policy optimization (PPO) algorithm is introduced to optimize and extend the Meta-DQN framework. Importance sampling is adopted: the importance of each sample is evaluated to determine its degree of influence on the learning task, which improves the agent’s efficiency in exploring policies. On this basis, an advantage function is added, which increases the frequency of high-quality actions in the agent’s future interactions with the environment and improves the agent’s effective learning ability. Finally, OU (Ornstein-Uhlenbeck) action noise is added to strengthen the agent’s exploration. (3) With the rapid development of society, advanced smart home equipment is becoming increasingly popular, and building energy consumption has received extensive attention. Because existing building energy consumption data and the key factors affecting consumption have been insufficiently analyzed, and experimental sample sizes are insufficient, prediction accuracy does not meet actual needs. The meta-reinforcement learning algorithm and the PPO algorithm are used to optimize model parameters to
achieve rapid adaptation of the algorithm to new environments. Large-scale office buildings are taken as the research object, and building energy consumption data are used as network input for parameter learning to improve the accuracy of the prediction model, providing a reference for subsequent power dispatch and serving the goal of building energy saving. |
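
As a rough illustration of the components named in item (2), the following Python sketch shows a clipped PPO surrogate objective built from an importance-sampling ratio and an advantage estimate, together with an Ornstein-Uhlenbeck noise process for action exploration. It is not the thesis implementation: the function and class names, the clipping range `clip_eps=0.2`, and the noise parameters `theta=0.15` and `sigma=0.2` are assumptions chosen for illustration only.

```python
import numpy as np


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to be maximized).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy. clip_eps is an assumed value.
    """
    # Importance-sampling ratio between the current and the old policy.
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Element-wise minimum keeps the more pessimistic of the two estimates.
    return float(np.mean(np.minimum(unclipped, clipped)))


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise.

    mu, theta, sigma and dt are assumed values, not taken from the thesis.
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        self.mu = mu * np.ones(size)
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.state = self.mu.copy()

    def sample(self):
        # Mean-reverting drift plus a scaled Gaussian increment.
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = (self.sigma * np.sqrt(self.dt)
                     * self.rng.standard_normal(self.state.shape))
        self.state = self.state + drift + diffusion
        return self.state
```

In use, the noise sample would be added to the action chosen by the policy before it is sent to the environment, and the negative of the objective would serve as the loss for a gradient-based optimizer.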