With the continuous development of the spot electricity market, interaction between power supply and demand is becoming more frequent, and the number of participants in demand response is growing. Among demand response issuers, electricity retailers now issue demand response requests in addition to grid companies. Because a retailer's decisions usually have a long-term impact on user behavior, a method is needed that maximizes the retailer's long-term revenue. This thesis applies reinforcement learning, which is well suited to the sequential decision problem of demand response, and improves the corresponding algorithms to address two difficulties: the curse of dimensionality caused by the large state and action spaces in retailer-user demand response, and the random deviation between actual scenarios and training scenarios. Simulation experiments are carried out for verification. The specific research content and results are as follows:

A multi-time-scale interactive power consumption model for demand response between users and the electricity retailer is established. Considering the characteristics of China's spot market, the retailer's objective is defined as maximizing long-term revenue by seeking the optimal subsidy price, and the users' objective as maximizing their current demand response revenue by reducing load. The retailer's profit from participating in demand response is decomposed into the saved electricity purchase cost, the reduced electricity sales revenue, and the response compensation paid to users; the users' revenue is decomposed into the response cost incurred, the reduced electricity purchase cost, and the response subsidy received. To capture the influence of the retailer's historical subsidy prices on users' perception of comfort cost, a coupling across time is introduced into the users' comfort cost function, yielding dynamically optimized revenue functions for both the retailer and the users in interactive power consumption.

A neural-network-based reinforcement learning method is used to overcome the curse of dimensionality in the retailer-user demand response interaction. The value-function-based Q-learning method is studied: the users' demand response revenue function determines their response load, and the retailer's current demand response revenue serves as the immediate reward in Q-learning. Because traditional Q-learning suffers from the curse of dimensionality when the state and action spaces are large, this thesis proposes approximating the Q value function with a BP neural network. Simulation experiments show that the neural-network-based reinforcement learning algorithm effectively avoids the curse of dimensionality and obtains the strategy that maximizes the retailer's long-term revenue.
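As a minimal illustration of this approach (a sketch, not the exact model of the thesis), the following Python code implements value-function-based Q-learning in which a small back-propagation (BP) network approximates the Q function over discretized subsidy prices. The spot price curve, retail tariff, user response function, reward decomposition, and all parameter values are hypothetical placeholders introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- hypothetical problem data (illustrative only, not the thesis model) ---
SUBSIDY_LEVELS = np.linspace(0.1, 0.8, 8)    # candidate subsidy prices (yuan/kWh)
N_PERIODS = 24                               # decision periods per episode
RETAIL_PRICE = 0.6                           # retail tariff (yuan/kWh)

def spot_price(t):
    """Assumed spot purchase price curve for period t."""
    return 0.4 + 0.5 * np.sin(np.pi * t / N_PERIODS) ** 2

def user_response(subsidy, t):
    """Assumed user best response: load reduction (MWh) chosen by the user
    to maximize its own demand response revenue; a simple stand-in here."""
    flexible = 1.0 + 0.3 * np.sin(2 * np.pi * t / N_PERIODS)
    return flexible * 0.3 * min(1.0, subsidy / SUBSIDY_LEVELS[-1])

def retailer_reward(subsidy, t):
    """Immediate reward: saved purchase cost minus lost sales revenue
    minus compensation paid to users, per the decomposition above."""
    d_load = user_response(subsidy, t)
    return (spot_price(t) - RETAIL_PRICE - subsidy) * d_load

class QNet:
    """One-hidden-layer BP network approximating Q(state, action)."""
    def __init__(self, n_in, n_hidden, n_out, lr=0.01):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden)); self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out)); self.b2 = np.zeros(n_out)
        self.lr = lr
    def forward(self, x):
        self.x, self.h = x, np.tanh(x @ self.W1 + self.b1)
        return self.h @ self.W2 + self.b2
    def update(self, x, action, target):
        q = self.forward(x)
        err = np.zeros_like(q)
        err[action] = q[action] - target           # gradient of 0.5*(q - target)^2
        dh = (err @ self.W2.T) * (1.0 - self.h ** 2)
        self.W2 -= self.lr * np.outer(self.h, err); self.b2 -= self.lr * err
        self.W1 -= self.lr * np.outer(x, dh);       self.b1 -= self.lr * dh

def state(t):
    """State features: normalized period index and current spot price."""
    return np.array([t / N_PERIODS, spot_price(t)])

qnet = QNet(n_in=2, n_hidden=32, n_out=len(SUBSIDY_LEVELS))
gamma, eps = 0.95, 0.2
for episode in range(500):
    for t in range(N_PERIODS):
        s = state(t)
        a = rng.integers(len(SUBSIDY_LEVELS)) if rng.random() < eps \
            else int(np.argmax(qnet.forward(s)))
        r = retailer_reward(SUBSIDY_LEVELS[a], t)
        target = r if t + 1 == N_PERIODS else \
            r + gamma * float(np.max(qnet.forward(state(t + 1))))
        qnet.update(s, a, target)

# greedy subsidy schedule after training
print([float(SUBSIDY_LEVELS[int(np.argmax(qnet.forward(state(t))))])
       for t in range(N_PERIODS)])
```

Replacing a tabular Q function with this kind of network is what keeps the method tractable when the state and action spaces grow: the number of trainable parameters stays fixed instead of growing with the number of state-action pairs.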
A multi-scenario reinforcement learning method is used to handle scenario differences in the retailer-user demand response interaction. Because the actual scenario deviates randomly from the scenario used for reinforcement learning training, this thesis builds on the neural-network-based method by generating scenarios with a Monte Carlo method and training a reinforcement learning model on each scenario separately. A method is also proposed for comparing the actual scenario with the training scenarios and selecting the training scenario closest to the actual one for strategy output. Simulation experiments show that, when user load fluctuates strongly, the multi-scenario reinforcement learning method can still output effective strategies, allowing the retailer to obtain higher long-term revenue from demand response.
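The scenario generation and scenario matching steps can be sketched as below, again under hypothetical assumptions: the load model, the noise level, the Euclidean distance measure, and the placeholder train_policy function are illustrative choices rather than the thesis's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
N_PERIODS, N_SCENARIOS = 24, 50

# --- Monte Carlo scenario generation under an assumed load model ---
base_load = 100 + 30 * np.sin(2 * np.pi * np.arange(N_PERIODS) / N_PERIODS)
scenarios = base_load * (1 + rng.normal(0.0, 0.1, size=(N_SCENARIOS, N_PERIODS)))

def train_policy(load_profile):
    """Placeholder for per-scenario reinforcement learning training
    (e.g. the neural-network Q-learning sketch above); here it simply
    returns a trivial subsidy schedule so the example is self-contained."""
    return np.where(load_profile > load_profile.mean(), 0.5, 0.1)

policies = [train_policy(s) for s in scenarios]    # one trained policy per scenario

def select_policy(observed_load):
    """Compare the actual (partially observed) load curve with every
    training scenario and output the policy of the closest scenario."""
    k = len(observed_load)
    distances = np.linalg.norm(scenarios[:, :k] - observed_load, axis=1)
    return policies[int(np.argmin(distances))]

# example: the first 8 periods of the actual day deviate randomly from training data
actual_load = base_load[:8] * (1 + rng.normal(0.0, 0.15, 8))
print(select_policy(actual_load))
```

The design intent is that even when the realized load departs from any single training scenario, the retailer can fall back on the pre-trained policy of whichever scenario the actual day most resembles, rather than relying on one policy trained on a single nominal scenario.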