| At the beginning of 2019, State Grid Corporation of China proposed the strategic goal of “Becoming a world-class energy internet company with 3 functions and 2 networks”, which indicates that China’s electric power market reform has entered a new stage, with the construction of a Ubiquitous Electric Internet of Things (UEIOT) as its core task. The objective of the UEIOT is to enable comprehensive sensing and interconnection of everything in the electricity system, establish a smart energy trading service platform, and form a complete, information-sharing energy ecosystem. The supply-side platform (SSP) and demand-side platform (DSP) of the deregulated electricity market enable real-time online energy trading on the energy trading platform, so how to bid strategically to maximize long-term revenue has attracted great attention from many interested entities and auxiliary service providers. Reinforcement learning is an appropriate paradigm for maximizing long-term future rewards, so this thesis investigates in depth the application of reinforcement learning algorithms, especially the multi-armed bandit (MAB) model, to SSP bidding auctions in energy trading on the energy internet. The main contributions of the thesis are as follows. (1) The thesis first summarizes and categorizes various MAB algorithms. It then explains the use of MAB algorithms in repeated auctions and studies their adaptation to power load bidding on the SSP, demand response on the DSP, and bidding in virtual trading in deregulated electricity markets. It concludes that MAB algorithms can be used effectively for strategic bidding in online energy trading on the energy internet. (2) The thesis then investigates the real-time bidding problem of the SSP in the power market. In the spot power market, because it lacks information about its opponents, a power generating company (PGC) has to make its bids strategically based on its own production costs and the observed market prices, in order to
maximize its own long-term revenue. Therefore, considering the dynamics and uncertainty of the power market, this thesis models the strategic bidding behavior of a PGC as an adversarial multi-armed bandit problem. The thesis proposes a bidding algorithm, the exponential-weight algorithm for exploration and exploitation with continuous values, named Exp3C. Exp3C can choose bidding prices from continuous ranges and iteratively optimizes its bids by observing the reward feedback. Theoretical analysis gives an upper bound on the average per-round regret of Exp3C in terms of T, the total number of rounds. Moreover, the thesis evaluates Exp3C empirically on SSP power bidding using historical data from PJM, and the results show that Exp3C outperforms other online-learning-based heuristic methods in terms of cumulative profit and cumulative regret. Finally, the thesis proves theoretically that if every PGC in the power market adopts Exp3C to determine its bidding price, each PGC's bids converge to the Nash equilibrium of the one-shot game, so that the whole market eventually reaches Nash equilibrium. (3) To promote price convergence between the day-ahead (DA) and real-time (RT) markets, electricity markets outside China have introduced a virtual trading mechanism, which allows market participants to buy (or sell) energy in the DA market with the obligation to sell (or buy) the same amount of energy in the RT market. Market participants do not need to generate or consume electricity; they can arbitrage the differences between DA and RT prices. For this scenario, the thesis proposes a bidding strategy for virtual trading. Assuming that several bidding strategies already exist, it proposes an adaptive bidding strategy that integrates them as expert advice. The proposed algorithm, the exponential-weight algorithm for exploration and exploitation with continuous values and expert advice, is named Exp4C. Exp4C is a
contextual multi-armed bandit algorithm that makes decisions by jointly using the "expert advice" given by each strategy. The thesis evaluates Exp4C empirically on virtual trading in wholesale electricity markets using historical data from PJM, and the results show that Exp4C outperforms every single existing benchmark method in terms of cumulative profit and cumulative regret. |
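The abstract does not give Exp3C's update rule, so the following is only a rough illustration of the adversarial-bandit idea behind PGC bidding. It runs the standard Exp3 algorithm over a discretized grid of bid prices, which is exactly the simplification that a continuous-range method such as Exp3C is meant to avoid. The market model (pay-as-bid acceptance against a rivals' threshold drawn uniformly from an assumed range), the marginal cost of 20, and the names `exp3_bidding` and `make_pay_as_bid_market` are hypothetical assumptions, not the thesis's setup or its PJM data.

```python
import math
import random
from collections import Counter


def exp3_bidding(price_grid, reward_fn, rounds, gamma=0.1, seed=0):
    """Standard Exp3 over a finite grid of bid prices (not the thesis's Exp3C).

    reward_fn(t, bid) must return a reward in [0, 1]. Returns the bid chosen
    in each round and the final (normalized) arm weights.
    """
    rng = random.Random(seed)
    k = len(price_grid)
    weights = [1.0] * k
    bids = []
    for t in range(rounds):
        total = sum(weights)
        # Mix the exponential weights with uniform exploration at rate gamma.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        bid = price_grid[arm]
        # Bandit feedback: only the reward of the submitted bid is observed.
        reward = reward_fn(t, bid)
        # Importance weighting makes reward / probs[arm] an unbiased estimate.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)
        # Rescale so the largest weight is 1; the probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
        bids.append(bid)
    return bids, weights


def make_pay_as_bid_market(cost=20.0, low=30.0, high=50.0, seed=1):
    """Hypothetical pay-as-bid market for one PGC with a fixed marginal cost.

    Each round the rivals' lowest offer is drawn uniformly from [low, high];
    a bid at or below it is accepted and paid its own price.
    """
    rng = random.Random(seed)
    scale = high - cost  # largest possible profit, used to scale into [0, 1]

    def reward(t, bid):
        threshold = rng.uniform(low, high)
        profit = bid - cost if bid <= threshold else 0.0
        return max(profit, 0.0) / scale

    return reward


if __name__ == "__main__":
    grid = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
    bids, _ = exp3_bidding(grid, make_pay_as_bid_market(), rounds=10000)
    # Most frequent bid over the last 2000 rounds.
    print(Counter(bids[-2000:]).most_common(1))
```

Under these assumed numbers, the expected profit of a bid b in [30, 50] is (b - 20)(50 - b)/20, which peaks at b = 35, so the learner should concentrate its bids around the middle of the grid. Because only the chosen bid's reward is revealed each round, the PGC cannot evaluate bids it did not submit, which is what makes this a bandit problem rather than a full-information one.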
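The virtual trading arbitrage described in (3) can be made concrete with a small settlement calculation. This sketch assumes PJM-style terminology, where a DEC bid is virtual demand (buy in DA, sell back in RT) and an INC offer is virtual supply (sell in DA, buy back in RT). It also assumes a simplified clearing rule (a DEC clears when its bid is at or above the DA price, an INC when its offer is at or below it) and ignores fees, uplift charges, and congestion components. The function name `virtual_bid_profit` is hypothetical.

```python
def virtual_bid_profit(side, bid_price, quantity, da_price, rt_price):
    """Settlement profit of one virtual bid at one node for one hour.

    side="dec": virtual demand, buys `quantity` in DA and sells it back in RT.
    side="inc": virtual supply, sells `quantity` in DA and buys it back in RT.
    Fees, uplift and congestion components are ignored (simplifying assumption).
    """
    if side == "dec":
        cleared = bid_price >= da_price  # willing to pay at least the DA price
        spread = rt_price - da_price     # bought at DA price, sold at RT price
    elif side == "inc":
        cleared = bid_price <= da_price  # willing to sell at the DA price
        spread = da_price - rt_price     # sold at DA price, bought back at RT price
    else:
        raise ValueError("side must be 'dec' or 'inc'")
    # An uncleared bid has no DA position and therefore no RT obligation.
    return quantity * spread if cleared else 0.0


if __name__ == "__main__":
    print(virtual_bid_profit("dec", 40.0, 10.0, 35.0, 42.0))  # → 70.0
    print(virtual_bid_profit("inc", 30.0, 5.0, 35.0, 40.0))   # → -25.0
```

For example, a 10 MWh DEC bid at $40/MWh clears against a DA price of $35/MWh and earns 10 × (42 - 35) = $70 if the RT price settles at $42/MWh. The same position loses money whenever RT settles below DA, so the profit of a virtual bid depends on predicting the sign of the DA-RT spread, not only on clearing.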
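The abstract does not spell out how Exp4C handles continuous values, so this sketch only illustrates the expert-advice mechanism using the standard Exp4 algorithm on a finite action set. Each expert (for instance, an existing bidding strategy) supplies a probability distribution over actions. The learner mixes these distributions by expert weight plus uniform exploration and credits each expert with an importance-weighted estimate of the reward it would have earned. The toy environment, its reward means, and the three experts in the usage example are invented for illustration.

```python
import math
import random


def exp4(advice_fn, reward_fn, n_arms, n_experts, rounds, gamma=0.1, seed=0):
    """Standard Exp4 (exponential weights with expert advice), finite actions.

    advice_fn(t) returns one probability distribution over the n_arms actions
    per expert; reward_fn(t, arm) returns the observed reward in [0, 1].
    Returns the final (normalized) expert weights and the total reward.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_experts
    total_reward = 0.0
    for t in range(rounds):
        advice = advice_fn(t)
        w_sum = sum(weights)
        # Weighted mixture of the experts' advice plus uniform exploration.
        probs = [
            (1 - gamma) * sum(weights[j] * advice[j][i] for j in range(n_experts)) / w_sum
            + gamma / n_arms
            for i in range(n_arms)
        ]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted reward estimate for the chosen action only.
        x_hat = reward / probs[arm]
        for j in range(n_experts):
            # Expert j is credited in proportion to the probability it gave this action.
            y_hat = advice[j][arm] * x_hat
            weights[j] *= math.exp(gamma * y_hat / n_arms)
        # Rescale so the largest weight is 1; the mixture is unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return weights, total_reward


if __name__ == "__main__":
    means = [0.8, 0.3, 0.5]  # hypothetical success rates of three bidding actions
    env = random.Random(1)
    experts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1 / 3, 1 / 3, 1 / 3]]
    w, total = exp4(
        lambda t: experts,
        lambda t, a: 1.0 if env.random() < means[a] else 0.0,
        n_arms=3, n_experts=3, rounds=3000,
    )
    print([round(x, 3) for x in w], total / 3000)
```

In this toy run the first expert, which always recommends the best action, should end up with the largest weight. The design choice worth noting is that the learner never needs to know which expert is right in advance: it only observes the reward of the action it actually took and redistributes trust among the strategies accordingly.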