Research Of Throwing Strategy Of Curling Contest Based On Reinforcement Learning

Posted on: 2019-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: W Shao
Full Text: PDF
GTID: 2417330566998084
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of reinforcement learning, more and more reinforcement learning algorithms have emerged. For environments with discrete state spaces, many mature reinforcement learning methods exist, and they have gradually been applied to artificial agents in many fields. For continuous state spaces, however, the capability of reinforcement learning is still limited: there are only some theoretical studies, and no practical tests have been conducted. This thesis focuses on the curling rink environment and studies strategy generation in a continuous state space. It attempts to generate curling throwing strategies with reinforcement learning algorithms and combines them with search algorithms to explore curling strategies. In the curling environment, both the state space and the action space are continuous and contain multidimensional free variables, which makes classical reinforcement learning algorithms difficult to apply. In addition, the curling environment involves many uncertainties: a throwing strategy incurs errors when it is executed, so the stone deviates from its intended trajectory, which poses a further challenge for the strategy generation algorithm. This thesis uses several methods to study curling strategy generation. The main research contents are as follows:

(1) Construction of the curling simulation platform. First, the curling game scene is transformed into a reasonable dynamic model. Converting the actual scene into a dynamic model requires not only a rational design of the system state and action, but also consideration of the impact of throwing errors on the scene and on the algorithm. Next, the curling match simulation platform is designed. Its front end receives user input and visualizes the designed mathematical model and the curling stones. Finally, the architecture of the platform is completed. In the back end, the sliding and collision processes during a throw are recorded as data, and functions such as playback and undo are supported. The platform is the necessary basis for curling strategy generation and provides massive data for reference and support.

(2) Design of the curling strategy generation algorithms. First, the particle swarm optimization (PSO) algorithm is improved and its parameters are tuned so that it can generate reliable throwing strategies within a limited time. Second, Monte Carlo tree search is combined with a supervised learning network to explore how throwing strategies are generated. Finally, the four elements of the reinforcement learning algorithm are designed: the policy, the reward function, the action-value function, and the mathematical model of the environment. Only with a suitable mathematical model and a reasonable reward function can the computer be trained to obtain the optimal strategy through reinforcement learning.

(3) Quantitative analysis of curling strategies. Athletes from various countries have developed a number of curling strategies based on competition experience. These can be compared with the throwing strategies generated by reinforcement learning, so that each can inform and improve the other. Existing game experience can be used to refine the reinforcement learning algorithm, and the strategies generated by the algorithm can in turn be offered to athletes as a reference for competition.
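
As an illustration of how particle swarm optimization can search a continuous throwing-action space under execution error, the following is a minimal sketch and not the thesis's simulator or algorithm. It assumes a toy model in which a stone decelerates under constant friction with no curl and no collisions; the friction value, target position, noise levels, search bounds, and all function names (`stop_position`, `expected_miss`, `pso`) are hypothetical choices made for this example.

```python
import math
import random

MU_G = 0.0168 * 9.81   # assumed friction coefficient times gravity (illustrative value)
TARGET = (0.0, 28.0)   # assumed target position in metres, relative to the release point


def stop_position(speed, angle):
    """Rest position of a stone released at `speed` (m/s) and `angle` (rad off the centre line)."""
    distance = speed ** 2 / (2.0 * MU_G)  # sliding distance under constant deceleration
    return distance * math.sin(angle), distance * math.cos(angle)


def expected_miss(speed, angle, rng, samples=20, speed_sd=0.02, angle_sd=0.005):
    """Mean distance to TARGET when the throw is executed with Gaussian error (assumed noise model)."""
    total = 0.0
    for _ in range(samples):
        x, y = stop_position(speed + rng.gauss(0.0, speed_sd),
                             angle + rng.gauss(0.0, angle_sd))
        total += math.hypot(x - TARGET[0], y - TARGET[1])
    return total / samples


def pso(rng, particles=20, iterations=60, inertia=0.7, c1=1.5, c2=1.5):
    """Standard global-best PSO over (speed, angle); returns the best action and its noisy score."""
    bounds = [(1.0, 4.0), (-0.2, 0.2)]  # assumed search range for speed and angle
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(particles)]
    vel = [[0.0, 0.0] for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_val = [expected_miss(p[0], p[1], rng) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iterations):
        for i in range(particles):
            for d, (lo, hi) in enumerate(bounds):
                # Velocity update: inertia plus pulls toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = expected_miss(pos[i][0], pos[i][1], rng)
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


if __name__ == "__main__":
    (speed, angle), miss = pso(random.Random(0))
    print(f"speed={speed:.3f} m/s, angle={angle:.4f} rad, estimated miss={miss:.3f} m")
```

Because each candidate throw is scored by averaging over sampled execution errors, the swarm favours actions that stay close to the target on average rather than only in the noise-free case. The reported score is itself a noisy estimate and tends to be slightly optimistic.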
Keywords/Search Tags:Reinforcement learning, Continuous state space, Curling strategy, Unknown environment