Research On Digital Curling Strategy Based On Reinforcement Learning

Posted on:2022-06-10

Degree:Master

Type:Thesis

Country:China

Candidate:H K Zhao

Full Text:PDF

GTID:2507306572460314

Subject:Control Engineering

Abstract/Summary:

With the 2022 Beijing Winter Olympics approaching, China's goal of engaging "300 million people in ice and snow sports" is gradually being realized. Curling is one of the main ice sports. It tests players' throwing skills and, even more, their choice of shot strategy. A gap remains between the Chinese curling team and the world's strongest teams, because the team struggles to apply key strategies in decisive matches. The curling strategy studied in this thesis can serve as a reference for athletes seeking to improve their tactics, so research on curling strategy has practical significance. After clarifying the research significance and project background, this thesis surveys the current state of research on curling strategy. Through theoretical analysis, simulation and experimental comparison, it arrives at a digital curling strategy with strong performance.

The digital curling strategy is designed in two stages. First, a curling Policy-Value network is trained offline. Then the trained network is used to improve an online Monte Carlo Tree Search (MCTS) algorithm. No curling strategy dataset or professional curling guidance was available, so this thesis uses MCTS self-play to generate the data for training the Policy-Value network. Training takes place in the digital curling simulation model and combines the Policy-Value network with MCTS, so that each component guides and reinforces the other. After many self-play games and training updates, the result is an initial Policy-Value network with a baseline level of playing strength.

An offline-trained network cannot carry out long-term strategic reasoning or adapt to changing game situations. For this reason, the thesis combines the trained Policy-Value network with online MCTS. Curling shots lie in a large continuous action space, whereas the Policy-Value network outputs discrete actions. To bridge the two, a normal distribution is introduced to convert the discrete action space into a continuous one.

In addition, real curling shots are subject to execution uncertainty. To approximate real play, a random factor is added to the digital curling simulation environment so that a stone's actual landing point differs from its intended landing point. To handle this execution uncertainty, the online search uses kernel regression and kernel density estimation to re-evaluate the value of each candidate action. The evaluation takes into account how the possible actual landing points around an intended landing point affect its value. This reduces the effect of execution uncertainty and improves the strategy's performance.

Finally, the offline-trained Policy-Value network and the improved online MCTS algorithm are combined into the curling strategy method proposed in this thesis, PVN-MCTS. Multiple sets of comparative experiments were conducted to verify its effectiveness, and the game results show that the proposed method has strong strategic performance.
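The abstract says MCTS self-play both uses the Policy-Value network and generates its training data. It does not give the selection rule or the training target. The sketch below assumes the widely used AlphaZero-style formulation:

- **Selection:** children are chosen by the PUCT score Q + c_puct · P · sqrt(N_parent) / (1 + N_child).
- **Training target:** the normalized visit counts at the root become the policy target.

The names `Node`, `select_action`, `visit_distribution` and `c_puct` are hypothetical illustrations, not the thesis's code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float            # P(s, a): prior from the (hypothetical) policy head
    visit_count: int = 0    # N(s, a)
    value_sum: float = 0.0  # W(s, a): sum of backed-up values
    children: dict = field(default_factory=dict)  # action -> Node

    def q(self) -> float:
        # Mean action value Q(s, a); 0 for unvisited children.
        # Assumes values are stored from the perspective of the player choosing at the parent.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Q + U, where U = c_puct * P * sqrt(N_parent) / (1 + N_child)
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q() + u


def select_action(parent: Node, c_puct: float = 1.5):
    # Pick the child action with the highest PUCT score
    return max(parent.children, key=lambda a: puct_score(parent, parent.children[a], c_puct))


def visit_distribution(parent: Node, temperature: float = 1.0) -> dict:
    # Self-play policy target: pi(a) proportional to N(s, a) ** (1 / temperature).
    # Assumes at least one child has been visited (otherwise the total is zero).
    weights = {a: c.visit_count ** (1.0 / temperature) for a, c in parent.children.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}
```

In this scheme, a high prior steers early search toward an unvisited action. As visits accumulate, the U term shrinks and the observed Q dominates. Each self-play position would then be stored with its `visit_distribution` and the final game result as a (state, policy, value) training example.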
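The abstract describes using a normal distribution to turn the network's discrete action output into continuous actions, and adding a random factor so the actual landing point differs from the intended one. One plausible reading is sketched below, under several assumptions:

- **Grid:** the policy scores a grid of target points (x, y) on the sheet. The grid bounds, spacing and the omission of spin are illustrative assumptions.
- **Sampling:** a bin is drawn from the policy, and Gaussian noise around the bin centre gives a continuous candidate.
- **Execution noise:** a second Gaussian models execution noise.

The names `GRID`, `sample_continuous_actions`, `execute_with_noise`, `sigma` and `exec_sigma` are hypothetical.

```python
import numpy as np

# Hypothetical discretisation of target points on the sheet (values are assumptions).
XS = np.linspace(-2.0, 2.0, 9)    # lateral positions, metres
YS = np.linspace(30.0, 40.0, 11)  # down-sheet positions, metres
GRID = np.array([(x, y) for y in YS for x in XS])  # shape (99, 2), y-major order


def sample_continuous_actions(policy_probs, n_samples, sigma, rng):
    # 1) Choose discrete bins according to the network's policy output.
    idx = rng.choice(len(GRID), size=n_samples, p=policy_probs)
    # 2) Perturb each bin centre with N(0, sigma^2) to obtain continuous candidate actions.
    return GRID[idx] + rng.normal(0.0, sigma, size=(n_samples, 2))


def execute_with_noise(intended, exec_sigma, rng):
    # Execution uncertainty: the actual landing point deviates from the intended one.
    intended = np.asarray(intended, dtype=float)
    return intended + rng.normal(0.0, exec_sigma, size=intended.shape)


rng = np.random.default_rng(0)
probs = np.full(len(GRID), 1.0 / len(GRID))  # e.g. a uniform policy
candidates = sample_continuous_actions(probs, 5, sigma=0.05, rng=rng)
landed = execute_with_noise(candidates[0], exec_sigma=0.1, rng=rng)
```

Keeping the sampling width `sigma` separate from the execution noise `exec_sigma` separates two roles. The first controls how widely the search explores around the network's suggestions. The second models how imprecisely a chosen shot is actually delivered.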
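The abstract says kernel regression and kernel density are used to re-evaluate an action's value while accounting for nearby possible landing points. It does not give the formulas. The sketch below is one known formulation of this idea, in the general shape of KR-UCT, and is not necessarily the thesis's exact method:

- **Value:** a Nadaraya-Watson estimate pools the mean values of all tried actions, weighted by a Gaussian kernel and their visit counts.
- **Density:** the kernel density (an effective visit count) replaces the raw visit count in the exploration bonus.

The function names, the exploration constant `c` and the choice to match `bandwidth` to the execution noise are assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    # K(a, b) = exp(-||a - b||^2 / (2 h^2)); h is assumed to reflect the execution noise
    d2 = np.sum((np.asarray(a) - np.asarray(b)) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def kernel_estimates(query, actions, mean_values, visits, bandwidth):
    """Nadaraya-Watson value estimate and kernel density for one candidate action."""
    k = gaussian_kernel(query, actions, bandwidth)       # shape (n_actions,)
    density = np.sum(k * visits)                        # W(a) = sum_b K(a, b) n_b
    value = np.sum(k * visits * mean_values) / density  # V(a) = sum_b K(a, b) n_b v_b / W(a)
    return value, density


def kr_ucb(query, actions, mean_values, visits, bandwidth, c=1.0):
    # UCB-style score whose exploration bonus uses the kernel density, not the raw count
    value, density = kernel_estimates(query, actions, mean_values, visits, bandwidth)
    total = np.sum(visits)
    return value + c * np.sqrt(np.log(total) / density)


# Three tried actions (target points); the first two are closer than the noise bandwidth.
actions = np.array([[0.0, 34.0], [0.05, 34.0], [1.0, 34.0]])
mean_values = np.array([1.0, -1.0, 0.5])
visits = np.array([10, 10, 10])
value, density = kernel_estimates(actions[0], actions, mean_values, visits, bandwidth=0.1)
```

In this example the first action's raw mean is 1.0, but its re-evaluated value falls to about 0.06. The reason is that a neighbouring target, well within the execution-noise bandwidth, scored -1.0. This is how such an estimate penalizes shots whose value depends on hitting an exact point. Note that `density` must be positive, which holds whenever the queried action itself has been visited.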

Keywords/Search Tags:

curling strategy, reinforcement learning, policy value network, Monte Carlo tree search

Related items

1	Design And Implementation Of Gobang Algorithm Based On Monte Carlo Tree And Neural Network
2	Research Of Throwing Strategy Of Curling Contest Based On Reinforcement Learning
3	Design And Implementation Of Chinese Chess Self-game And Reinforcement Learning System
4	The Research Of Chinese Chess Based On Reinforcement Learning
5	Research And Application Of Explainability Algorithm In Graph Neural Networks
6	Reliability Analysis Of Non Repairable System Based On Dynamic Fault Tree
7	Digital Curling Rink System
8	Carol Ann Tomlinson Difference Between Teaching And Research
9	Study On Optimal Air Ticket Purchase Timing
10	Research On The Dynamic Changes Of Job Search Strategies And Their Effect On Job Search Outcomes