Font Size: a A A

Research And Application Of Imperfect Game Strategy Based On UCT Algorithm And Deep Reinforcement Learning

Posted on:2022-03-06Degree:MasterType:Thesis
Country:ChinaCandidate:H H ShenFull Text:PDF
GTID:2480306542963429Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Computer game plays an important role in the artificial intelligence area,which is a challenging research direction.In recent years,imperfect information game has become an important touchstone to test the level of artificial intelligence.There are many imperfect information game scenarios in real-world,such as economic transactions,military games,automatic driving.Therefore,the study of imperfect information game problem has very important practical significance.Guandan is a type of imperfect information card game with four players which are divided into two teams.It has the characteristics of randomness and information hiding.There are two wild cards in each game.It causes the huge increase for decision nodes number.The traditional game tree search algorithm has the inaccurate information prediction,redundant nodes and low search efficiency problems.It is not suitable for the Guandan game.The Upper Confidence Bound Apply to Tree(UCT)algorithm expands the game tree dynamically in an asymmetric form,which is very suitable for large game tree scale.In this thesis,UCT algorithm is improved according to the characteristics of Guandan game.On the one hand,the hand splitting algorithm is designed to expand the game tree.On the other hand,the game tree nodes are optimized in parallel,and the parallel UCT based on hand splitting algorithm is proposed.Although the tree search algorithm can make full use of the computing power,the mass hidden information in the Guandan game leads to a high-dimensional game state,which makes the tree search algorithm limited in intelligence level of Guandan.Reinforcement learning algorithm has efficient strategy search ability,but it cannot converge under the condition of imperfect information and high-dimensional state space.According to these problems,this thesis introduces the Proximal Policy Optimization(PPO)algorithm based on deep reinforcement learning to solve the problem of imperfect information,high-dimensional state space and action space.It enables the agent to perceive high-dimensional information and makes decisions according to the acquisition information.The innovations of this thesis are depicted as follows:1.In view of inaccurate information prediction,redundant nodes and low search efficiency problems,this thesis designs the information prediction module to predict the hidden information in the game and proposes the hand splitting algorithm to expand the game tree.It can reduce the size of the game tree effectively.Parallel processing of nodes can improve the efficiency of tree search.Experimental results show that the parallel UCT based on hand splitting algorithm has higher search efficiency and intelligence level.2.In view of high-dimensional action and state space problems,this thesis studies and designs a network based on the Proximal Policy Optimization algorithm.Firstly,the696-dimension feature vectors are extracted from the game and the neural network structure is designed.Secondly,the self-play process of sample collection and network training is designed.Finally,the experiment result show that the decision model based on the Proximal Policy Optimization algorithm is better than the intelligence level of Policy Gradient algorithm and A2 C algorithm.3.This thesis designs and implements Guandan game system based on the study of the game strategy.It includes the deep reinforcement learning training subsystem and the human-computer interaction game subsystem.The training subsystem is used to manage the sample collection and self-play process of the neural network.The human-computer interaction game subsystem can provide strategy selection,human-human,human-computer and computer-computer automatic game functions.The game system can store the entire game record.In summary,this thesis studies research in two aspects: search technology and deep reinforcement learning.Firstly,the UCT algorithm used in the Guandan game are improved and parallelized,which enhances the search efficiency and intelligence level;Secondly,we design the network model of the Guandan game based PPO algorithm and its training process;the intelligence level of the PPO algorithm is higher than the parallel UCT algorithm based on hand splitting according to the experiment results;Finally,we establish the algorithm design,analysis,and verification experimental platform for the Guandan game strategy research.
Keywords/Search Tags:Imperfect Information Game, Guandan, Deep Reinforcement Learning, Proximal Policy Optimization Algorithm, UCT Algorithm
PDF Full Text Request
Related items