
Research On Implementation Of Artificial Intelligence Algorithm In Texas Hold'em Based On Monte Carlo Tree Search

Posted on: 2021-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: F Jiang
Full Text: PDF
GTID: 2480306557485574
Subject: Computer Science and Technology
Abstract/Summary:
The many successes of Monte Carlo Tree Search (MCTS) in perfect information games have demonstrated its extraordinary abilities in exploration and learning, as well as the value of selective search and planning in complex environments. Although some of these successes have been extended to imperfect information games, they have so far not achieved the same level of practical performance or theoretical convergence guarantees as the prominent game-theoretic approaches. In large-scale environments such as Texas Hold'em, refining MCTS faces further challenges, with few successful applications in the literature. On the one hand, MCTS algorithms that search locally cannot converge to a Nash equilibrium; on the other hand, MCTS algorithms that search globally depend heavily on abstraction techniques and consume large amounts of memory.

To address the deviation of local MCTS algorithms from Nash equilibrium, this thesis analyzes the causes of the deviation and proposes an improved policy-value network based Monte Carlo Tree Search algorithm (PV-MCTS), which introduces global information and corrects the policy deviation by adding the realization probability of the root state of the sub-game. On this basis, the thesis further introduces the notion of fictitious play and presents a policy-value network based Monte Carlo fictitious play algorithm (PV-MCFP). By letting each agent best respond to the average strategy of the other agents, it obtains both a theoretical convergence guarantee and excellent practical performance.

Compared with local search and online planning, global search trains faster, but it occupies more memory and requires abstraction techniques that incorporate human knowledge. This thesis presents a value network based self-play UCT algorithm (VN-SPUCT), which trains a deep value network to provide value estimates for UCT and thus mitigates the impact of abstraction on state-value updates. Moreover, the value network has a certain ability to generalize: it avoids wasting time on simulations of similar states and improves learning efficiency.

Finally, the performances of PV-MCFP and VN-SPUCT are tested in several poker environments of different scales. Experimental results indicate that PV-MCFP can learn an approximate Nash equilibrium without using any human knowledge or occupying a large amount of memory. Although VN-SPUCT uses abstraction, it achieves better performance and converges faster than other algorithms in large-scale environments. In the six-player No-limit Texas Hold'em competition of the 2018 Annual Computer Poker Competition (ACPC), the VN-SPUCT algorithm ranked fifth among the contestants.
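To illustrate the fictitious-play principle underlying PV-MCFP (each agent best responds to the others' average strategy, and the average strategies converge toward a Nash equilibrium), the following minimal Python sketch runs classical fictitious play in a toy zero-sum matrix game, rock-paper-scissors. It is an illustration only, not the thesis's algorithm: per the abstract, PV-MCFP approximates the best response with policy-value network based MCTS in poker rather than computing it exactly as done here.

import numpy as np

# Player 1's payoff matrix in rock-paper-scissors (zero-sum, so
# player 2's payoff is the negative of this).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

counts1 = np.ones(3)  # action counts; ones avoid division by zero
counts2 = np.ones(3)

for t in range(100_000):
    avg1 = counts1 / counts1.sum()    # empirical average strategies
    avg2 = counts2 / counts2.sum()
    counts1[np.argmax(A @ avg2)] += 1   # best response to opponent's average
    counts2[np.argmin(avg1 @ A)] += 1   # player 2 minimizes player 1's payoff

print(counts1 / counts1.sum())  # approaches [1/3, 1/3, 1/3], the Nash equilibrium

The key property the sketch demonstrates is that the empirical average strategy, not the sequence of best responses, is what converges to equilibrium in zero-sum games.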
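The core mechanism the abstract attributes to VN-SPUCT, replacing the value estimate at a UCT leaf with a learned value network instead of a simulation from an abstracted state, can also be sketched. The toy game, the value_net stub, and all function names below are illustrative assumptions, not the thesis's code; in VN-SPUCT the value network is a deep model trained through self-play.

import math

# Toy single-player game (illustrative): start at 0, add 1 or 2 per
# move; landing exactly on TARGET scores 1, overshooting scores 0.
TARGET = 10

def actions(s):
    return (1, 2)

def step(s, a):
    return s + a

def is_terminal(s):
    return s >= TARGET

def terminal_reward(s):
    return 1.0 if s == TARGET else 0.0

def value_net(s):
    # Stand-in for a trained deep value network (hypothetical):
    # here a constant prior replaces the rollout.
    return 0.5

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def ucb(parent, child, c=1.4):
    if child.visits == 0:
        return float("inf")  # try unvisited children first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def vn_uct(root_state, n_sims=2000):
    root = Node(root_state)
    for _ in range(n_sims):
        node, path = root, [root]
        # Selection: descend by UCB1 until reaching a leaf.
        while node.children:
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(parent, ch))
            path.append(node)
        # Expansion, then evaluation by the value network instead of
        # a simulation from an abstracted state.
        if is_terminal(node.state):
            v = terminal_reward(node.state)
        else:
            for a in actions(node.state):
                node.children[a] = Node(step(node.state, a))
            v = value_net(node.state)
        # Backup along the selection path.
        for n in path:
            n.visits += 1
            n.value_sum += v
    return max(root.children, key=lambda a: root.children[a].visits)

print(vn_uct(0))  # most-visited first action

The stub makes the sketch self-contained; per the abstract, training the network during self-play lets similar states share value estimates rather than being simulated repeatedly, which is the claimed source of VN-SPUCT's efficiency.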
Keywords/Search Tags: Imperfect information games, Reinforcement Learning, Monte Carlo Tree Search, Self-play