
Research On Implementation Of Artificial Intelligence Algorithm In Texas Hold'em Based On Monte Carlo Tree Search

Posted on: 2021-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: F Jiang
Full Text: PDF
GTID: 2480306557485574
Subject: Computer Science and Technology
Abstract/Summary:
The many successes of Monte Carlo Tree Search (MCTS) in perfect information games have demonstrated its extraordinary abilities in exploration and learning, as well as the value of selective search and planning in complex environments. Although some of these successes have been extended to imperfect information games, they have so far not achieved the same level of practical performance or theoretical convergence guarantees as the prominent game-theoretic approaches. In large-scale environments such as Texas Hold'em, refining MCTS faces further challenges, with few successful applications in the literature. On the one hand, MCTS algorithms that search locally cannot converge to a Nash equilibrium; on the other hand, MCTS algorithms that search globally depend heavily on abstraction techniques and consume large amounts of memory.

To address the deviation of local MCTS algorithms from Nash equilibrium, this thesis analyzes the causes of the deviation and proposes an improved policy-value network based Monte Carlo Tree Search algorithm (PV-MCTS), which introduces global information and corrects the policy deviation by adding the realization probability of the root state of the sub-game. On this basis, the thesis further introduces the notion of fictitious play and presents a policy-value network based Monte Carlo fictitious play algorithm (PV-MCFP). By letting each agent best respond to the average strategy of the other agents, it obtains both a theoretical convergence guarantee and excellent practical performance.

Compared with local search and online planning, global search trains faster, but it occupies more memory and requires abstraction techniques that incorporate human knowledge. This thesis presents a value network based self-play UCT algorithm (VN-SPUCT), which trains a deep value network to provide value estimates for UCT and thus mitigates the impact of abstraction on state-value updates. Moreover, the value network has a certain ability to generalize: it avoids wasting time on simulations of similar states and improves learning efficiency.

Finally, the performances of PV-MCFP and VN-SPUCT are tested in several poker environments of different scales. Experimental results indicate that PV-MCFP can learn an approximate Nash equilibrium without using any human knowledge or occupying a large amount of memory. Although VN-SPUCT uses abstraction, it achieves better performance and converges faster than other algorithms in large-scale environments. In the six-player No-limit Texas Hold'em competition of the 2018 Annual Computer Poker Competition (ACPC), the VN-SPUCT algorithm ranked fifth among the contestants.
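To illustrate the fictitious-play principle underlying PV-MCFP (each agent best responds to the others' average strategy, and the average strategies converge toward a Nash equilibrium), the following minimal Python sketch runs classical fictitious play in a toy zero-sum matrix game, rock-paper-scissors. It is an illustration only, not the thesis's algorithm: per the abstract, PV-MCFP approximates the best response with policy-value network based MCTS in poker rather than computing it exactly as done here.

import numpy as np

# Player 1's payoff matrix in rock-paper-scissors (zero-sum, so
# player 2's payoff is the negative of this).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

counts1 = np.ones(3)  # action counts; ones avoid division by zero
counts2 = np.ones(3)

for t in range(100_000):
    avg1 = counts1 / counts1.sum()    # empirical average strategies
    avg2 = counts2 / counts2.sum()
    counts1[np.argmax(A @ avg2)] += 1   # best response to opponent's average
    counts2[np.argmin(avg1 @ A)] += 1   # player 2 minimizes player 1's payoff

print(counts1 / counts1.sum())  # approaches [1/3, 1/3, 1/3], the Nash equilibrium

The key property the sketch demonstrates is that the empirical average strategy, not the sequence of best responses, is what converges to equilibrium in zero-sum games.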
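The core mechanism the abstract attributes to VN-SPUCT, replacing the value estimate at a UCT leaf with a learned value network instead of a simulation from an abstracted state, can also be sketched. The toy game, the value_net stub, and all function names below are illustrative assumptions, not the thesis's code; in VN-SPUCT the value network is a deep model trained through self-play.

import math

# Toy single-player game (illustrative): start at 0, add 1 or 2 per
# move; landing exactly on TARGET scores 1, overshooting scores 0.
TARGET = 10

def actions(s):
    return (1, 2)

def step(s, a):
    return s + a

def is_terminal(s):
    return s >= TARGET

def terminal_reward(s):
    return 1.0 if s == TARGET else 0.0

def value_net(s):
    # Stand-in for a trained deep value network (hypothetical):
    # here a constant prior replaces the rollout.
    return 0.5

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def ucb(parent, child, c=1.4):
    if child.visits == 0:
        return float("inf")  # try unvisited children first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def vn_uct(root_state, n_sims=2000):
    root = Node(root_state)
    for _ in range(n_sims):
        node, path = root, [root]
        # Selection: descend by UCB1 until reaching a leaf.
        while node.children:
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(parent, ch))
            path.append(node)
        # Expansion, then evaluation by the value network instead of
        # a simulation from an abstracted state.
        if is_terminal(node.state):
            v = terminal_reward(node.state)
        else:
            for a in actions(node.state):
                node.children[a] = Node(step(node.state, a))
            v = value_net(node.state)
        # Backup along the selection path.
        for n in path:
            n.visits += 1
            n.value_sum += v
    return max(root.children, key=lambda a: root.children[a].visits)

print(vn_uct(0))  # most-visited first action

The stub makes the sketch self-contained; per the abstract, training the network during self-play lets similar states share value estimates rather than being simulated repeatedly, which is the claimed source of VN-SPUCT's efficiency.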
Keywords/Search Tags: Imperfect information games, Reinforcement Learning, Monte Carlo Tree Search, Self-play