Since the concept of artificial intelligence has been proposed,the machine game has been one of its most challenging research directions.Machine game can be divided into perfect information machine game and imperfect information machine game.The characteristic of imperfect information machine game is that agent can’t get all information in the game process.Many real-world decision-making problems can be abstracted as imperfect information game problems,such as airport planning,network security,financial energy and other problems.Therefore,it is great practical significance to study the imperfect information machine game.The traditional method of solving the imperfect game machine game problem is partially observed Markov decision process model and reinforcement learning algorithm.However,the reinforcement learning algorithm can’t guarantee convergence in imperfect state and high latitude state space.Only through limited data and repeated testing can’t traverse all the state.In this paper,deep reinforcement learning algorithm is used to solve the game of imperfect information machine,and the state-action value function in reinforcement learning is replaced by a deep learning network.Aiming at the problem that historical information can ’t be considered in the decision-making process of deep reinforcement learning algorithm,we propose to add the long-short term memory model to the deep reinforcement learning algorithm.In this paper,a reward function based on Monte Carlo tree search is pro posed.By comparing the return of the game and expected reward of the Monte Carlo game tree search,we can judge whether the agent should be rewarded or be punished.Traditional methods need to extract features manually.It is difficult to find the internal relations between features.Besides,training requires a lot of domain knowledge,which makes poorly scalability.This paper proposes a poker modeling method,which is suitable for pattern matching algorithms such as deep reinforcement learning.This coding method can apply the same network structure to different poker games with very little domain knowledge.Finally,this paper applies the improved deep reinforcement learning algorithm to the Texas poker game system.Learning from end-to-end avoids the complex process of extracting features manually.Comparing with the traditional reinforcement learning algorithm,it can achieve a higher level of intelligent.Improved deep reinforcement learning can provide a feasible method for the realization of large-scale machine game system and provide the possibility for extending to real life. |