As one of the most challenging problems in Artificial Intelligence (AI), computer game playing has attracted much attention. Computer games are divided into perfect- and imperfect-information games. In recent years, research on perfect-information games has developed rapidly: AlphaZero, launched by DeepMind, defeated top human players and, with its strong self-learning ability, became the world's top Go program. The application of related technologies such as deep learning and reinforcement learning has provided new solutions for perfect-information games and made research in this area relatively mature. However, the game environment of most real-world problems is not perfectly known: multiple parties must make decisions under imperfect information, and the decision state space is high-dimensional. Early on, many researchers sought to reduce the decision state space by effectively analyzing and sampling the imperfect-information environment, converting the imperfect-information game into a perfect-information one so that existing perfect-information algorithms could be applied. In 2019, Pluribus, a multiplayer Texas Hold'em program developed at Carnegie Mellon University, defeated professional players for the first time in six-player no-limit Texas Hold'em, which promoted the study of imperfect-information games. However, some decision-making algorithms from Texas Hold'em cannot be applied directly to bridge. A bridge game does not consist only of the playing phase: before playing there is a bidding phase in which players communicate and compete. Since expert teams are generally indistinguishable in the playing phase, the bidding phase is more decisive for winning or losing. So far, the problem of bridge bidding decision-making has not been satisfactorily solved. At the same time, the state space of bridge bidding decision-making is huge and shares many characteristics with human decision-making, so research on it has practical significance. This paper therefore focuses on bidding decision-making in combination with the characteristics of bridge. The research work achieves the following results:

1. Existing reinforcement learning algorithms have difficulty converging when the state space is huge. We design a deep reinforcement learning algorithm that combines global information sharing with expert experience to realize bridge bidding decisions. The method first models the real bridge bidding process and trains a bidding system model based on recurrent neural networks through supervised learning; this model can initially make bidding decisions that conform to the rules of the bidding system. Then, based on the bidding system model, a fast model that generates bidding sequences is constructed to replace the state evaluation function in deep reinforcement learning. Finally, the model obtains further bidding strategies through self-play exploration.

2. The large state space of imperfect-information games makes building a large-scale game tree difficult. We design a model that combines Monte Carlo tree search (MCTS) with deep neural networks. First, an LSTM-based policy prediction model is designed to replace the default policy in MCTS; it selects candidate nodes, expands them, and predicts the distribution of the unknown hands. Then, a hand sampling algorithm is designed based on the predicted hand distribution. Through appropriate and efficient sampling of hands, imperfect information is converted into relatively perfect information as far as possible, compressing the search space and reducing the scale of the game tree. Finally, a double-dummy model based on a fully connected neural network is designed as the evaluation function, and decisions are made according to the evaluation results. Simulation experiments show that the proposed method effectively improves bidding decision-making ability.

3. In existing research on bridge, the bidding system is vague and single. We design a bridge bidding decision model in which two neural networks cooperate, giving the agent the ability to bid under multiple systems. At the same time, the model quantifies the bidding system, thereby solving the problem of system ambiguity to a certain extent. First, drawing on expert experience, a general feature extraction algorithm converts the historical bidding sequence into 30 general features, which effectively avoids the influence of the raw historical bidding sequence; the model then learns and trains directly on these features. On this basis, a bidding selection network with the capabilities of different systems is developed. At the same time, a state evaluation network based on deep reinforcement learning is trained to evaluate the situation directly and make decisions according to the evaluation results. Matches against models of two different systems show that the scheme can support multi-system play.
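The conversion of imperfect into "relatively perfect" information by sampling hidden hands can be illustrated with a minimal determinization sketch. This is not the paper's implementation: the hand distribution is taken as uniform rather than LSTM-predicted, and `evaluate_double_dummy` is a hypothetical stand-in for the double-dummy evaluation network, so only the overall sample-then-evaluate-then-average structure is meaningful here.

```python
import random

# Cards are encoded 0..51. Given our own 13 cards (and any other cards seen),
# each sample deals the remaining unseen cards into the three hidden hands,
# yielding a perfect-information deal that a double-dummy evaluator can score.
DECK = list(range(52))

def sample_hidden_hands(my_hand, seen, rng):
    """Deal the unseen cards uniformly into three 13-card hidden hands."""
    unseen = [c for c in DECK if c not in my_hand and c not in seen]
    rng.shuffle(unseen)
    return unseen[:13], unseen[13:26], unseen[26:39]

def evaluate_double_dummy(hands):
    """Hypothetical placeholder for the double-dummy evaluation network:
    returns a pseudo trick count in 0..13 so the sketch is runnable."""
    return sum(hands[0]) % 14

def estimate_value(my_hand, seen=(), n_samples=100, seed=0):
    """Average the perfect-information evaluation over sampled deals."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        h1, h2, h3 = sample_hidden_hands(my_hand, list(seen), rng)
        total += evaluate_double_dummy([my_hand, h1, h2, h3])
    return total / n_samples
```

In the full model described above, the uniform deal would be replaced by sampling from the LSTM-predicted hand distribution, which is what keeps the number of required samples (and hence the game tree) small.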