Game problems have always been a research focus in artificial intelligence. According to whether the players can observe all of the game-state information, games are divided into perfect-information games and imperfect-information games. Compared with perfect-information games, imperfect-information games are more difficult: hidden information prevents the participants from observing the full state, which leads to far more complex possibilities. With the development of computer hardware and deep learning technology, research on many perfect-information games has made great progress, and researchers have gradually shifted their attention to imperfect-information games. Because imperfect-information games have higher complexity, traditional methods are difficult to apply to them directly.

Taking Mahjong as a representative imperfect-information game, this paper uses deep learning and supervised learning to construct a reward model and a model for estimating the number of effective tiles, which together provide the rewards for a reinforcement learning discard decision model. The specific research work is as follows:

1. An efficient feature segmentation method based on Mahjong domain knowledge is constructed, and a multi-channel encoding of the full state information is designed. The state information produced during a game is semantically analyzed and segmented into features, and different feature types are encoded with different schemes to obtain an image-like feature matrix. Mahjong domain knowledge is required only during feature design; the encoding step does not need the complex feature combinations of traditional feature engineering. Experiments show that this feature design and encoding method effectively captures the state information of a Mahjong game (a minimal encoding sketch is given after this list).

2. A CNN + double-GRU multi-branch reward model combined with an attention mechanism is proposed. The CNN provides efficient feature extraction and representation to learn the semantic features of the state information; the GRUs use their memory of historical information to model the relationship between the players' actions in different rounds; and the attention mechanism captures the local influence of the state information in each round, so that the reward model can evaluate a state. The reward model is essentially a state-evaluation model: the difference between the evaluations of adjacent states, r_t = V(s_{t+1}) - V(s_t), is the reward of a discard decision. Experiments show that using the reward model instead of a hand-designed reward function effectively alleviates the sparse-reward problem in imperfect-information games and improves the playing strength of the discard decision agent (an architecture and reward-computation sketch follows the list).

3. A ResNet101 + double-GRU network architecture is proposed to estimate the number of effective tiles of each player, and an additional-reward lookup table is designed to generate auxiliary rewards for the reinforcement learning discard model. Experiments show that this method effectively balances the luck component of the tiles received during a game, guides the training and optimization of the discard decision agent, and improves its overall playing strength (see the lookup-table sketch after this list).

4. An experience replay mechanism for Mahjong game records is designed. During play, game records are stored in JSON format; according to data-quality screening criteria, high-quality records are selected and used to iteratively train the reward model and the effective-tile estimation model, continually improving the discard decision level of the agent (a storage and screening sketch appears after this list).
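For the multi-channel state encoding of item 1, the following is a minimal sketch. It assumes the standard 34 tile types and a count-based 4 x 34 binary plane per feature group; the concrete feature segmentation (which pieces of state information become which channels) is an illustrative assumption, not the exact design of this work.

```python
import numpy as np

NUM_TILE_TYPES = 34   # assumption: standard Mahjong set without flower tiles
MAX_COPIES = 4        # each tile type has at most four copies

def encode_tile_counts(tile_counts):
    """Encode a 34-dim tile-count vector as a 4 x 34 binary plane.

    Row k of the plane is 1 for tile t if the player holds more than k
    copies of t, a thermometer-style encoding suited to CNN input.
    """
    planes = np.zeros((MAX_COPIES, NUM_TILE_TYPES), dtype=np.float32)
    for tile, count in enumerate(tile_counts):
        planes[:count, tile] = 1.0
    return planes

def encode_state(hand_counts, discard_counts, meld_counts):
    """Stack several feature groups into an image-like multi-channel matrix.

    The three groups used here (own hand, own discards, exposed melds) are
    an illustrative segmentation; each extra group adds four more channels.
    """
    return np.concatenate([
        encode_tile_counts(hand_counts),
        encode_tile_counts(discard_counts),
        encode_tile_counts(meld_counts),
    ])  # shape: (12, 34)
```

Each such matrix describes one decision point; a sequence of them over successive rounds forms the input to the CNN + GRU models sketched below.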
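The reward model of item 2 can be sketched as follows. The channel count, hidden sizes, attention head count, and the self/opponent split of the two GRU branches are assumptions for illustration; only the overall CNN + double-GRU + attention structure and the reward-as-value-difference computation follow the description above.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """CNN + double-GRU state-evaluation model with attention (sketch)."""

    def __init__(self, in_channels=12, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # per-round spatial features
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.gru_self = nn.GRU(64, hidden, batch_first=True)  # own action history
        self.gru_opp = nn.GRU(64, hidden, batch_first=True)   # opponents' history
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.value_head = nn.Linear(2 * hidden, 1)

    def forward(self, rounds_self, rounds_opp):
        # rounds_*: (batch, rounds, in_channels, 4, 34) per-round encodings
        B, T = rounds_self.shape[:2]
        f_self = self.cnn(rounds_self.flatten(0, 1)).view(B, T, -1)
        f_opp = self.cnn(rounds_opp.flatten(0, 1)).view(B, T, -1)
        h_self, _ = self.gru_self(f_self)
        h_opp, _ = self.gru_opp(f_opp)
        # attention over rounds: weight the local influence of each round
        ctx, _ = self.attn(h_self, h_opp, h_opp)
        v = self.value_head(torch.cat([h_self[:, -1], ctx[:, -1]], dim=-1))
        return v.squeeze(-1)                       # V(s): evaluation of the state

def discard_reward(model, state_t, state_t1):
    """Reward of a discard: difference of adjacent state evaluations."""
    with torch.no_grad():
        return model(*state_t1) - model(*state_t)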
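For item 3, the additional-reward lookup can be illustrated as below. The bucket boundaries, the reward values, and the choice to key the table on the change in the estimated effective-tile count are placeholders for the sketch, not the table designed in this work.

```python
import bisect

# Illustrative lookup table: bucket the change in a player's estimated
# effective-tile count after a discard and map each bucket to an auxiliary
# reward.  Edges and values are assumptions, not this work's table.
BUCKET_EDGES = [-4, -2, 0, 2, 4]                  # delta in estimated effective tiles
EXTRA_REWARDS = [-0.4, -0.2, 0.0, 0.1, 0.2, 0.4]  # one value per bucket

def extra_reward(est_before: float, est_after: float) -> float:
    """Auxiliary reward derived from the effective-tile estimation model.

    A discard that raises the estimated number of effective tiles is
    rewarded, one that lowers it is penalized, offsetting part of the luck
    in the tiles a player happens to draw.
    """
    delta = est_after - est_before
    return EXTRA_REWARDS[bisect.bisect_left(BUCKET_EDGES, delta)]
```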
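For the experience replay mechanism of item 4, a sketch of the JSON storage and data screening step is given below. The record fields ("final_score", "rounds") and the screening thresholds are placeholders; the actual screening indices are those defined in this work.

```python
import json

def save_game_record(path, record):
    """Append one finished game to a JSON Lines file (one game per line)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

def load_high_quality_games(path, min_final_score=8, max_rounds=None):
    """Yield games that pass simple screening criteria (placeholder rules)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            game = json.loads(line)
            if game["final_score"] < min_final_score:
                continue
            if max_rounds is not None and len(game["rounds"]) > max_rounds:
                continue
            yield game
```

Records that pass the screen are then fed back into the iterative training of the reward model and the effective-tile estimation model.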