There are many successful approaches to games of perfect information. An agent expands part of the game tree starting at the current position, evaluates the leaves of this partial tree with an evaluation function, and then searches the tree to determine the optimal move from the root; this method (sketched at the end of this section) is the core of many game-playing programs. Games of perfect information and games of imperfect information call for very different solutions, and techniques suited to one class rarely transfer to the other. In games of imperfect information, agents have only partial knowledge of the current state. Because information is hidden, randomized strategies become a reasonable choice, and playing under a randomized strategy places even higher demands on the accuracy of position evaluation.

Static evaluation methods are widely adopted in game systems. A static evaluation function requires the designer to understand the game deeply, judge the importance (score) of each feature of a position, and from these scores evaluate the whole position accurately. Facing thousands of positions, however, it is impossible to judge every position accurately, especially in the opening. Moreover, storing a large number of game states demands large storage and a fast search algorithm. This paper therefore adopts Q-learning to address the problems of static evaluation functions.

Q-learning is a reinforcement learning method. Conventional prediction-learning methods adjust themselves by the difference between predicted and actual outcomes, whereas Q-learning adjusts itself by the difference between temporally successive predictions. For most real-world prediction problems, Q-learning requires less memory and less computation than conventional methods, and it can produce more accurate predictions.

This paper studies imperfect information games based on Q-learning. Two known drawbacks of Q-learning are slow convergence and a tendency to become trapped in local optima. We combine temporal-difference prediction with the simulated annealing algorithm to accelerate convergence and to explore toward the optimal result, and we realize a self-learning system for imperfect information games. The main contributions and innovations are as follows:

1. To address the slow convergence of Q-learning, we combine temporal-difference prediction with dynamically adjusted learning parameters, promoting information collection in the early stage and speeding up learning and convergence later (see the Q-learning sketch below);

2. We introduce the Metropolis criterion of simulated annealing so that non-optimal actions are occasionally explored, helping the learner escape local optima and approach the optimal result (see the Metropolis sketch below);

3. We realize a self-learning imperfect information game system based on Q-learning, which can adjust its behavior and gain useful knowledge from the outside environment;

4. We develop a SiGuoJunQi testing system based on the Tencent game lobby, generate a large number of game records, establish a database of board states, and study the early and late stages of the game.
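For concreteness, the classical perfect-information method mentioned at the start of this section can be summarized as a depth-limited minimax search over a partial game tree. The sketch below is illustrative only; the GameState interface (moves/apply/is_terminal) and the evaluate function are assumptions, not definitions from this paper.

```python
# Sketch: expand a depth-limited game tree from the current position,
# score the leaves with a static evaluation function, and back values
# up with minimax to pick the best root move. The GameState interface
# (moves/apply/is_terminal) and evaluate() are illustrative assumptions.

def minimax(state, depth, maximizing, evaluate):
    if depth == 0 or state.is_terminal():
        return evaluate(state)          # score the leaf statically
    values = (minimax(state.apply(m), depth - 1, not maximizing, evaluate)
              for m in state.moves())
    return max(values) if maximizing else min(values)

def best_move(state, depth, evaluate):
    # Choose the root move whose subtree backs up the highest value.
    return max(state.moves(),
               key=lambda m: minimax(state.apply(m), depth - 1,
                                     False, evaluate))
```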
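The Q-learning sketch referenced in contribution 1 is given below. It is a minimal tabular sketch, assuming an env object with reset/actions/step methods; the linear decay of the learning rate alpha is one possible way to realize the "dynamically adjusted parameters" of contribution 1, not the paper's exact schedule.

```python
from collections import defaultdict

# Minimal sketch: tabular Q-learning with a temporal-difference update
# and a learning rate alpha that is annealed across episodes, so early
# episodes gather information quickly and later episodes converge.
# env, n_episodes, and the alpha schedule are illustrative assumptions.

def q_learning(env, n_episodes=1000, gamma=0.9,
               alpha_start=0.5, alpha_end=0.01):
    Q = defaultdict(float)              # Q[(state, action)] -> value
    for episode in range(n_episodes):
        # Linearly anneal the learning rate over the training run.
        frac = episode / max(1, n_episodes - 1)
        alpha = alpha_start + frac * (alpha_end - alpha_start)

        state = env.reset()
        done = False
        while not done:
            # Greedy selection here; in practice this would be paired
            # with an exploration rule such as the Metropolis criterion.
            action = max(env.actions(state), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # TD target: reward plus discounted best successor value;
            # move Q toward it by step size alpha.
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.actions(next_state))
            td_target = reward + gamma * best_next
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])
            state = next_state
    return Q
```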
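The Metropolis sketch referenced in contribution 2 follows. It shows one way the Metropolis acceptance rule of simulated annealing can drive action selection: a non-optimal candidate action is accepted with probability exp(-(Q_best - Q_candidate) / T), where the temperature T is lowered over time. Function and parameter names here are assumptions for illustration.

```python
import math
import random

def metropolis_action(Q, state, actions, temperature):
    # Accept a worse ("non-optimal") action with Boltzmann probability;
    # at high T this explores broadly, at low T it is nearly greedy.
    best = max(actions, key=lambda a: Q[(state, a)])
    candidate = random.choice(actions)
    delta = Q[(state, best)] - Q[(state, candidate)]   # delta >= 0
    if delta <= 0 or random.random() < math.exp(-delta / temperature):
        return candidate
    return best

def cool(temperature, rate=0.99, t_min=1e-3):
    # Geometric cooling schedule; T shrinks toward t_min each step.
    return max(t_min, temperature * rate)
```

Under this scheme, the high initial temperature accepts most candidate actions and explores non-optimal solutions widely, while cooling makes the policy increasingly greedy, matching the goal of escaping local optima before converging.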