| Integrating deep learning algorithms into the game image processing of reinforcement learning has proved a suitable approach, and it has deepened the study of image games. The use of deep reinforcement learning algorithms to process image games has progressed gradually. The initial algorithm combined the Markov decision process and Q-learning from reinforcement learning with a deep-learning convolutional neural network and random experience replay, forming the deep Q-learning random experience replay algorithm. A deep Q-learning prioritized experience replay algorithm evolved later: replay units deemed important are selected for learning more often, while those deemed unimportant are replayed with lower probability. This improvement achieves better results than the earlier algorithm because, according to algorithm analysis, some experiences have a greater effect on the training of parameters than others. However, the previous algorithm is still not good enough for an agent to reach a high level of achievement in learning image games. This paper proposes an advanced deep Q-learning prioritized experience replay algorithm. By changing the priority-to-probability mapping function, and by comparing the previous mapping function with a simple mapping function, we have found a new mapping function that gives important replay units a higher probability of being replayed. The new mapping function allows the agent to learn optimal game strategies and effectively improves game performance. In the experiments, an intuitive model strategy analysis of the improved algorithm is performed first. The algorithms are then compared through CNN layer architecture selection, cost function analysis, efficiency analysis, and game scores. Finally, the test results show that the new algorithm enables the agent to make more effective decisions in image games, achieving higher scores in less time. |
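
The abstract refers to a priority-to-probability mapping function without stating its form. As a rough illustration only, the sketch below uses the power-law mapping commonly associated with prioritized experience replay, P(i) = p_i^alpha / sum_k p_k^alpha. It is not the new mapping function proposed in this paper. The function names, the `alpha` parameter, and its default value are assumptions made for this example.

```python
import random


def priorities_to_probabilities(priorities, alpha=0.6):
    """Map non-negative priorities to replay probabilities.

    Assumed mapping (not the paper's): P(i) = p_i**alpha / sum_k p_k**alpha.
    alpha = 0 reduces to uniform (random) experience replay; a larger
    alpha makes high-priority replay units more likely to be sampled.
    """
    if not priorities:
        raise ValueError("priorities must be non-empty")
    if any(p < 0 for p in priorities):
        raise ValueError("priorities must be non-negative")
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    if total == 0:
        # All priorities are zero: fall back to uniform sampling.
        return [1.0 / len(priorities)] * len(priorities)
    return [s / total for s in scaled]


def sample_indices(priorities, batch_size, alpha=0.6, rng=None):
    """Draw replay-unit indices (with replacement) by mapped probability."""
    rng = rng or random.Random()
    probs = priorities_to_probabilities(priorities, alpha)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Hypothetical priorities for three stored replay units.
    example_priorities = [1.0, 2.0, 4.0]
    print(priorities_to_probabilities(example_priorities, alpha=1.0))
    print(sample_indices(example_priorities, batch_size=5))
```

Changing the mapping, as the paper proposes, would amount to replacing the body of `priorities_to_probabilities` while leaving the sampling step unchanged.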