| In recent years, deep reinforcement learning has developed rapidly and found concrete applications in robotics, natural language processing, autonomous driving, intelligent transportation, and other fields. In gaming it has shown remarkable potential: DeepMind's AlphaGo defeated Ke Jie, then ranked No. 1 in the world, at Go, and OpenAI Five defeated OG, the reigning human world champion team, at Dota 2. Although deep reinforcement learning has achieved breakthroughs in gaming, relatively little research has addressed racing games. These are imperfect-information games with a huge state space, high complexity, and rapidly changing situations. Traditional racing AI relies on expert knowledge, which raises design costs and makes generalization difficult, and script-driven game AI is also hard to maintain and inflexible. Training racing AI with deep reinforcement learning brings its own problems, such as sparse rewards and the need for expert knowledge to design additional rewards. Moreover, because the state space is large and complex, reinforcement learning algorithms usually require a large number of samples, which leads to high time and hardware costs. To address these problems, this paper makes the following three contributions. (1) A reinforcement learning model that combines generative adversarial imitation learning (GAIL) with proximal policy optimization (PPO) through a shared generator is proposed. Combining imitation learning with deep reinforcement learning in this way speeds up training convergence and also ensures that the agent can still explore better solutions. To make full use of the expert data, a behavioral cloning algorithm is also introduced to further accelerate convergence in the early stage of training. (2) A reward function design scheme based on voxel path planning is proposed. Track features are extracted automatically with an octree voxel partitioning algorithm, and the track is abstracted into voxels. A navigation algorithm then automatically places intermediate reward points that guide the car toward the correct driving route. The scheme requires no human intervention and effectively addresses the problems of sparse rewards and poor generalization. The method is also scalable and broadly applicable: it can be used in various types of racing games as well as in other reinforcement learning tasks. (3) A karting mini-game with customizable tracks of different types was developed in Unity, and an agent training model was built on top of it. Experimental analysis shows that the proposed algorithm achieves better experimental results. The average scores of the karting agents all exceed those of human players, convergence is significantly faster than with existing algorithms, and the method applies to different scenarios with strong generalization. |
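The abstract does not give the model's equations. The sketch below illustrates two standard ingredients that contribution (1) names. The first is the GAIL-style imitation reward `-log(1 - D(s, a))`, mixed with the environment reward. The second is PPO's clipped surrogate objective. The sketch makes several assumptions. `D` is taken to output the probability that a state-action pair came from the expert. The mixing weight `gail_weight` and all function names are hypothetical. The thesis's shared-generator architecture is not reproduced here.

```python
import numpy as np

def gail_reward(d_expert_prob, eps=1e-8):
    """Imitation reward -log(1 - D(s, a)).

    Assumed convention: d_expert_prob is the discriminator's estimated
    probability that each (s, a) pair came from the expert demonstrations,
    so pairs that look more expert-like receive a larger reward.
    """
    d = np.clip(d_expert_prob, eps, 1.0 - eps)  # avoid log(0)
    return -np.log(1.0 - d)

def mixed_reward(env_reward, d_expert_prob, gail_weight=0.5):
    """Hypothetical weighted sum of the extrinsic (environment) reward and the imitation reward."""
    return env_reward + gail_weight * gail_reward(d_expert_prob)

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled step
    advantage: advantage estimate for each step
    Returns mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)).
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))
```

Consider `ratio = [0.5, 1.0, 1.5]` with unit advantages. The per-step terms are 0.5, 1.0, and 1.2, because the last ratio is capped at `1 + 0.2`. The objective is therefore about 0.9. Capping the ratio removes any incentive to move the new policy far from the old one in a single update.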
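The abstract says behavioral cloning is used to speed up convergence in the early stage of training, but does not say how. One common approach, assumed here rather than taken from the thesis, is to add a supervised loss on expert state-action pairs and let its weight decay to zero. The sketch makes these further assumptions:

- Actions are continuous, such as steering and throttle, and are compared with mean squared error.
- The BC weight decays linearly.
- `bc_weight`, `decay_steps`, and the other names are hypothetical.

```python
import numpy as np

def bc_loss(pred_actions, expert_actions):
    """Behavioral cloning loss for continuous actions: mean squared error
    between the policy's actions and the expert's actions on expert states."""
    pred = np.asarray(pred_actions, dtype=float)
    expert = np.asarray(expert_actions, dtype=float)
    return float(np.mean((pred - expert) ** 2))

def bc_weight(step, start_weight=1.0, decay_steps=100_000):
    """Hypothetical linear schedule: the BC term is strongest at step 0 and
    reaches zero after `decay_steps`, leaving only the RL objective."""
    frac = min(step / decay_steps, 1.0)
    return start_weight * (1.0 - frac)

def total_loss(rl_loss, pred_actions, expert_actions, step):
    """Combined loss to minimize: the RL loss plus the scheduled BC term."""
    return rl_loss + bc_weight(step) * bc_loss(pred_actions, expert_actions)
```

A decaying weight lets the demonstrations shape the policy early on without permanently tying it to the expert. Once the weight reaches zero, the agent can improve beyond the demonstrations.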
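Contribution (2) can be pictured as three steps: voxelize the track with an octree, plan a route through the occupied voxels, and place reward points along that route. The sketch below walks through these steps under several stated assumptions:

- The track surface is given as a point cloud inside an axis-aligned cube.
- Breadth-first search over 6-connected voxels stands in for the unspecified navigation algorithm.
- Every `spacing`-th voxel on the route becomes a checkpoint.
- A fixed `bonus` is paid when the car enters the next checkpoint voxel.

All function names and parameters are hypothetical.

```python
import itertools
from collections import deque
import numpy as np

NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def octree_voxels(points, root_origin, root_size, max_depth):
    """Recursively split the root cube into 8 children, descending only into
    non-empty ones. Non-empty cubes at max_depth become occupied leaf voxels,
    returned as integer indices on a (2**max_depth)^3 grid.
    Points must lie in [root_origin, root_origin + root_size)."""
    root_origin = np.asarray(root_origin, dtype=float)
    occupied = set()

    def split(pts, idx, depth):
        if len(pts) == 0:
            return  # empty cube: prune this branch
        if depth == max_depth:
            occupied.add(idx)
            return
        child_size = root_size / 2 ** (depth + 1)
        for dx, dy, dz in itertools.product((0, 1), repeat=3):
            cidx = (2 * idx[0] + dx, 2 * idx[1] + dy, 2 * idx[2] + dz)
            lo = root_origin + np.array(cidx) * child_size
            mask = np.all((pts >= lo) & (pts < lo + child_size), axis=1)
            split(pts[mask], cidx, depth + 1)

    split(np.asarray(points, dtype=float), (0, 0, 0), 0)
    return occupied

def to_voxel(position, root_origin, leaf_size):
    """Map a world-space position (e.g. the car's) to its leaf-voxel index."""
    idx = np.floor((np.asarray(position, float) - np.asarray(root_origin, float)) / leaf_size)
    return tuple(int(v) for v in idx)

def shortest_voxel_path(occupied, start, goal):
    """BFS over 6-connected occupied voxels; returns a voxel list or None."""
    if start not in occupied or goal not in occupied:
        return None
    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:  # walk back through predecessors
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for d in NEIGHBORS:
            nxt = (cur[0] + d[0], cur[1] + d[1], cur[2] + d[2])
            if nxt in occupied and nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None

def checkpoints(path, spacing):
    """Take every `spacing`-th voxel along the path, always ending at the goal."""
    pts = path[spacing::spacing]
    if not pts or pts[-1] != path[-1]:
        pts.append(path[-1])
    return pts

def checkpoint_reward(voxel, targets, next_idx, bonus=1.0):
    """Return (reward, new_next_idx): pay `bonus` when the car's voxel is the next checkpoint."""
    if next_idx < len(targets) and voxel == targets[next_idx]:
        return bonus, next_idx + 1
    return 0.0, next_idx
```

Only non-empty octants are subdivided, so the cost follows the track surface rather than the full bounding volume. Requiring checkpoints to be reached in order rewards progress along the planned route and not circling near a single point.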