In electronic circuit design, circuit components must be connected through signal routing. As the industry has developed, the variety of electronic components has grown, and so has the number of pins that need to be connected. At the same time, the placement and routing of components and wires must satisfy the design rules of the actual circuit, which makes the routing problem more complicated. The performance of traditional routing algorithms is not satisfactory, and global routing has long been a challenging and valuable research problem.

To address the sparse-reward problem in reinforcement learning routing, this thesis proposes an environment reward scheme based on diffused potential energy in the routing environment, and designs four reinforcement learning routing algorithms based on this reward. Experiments were carried out in multiple routing environments for PCB 2D Escape Routing and IC 3D Global Routing. Compared with traditional routing algorithms, the reinforcement learning routing algorithms based on the diffused-potential-energy reward improve metrics such as routing completion rate and routing length.

The main work of this thesis is as follows:

1. A diffused potential energy construction algorithm is proposed to shape the state potential energy of the routing environment, and the reward function is defined by the change in potential energy before and after each state transition of the environment. This resolves the sparse-reward problem in reinforcement learning routing and guides the agent to complete the routing task faster and better.

2. The routing environments of the PCB 2D Escape Routing problem and the IC 3D Global Routing problem are modeled, and four reinforcement learning routing algorithms based on the diffused-potential-energy reward, namely a DQN routing algorithm, a policy gradient (PG) routing algorithm, an MCTS routing algorithm, and a routing-sequence-driven hierarchical routing algorithm, are designed for practical routing problems. In addition, the policy network architecture and the action-selection process between the agent and the environment are improved, making the model better suited to the routing environment and raising the learning efficiency of the reinforcement learning algorithms.

3. Experimental simulations are conducted in multiple PCB 2D Escape Routing and IC 3D Global Routing environments using the four reinforcement learning routing algorithms above. The results are as follows. On the PCB 2D Escape Routing problem, all four algorithms perform well: compared with the A* routing algorithm, the average routing completion rate is improved by 0.35, 0.44, 0.55, and 0.61, respectively. On the more complex IC 3D Global Routing problem, DQN performs poorly, while the other three algorithms complete all routing tasks: compared with the A* routing algorithm, the average total routing length is improved by 0.047, 0.043, and 0.057, respectively. The routing-sequence-driven hierarchical routing algorithm can explore better routing sequences and consistently holds an advantage in total routing length, while the MCTS routing algorithm offers better time performance when routing quality is similar.

In summary, this thesis proposes a diffused potential energy construction algorithm for building the environmental state potential of routing, which solves the sparse-reward problem in reinforcement learning routing. Through the design of four reinforcement learning routing algorithms based on the diffused-potential-energy reward, routing performance on the PCB 2D Escape Routing and IC 3D Global Routing problems is improved.
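The abstract does not give the exact form of the diffused-potential reward, but the general idea can be sketched under the standard potential-based reward-shaping assumption, r' = γΦ(s') − Φ(s): a potential Φ is diffused outward from the target pin over the routing grid, and the dense reward is the discounted change in potential across each transition. The names (`diffuse_potential`, `shaped_reward`) and the unit-per-step linear decay below are illustrative assumptions, not the thesis's actual construction.

```python
from collections import deque

def diffuse_potential(grid, target):
    """Diffuse a potential field outward from the target pin via BFS.

    grid: 2D list where 0 = free cell and 1 = obstacle.
    target: (row, col) of the pin to be reached.
    Returns a dict mapping each reachable cell to its potential,
    which is highest (0) at the target and decreases by 1 per step.
    """
    rows, cols = len(grid), len(grid[0])
    potential = {target: 0.0}
    queue = deque([target])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in potential):
                potential[(nr, nc)] = potential[(r, c)] - 1.0
                queue.append((nr, nc))
    return potential

def shaped_reward(potential, s, s_next, gamma=0.99):
    """Dense reward from the potential change over one state transition."""
    return gamma * potential[s_next] - potential[s]
```

With this shaping, a move toward the target pin yields a positive reward and a move away yields a negative one at every step, so the agent receives guidance long before any net is actually completed.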