Font Size: a A A

Solution And Optimization Of TSP Problem On Reinforcement Pointer Network

Posted on:2020-01-22Degree:MasterType:Thesis
Country:ChinaCandidate:K KangFull Text:PDF
GTID:2370330590983194Subject:Computer technology
Abstract/Summary:PDF Full Text Request
TSP,traveling salesman problem,is a typical NP-hard problem with important theoretical research significance and extensive practical application value.First,this paper introduces the foreign and domestic research background of TSP and deep reinforcement learning.Then,the paper describes the principle and design of the pointer network in detail.On the basis of analyzing the three defects of the pointer network,the following improvements are made,and the reinforcement pointer network model is proposed:First,we changed the original pointer networks based on supervised learning to reinforcement learning,and the policy gradient method was used to optimize the networks,which solved the defects of high acquisition cost of the training set and greatly improved the accuracy of the model.Second,the use of the actor-criticist structure greatly improves the convergence speed of the model.Third,the experience pool offline update strategy is adopted,which greatly improves the utilization of the sample and the parallelism of the algorithm.Fourth,the advantage function local update strategy is adopted to improve the stability of the algorithm.In the end,the paper compares the performance and speed of the reinforcement pointer network and the other best algorithms.The analysis results show that the reinforcement pointer network has unique advantages in many algorithms.At the same time,this paper also points out the shortcomings and improvement directions of the reinforcement pointer network,and prospects for the deep reinforcement learning to solve the combination optimization problem.
Keywords/Search Tags:Pointer Networks, Traveling Salesman Problem, Deep Reinforcement Learning, Reinforcement Pointer Networks, Actor-Critic
PDF Full Text Request
Related items