| Combinatorial optimization is an important problem in computer science and operations research,and it is widely used in logistics,production scheduling,resource allocation,and social networks.Because this problem is important in theoretical study and practical application,it has been widely studied by many researchers.The traveling salesman problem is a typical NP-hard problem in the field of combinatorial optimization.The classic traveling salesman problem considers a certain set of customers and knows travel distances between customers.The goal of problem is to find a shortest route that starts from the depot,visits each customer only once,and finally returns to the depot.The dynamic traveling salesman problem with stochastic cases is a more general problem.Based on the classic problem,stochastic factors that affect route planning are added,such as dynamic travel time and dynamic customer demand.In fact,the dynamic and stochastic traveling salesman problems are more useful for practical applications,so it is more valuable to study the dynamic cases and practical applications of the dynamic problems than the classic static problem.In this thesis,a model based on reinforcement learning is proposed and used to solve the dynamic traveling salesman problem with stochastic customers.On the basis of the standard traveling salesman problem,this thesis assumes that the demands of customers are dynamically changing.Each time only a part of customers has delivery demand,and the planned vehicle route only needs to visit the customers with demands and minimize the route cost.This problem model can be used to describe the daily delivery problems in logistics enterprises.In this problem,the customers are relatively fixed,but the set of customers that need to be delivered every day is different from other days,and the route planning algorithm needs to solve similar problems repeatedly.The vehicle route planning model is mainly composed of two parts: the first part is based on the solution data of the heuristic algorithm on each stochastic node problem,and a segment clustering algorithm is used to divide all nodes into several clusters,which serve as the basis of state representation and action selection in reinforcement learning.The second part uses a Q-Learning algorithm to train the model based on randomly generated dynamic cases.The Res Ne Xt network architecture combined with the spatial attention mechanism is used as a model decision-making network,which is used to determine the visit sequence between clusters,and the sequence of nodes to be visited within the cluster is optimized by the2-OPT algorithm,and finally the solution sequence of the dynamic case is obtained.We call the proposed model architecture DTSPSC-RL.After training the model,DTSPSC-RL can repeatedly solve the traveling salesman problem with varying number of nodes without retraining the model for each problem instance,thus effectively solving the daily distribution problems of logistics enterprises.Based on the above model,we have given corresponding solutions from two aspects of offline decision and online decision.Offline decision refers to the method of calculating the solution before execution the plan.The set of customers that need to be served is known before the departure of the vehicle,and the visit route is planned based on DTSPSC-RL.Online decision refers to the method of calculating a new solution immediately when a dynamic event occurs.Some customers who have demands are known before the vehicle departs,and the demands of other customers are submitted during the vehicle delivery process.On the basis of DTSPSC-RL,this thesis proposes an online method of calculating routes to solve the dynamic traveling salesman problem with stochastic customers in real time. |