At present, research on process route planning is largely limited to planning for specific parts in a fixed machining environment, and it struggles to respond quickly to dynamic changes in the machining environment and to individualized product customization in flexible machining systems. There is therefore an urgent need for research on process route planning methods for flexible machining systems. Action selection in Deep Reinforcement Learning (DRL) resembles the selection of decision variables in process route planning, and DRL can store previously learned policies in the form of neural network parameters, so the stored policy knowledge can be reused to speed up decisions on problems with similar feature structures. This paper therefore studies DRL-based process route planning methods for flexible machining systems, with the goal of improving the response speed of process route planning. The main work is as follows:

(1) For process route planning under dynamic changes in the machining environment, a method based on Deep Q-Network (DQN) is proposed. In line with the meaning of process route planning, the execution status of the part's feature operations is mapped to a state vector, and the set of part feature operations is mapped to an action space. An S-function-based exploration mechanism is proposed to govern the DQN agent's choice between "exploration" and "exploitation" and accelerate the convergence of the algorithm. To improve the agent's use of reasonable experience and speed up its avoidance of violations of the part's feature constraints, a weighted experience pool technique is proposed. Simulation results show that the proposed method effectively solves the process route planning problem under dynamic changes in the machining environment.

(2) For the process route planning problem arising when part features are reconfigured by personalized customization requirements, a method based on Asynchronous Advantage Actor-Critic (A3C) is proposed, exploiting A3C's parallelism, asynchrony, and fast response. The state vector, action space, and reward function are defined as a Markov Decision Process (MDP) in line with the implications of part feature reconfiguration. A stochastic greedy strategy is proposed to keep the A3C agent from falling into local optima when selecting machining resources for the part, and a fast-fail strategy addresses the extensive trial and error that would otherwise slow the algorithm's response when part features are reconfigured. Simulation experiments show that the proposed method effectively solves the process route planning problem under part feature reconfiguration.

(3) For the process route planning problem in which dynamic changes in the machining environment and part feature reconfiguration occur simultaneously, a method based on Distributed Proximal Policy Optimization (DPPO) is proposed. The part's machining state is described by a graph structure whose nodes and edges can be generated dynamically, overcoming the way environment changes and feature reconfiguration disrupt the description of the part's machining state. The DPPO algorithm is then used to solve the planning problem and improve the response time of dynamic process route planning. Simulation experiments show that the proposed method effectively solves the process route planning problem when dynamic changes in the machining environment and part feature reconfiguration occur simultaneously.
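The S-function exploration mechanism of (1) could be sketched as follows. The exact form of the S-function is not given here, so a logistic curve is assumed; the function names, parameters, and constants are illustrative, not the thesis's actual implementation. The idea is that the exploration rate stays high early in training, drops steeply mid-training, and flattens near a floor, which is what lets the DQN agent converge faster than a linear decay:

```python
import math
import random

def s_curve_epsilon(step, total_steps, eps_max=1.0, eps_min=0.05, steepness=10.0):
    """Sigmoid-shaped exploration schedule (assumed logistic form):
    epsilon stays near eps_max early, falls quickly around mid-training,
    then flattens toward eps_min."""
    x = step / total_steps                              # training progress in [0, 1]
    s = 1.0 / (1.0 + math.exp(steepness * (x - 0.5)))   # S-shaped curve, ~1 -> ~0
    return eps_min + (eps_max - eps_min) * s

def select_action(q_values, step, total_steps):
    """Epsilon-greedy choice over the part's feature-operation action space."""
    if random.random() < s_curve_epsilon(step, total_steps):
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploit
```

Compared with a linear schedule, the flat early phase gives the agent time to sample the feature-constraint structure before committing to greedy behavior.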
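The weighted experience pool of (1) can likewise be sketched. How weights are assigned is not specified here, so this is a minimal assumption: transitions judged "reasonable" (e.g. those that respect the part's feature-precedence constraints) are pushed with larger weights and are therefore replayed more often. The class and parameter names are hypothetical:

```python
import random
from collections import deque

class WeightedReplayBuffer:
    """Experience pool in which higher-weight transitions are sampled
    more often, steering the agent toward constraint-respecting experience."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest entries drop out first

    def push(self, transition, weight=1.0):
        """Store a (state, action, reward, next_state)-style transition
        together with its sampling weight."""
        self.buffer.append((weight, transition))

    def sample(self, batch_size):
        """Draw a training batch with probability proportional to weight."""
        weights = [w for w, _ in self.buffer]
        picks = random.choices(list(self.buffer), weights=weights, k=batch_size)
        return [t for _, t in picks]
```

Sampling with replacement via `random.choices` keeps the sketch simple; a production buffer would typically add importance-sampling corrections.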
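The stochastic greedy strategy of (2) admits a compact sketch. The precise rule is not reproduced here; the assumption is the common form in which the agent usually takes the machining resource with the highest policy probability but occasionally samples from the full policy distribution, so it can escape locally optimal resource choices:

```python
import random

def stochastic_greedy(probs, greedy_prob=0.8):
    """With probability greedy_prob pick the highest-probability machining
    resource (greedy); otherwise sample an index from the policy
    distribution so low-probability resources still get tried."""
    if random.random() < greedy_prob:
        return max(range(len(probs)), key=probs.__getitem__)
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

The fast-fail strategy mentioned alongside it would simply terminate an episode as soon as a selected operation violates a feature constraint, instead of letting the worker continue a doomed rollout.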
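The dynamic graph state of (3) can be sketched as a mutable precedence graph. The representation below is an assumption (an adjacency-set structure with hypothetical names), but it shows the key property the text relies on: nodes (feature operations) and edges (precedence constraints) can be added or removed at run time, and the set of currently executable operations — the DPPO agent's action candidates — is recomputed from whatever graph exists at that moment:

```python
class PartStateGraph:
    """Nodes are feature operations; a directed edge u -> v means u must
    finish before v. Both can change when the machining environment
    changes or part features are reconfigured."""

    def __init__(self):
        self.adj = {}   # node -> set of successor nodes

    def add_node(self, node):
        self.adj.setdefault(node, set())

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        self.adj[u].add(v)

    def remove_node(self, node):
        # Drop a reconfigured-away feature and every constraint touching it.
        self.adj.pop(node, None)
        for successors in self.adj.values():
            successors.discard(node)

    def ready_operations(self, done):
        """Operations whose predecessors are all finished — the current
        action candidates for the agent."""
        preds = {v: set() for v in self.adj}
        for u, successors in self.adj.items():
            for v in successors:
                preds[v].add(u)
        done = set(done)
        return [v for v, p in preds.items() if v not in done and p <= done]
```

Because `ready_operations` is derived from the live graph rather than a fixed encoding, the state description survives both environment changes and feature reconfiguration without retraining the representation itself.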