Spacecraft shell parts are the foundation of the spacecraft manufacturing field. Completing the production tasks of various shell parts on time is an essential requirement for safeguarding national defense security. The machining and production of spacecraft shell parts are characterized by short production cycles, fast product updates, and frequent changes in order demand, so the random arrival of emergency orders is inevitable. This thesis therefore adopts a deep reinforcement learning method that uses real-time data to drive production decisions, dynamically handling the machining production process of spacecraft shell parts when emergency orders are inserted.

Emergency order insertion is a thorny issue. Existing emergency order priority evaluation indicators may fail when applied to the production of spacecraft shell parts, because production is usually for the same customer and shell part orders are single-piece and small-batch. Priority evaluation indicators such as customer importance, unit order cost and revenue, and product quality requirements are identical for every new order, which makes it difficult to distinguish their respective priorities. A new priority index is therefore needed to support the reasonable insertion of emergency orders during the machining and production of spacecraft shell parts.

The re-scheduling problem of aerospace single-piece, small-batch production is a typical job shop scheduling problem. However, existing deep reinforcement learning methods for job shop scheduling lack research on reinforcement learning modeling: unreasonable choices of the agent object and of the agent's action outputs make the algorithm unstable and difficult to converge. Considering the advantages of the single-agent setting in stability and ease of modeling, this thesis transforms the job shop scheduling
problem into a Markov decision process under a single-agent setting. At the same time, most general reinforcement learning scheduling methods use heuristic scheduling rules as the agent's action outputs. These rules are designed for specific problems and require experienced scheduling experts to invest a large amount of coding work. This thesis therefore proposes a Double Loop deep Q network scheduling method that does not rely on scheduling rules. Experiments show that this method is versatile and obtains better scheduling solutions than the comparison methods.

This thesis then addresses the re-scheduling of aerospace single-piece, small-batch production under emergency order arrival. To overcome the tendency of existing emergency order priority evaluation indicators to fail, an emergency redundancy indicator is proposed, defined as the ratio of the emergency order's delivery deadline to its total planned processing time. According to the redundancy value, an appropriate strategy is selected from five newly designed emergency order insertion strategies, and this selection is closely integrated with the aforementioned improved reinforcement learning scheduling method to generate a re-scheduling plan for all orders. Experiments under different redundancy values and mixed redundancy values verify the effectiveness of the resulting re-scheduling scheme for aerospace single-piece, small-batch production under emergency orders.

Finally, a Unity 3D production scheduling simulation system for the machining process of aerospace shell parts was built, and the re-scheduling plan under emergency order insertion was displayed in three dimensions.
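The redundancy indicator described above can be sketched in a few lines. This is a minimal illustration only: the threshold values, strategy names, and function names below are assumptions invented for this sketch, not the thesis's actual five strategies or cut-off values.

```python
# Sketch of the emergency redundancy indicator: the ratio of an emergency
# order's delivery deadline to its total planned processing time.
# Thresholds and strategy names below are illustrative assumptions.

def redundancy(deadline_hours: float, total_planned_processing_hours: float) -> float:
    """Redundancy = deadline / total planned processing time.
    Values well above 1 indicate slack; values near or below 1
    indicate a tight or infeasible emergency order."""
    return deadline_hours / total_planned_processing_hours

def pick_strategy(r: float) -> str:
    # Hypothetical mapping from a redundancy value to one of five
    # insertion strategies (names invented for this sketch).
    if r < 1.0:
        return "preempt-current-jobs"
    elif r < 1.5:
        return "insert-at-head"
    elif r < 2.0:
        return "insert-by-slack"
    elif r < 3.0:
        return "merge-with-reschedule"
    else:
        return "append-to-queue"

r = redundancy(deadline_hours=48.0, total_planned_processing_hours=32.0)
print(r, pick_strategy(r))  # 1.5 insert-by-slack
```

Because all shell part orders come from the same customer with identical cost and quality attributes, this ratio is one of the few quantities that still differs across emergency orders, which is what makes it usable as a priority index.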