| Job shop scheduling is central to manufacturing enterprises' efforts to reduce costs and improve quality and efficiency, and it has long been a research hotspot. As manufacturing informatization advances, workshops accumulate large amounts of production data. Valuable scheduling information hidden in that data, such as workpiece working hours, equipment selection, and workpiece procedures, is difficult to extract, which undermines the effectiveness of job shop scheduling decisions. This thesis therefore takes the job shop scheduling problem as its research object, combines data mining with scheduling rules and a genetic algorithm, and studies a new method for real-time job shop scheduling based on data mining. To analyze and mine the useful information in scheduling data, a scheduling framework based on data mining is constructed. The C4.5 decision tree algorithm is selected to extract, from offline data, the scheduling knowledge that determines the processing order of workpieces, and to generate a C4.5 tree rule graph. A genetic optimization algorithm based on data mining and scheduling rules (GA-DDR) is proposed: decision tree rules are obtained with the pessimistic error pruning algorithm and then further optimized with a genetic algorithm. To improve the efficiency of GA-DDR, the thesis further studies data preprocessing from the perspective of unbalanced data and proposes a genetic algorithm of data mining and scheduling rules based on synthetic data (GA-SDDR). Based on the C4.5 multi-tree rule base, the SMOTE algorithm, the Borderline-SMOTE algorithm, and a comprehensive sampling algorithm are used to resample the unbalanced data, addressing both the class imbalance and the small quantity of data. The data produced by each of the three sampling algorithms is used to train a CART decision tree, and the new training data is selected according to the Precision, Recall, and AUC evaluation indexes. With this synthetic data, the limited data samples are increased proportionally and the category imbalance caused by manual tabulation is eliminated. A simulation environment is built on the Python platform, and a simulation study of the proposed data preprocessing method is carried out on classic benchmark instances such as LA07. The results show that the proposed GA-SDDR algorithm optimizes faster and has stronger optimization capability than the algorithms in the literature. |
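
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the basic SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the thesis's implementation. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration, and Borderline-SMOTE and the comprehensive sampling variant are not covered.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch of basic SMOTE only; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: two numeric features per minority-class sample.
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region already covered by the minority class. The abstract describes such data being used to enlarge a small training set without changing the majority class.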