| Batch processing is an important production mode in the process industry and is widely used across processing industries because of its flexibility and high efficiency. As consumer demand evolves, enterprises impose increasingly strict requirements on batch production. Operation optimization of batch processes is an effective means of improving product quality and production efficiency, and it has attracted growing attention. For a new batch process that has just been put into production, modeling and optimization perform poorly because modeling data are insufficient. Transfer learning can help build a process model of the target domain from source-domain information, but it depends on the availability of a sufficiently similar source domain. Self-transfer learning, by contrast, needs no source domain: it improves the solution of the target-domain problem by transferring information obtained from different perspectives of the target domain itself, and can therefore improve the modeling of a new batch process. To reduce experimental cost and improve optimization performance, a hierarchical optimization framework consisting of multiple optimization levels is adopted to optimize production of the new batch process. Within this framework, the problems of quality-prediction modeling, slow convergence, and model-plant mismatch in production optimization are studied in turn. The main research contents are as follows: (1) To address the difficulty of modeling with scarce production data under the hierarchical optimization framework, a hierarchical optimization method for batch processes based on a self-transfer model is proposed. The hierarchical optimization consists of three parts: upper-level optimization, lower-level optimization, and feedback optimization. The upper-level optimization aims to quickly improve the operating trajectory to a suboptimal level through 
preliminary optimization, so only limited production data accumulate during the upper-level optimization. In the lower-level optimization, the production data accumulated by the upper level are selected as the lower-level modeling data set, and a self-transfer model of the process is constructed from Partial Least Squares (PLS) and Support Vector Regression (SVR). An online weight-updating scheme self-transfers the process information that PLS and SVR capture from different perspectives, which raises the utilization of upper-level data and thus improves modeling when data are insufficient. In addition, a feedback optimization strategy is introduced so that the lower-level optimization can efficiently refine the initial operating trajectory obtained by the upper-level optimization. Finally, the cobalt oxalate synthesis process is used to verify the effectiveness of the proposed method. (2) To address the slow convergence of the lower-level optimization, a batch-to-batch optimization method based on deep reinforcement learning is proposed within the lower-level framework. When a process has a high production cost, optimization efficiency deserves particular attention if the economic benefits of the enterprise are to improve. To this end, within-batch training is combined with batch-to-batch optimization: the idle time during batch production is used to train the deep reinforcement learning optimization system. After a batch is completed, the better-trained of the optimization systems from adjacent batches is selected to optimize the actual production process, which avoids optimizing with a poorly trained system. The process transfer model stands in for the actual plant during training and is updated after each batch, which improves the efficiency of the optimization system. Finally, the effectiveness of the 
proposed method is verified on the cobalt oxalate synthesis process. (3) To address within-batch model-plant mismatch in the lower-level optimization, a deep reinforcement learning optimization method based on an improved training environment is proposed within the lower-level framework. When the process transfer model replaces the actual process and interacts with the optimization system, the deep reinforcement learning network obtained at the end of training may not be optimal because of adverse factors such as model uncertainty and noise. To let the model effectively guide within-batch training of the optimization system and to avoid the adverse effects of training on a mismatched model, a first-order Taylor approximation is used at the beginning of each batch to compute the zero-order and first-order deviation terms between the model and the actual process. Deviation compensation then yields a corrected reward for guiding the training of the optimization system, which improves the training effect and, in turn, the optimization performance. Finally, the effectiveness of the proposed method is verified on the cobalt oxalate synthesis process. |
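One possible online weighting of the PLS and SVR sub-models in method (1) can be sketched as follows. This is a minimal sketch under a stated assumption: each sub-model's weight is taken (hypothetically) to be inversely proportional to an exponentially smoothed squared prediction error on the newest batch. The names `update_weights`, `combined_prediction`, and the `forgetting` factor are illustrative and are not the thesis's own scheme.

```python
def update_weights(smoothed_sq_err, new_err, forgetting=0.8, eps=1e-12):
    """Hypothetical weighting scheme (assumption): exponentially smooth each
    sub-model's squared prediction error on the newest batch, then return
    weights inversely proportional to the smoothed errors (summing to 1)."""
    smoothed = [forgetting * s + (1.0 - forgetting) * e ** 2
                for s, e in zip(smoothed_sq_err, new_err)]
    inverse = [1.0 / (s + eps) for s in smoothed]
    total = sum(inverse)
    return smoothed, [v / total for v in inverse]


def combined_prediction(predictions, weights):
    """Weighted sum of sub-model predictions, e.g. [y_pls, y_svr]."""
    return sum(w * p for w, p in zip(weights, predictions))


# Usage: both sub-models start with equal smoothed error; on the newest batch
# PLS misses by 0.1 and SVR by 0.5, so PLS receives the larger weight.
smoothed, weights = update_weights([1.0, 1.0], [0.1, 0.5])
y_hat = combined_prediction([2.0, 4.0], weights)
```

Smoothing with a forgetting factor keeps a single noisy batch from swinging the weights; setting `forgetting=0` would weight the sub-models by the latest batch error alone.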
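The control flow of method (2), training on the model between batches, keeping the better of the adjacent-batch agents, running the plant, and updating the model after each batch, can be illustrated with a toy sketch. Two substitutions are made to keep it short and should be read as assumptions: random search on the model stands in for deep reinforcement learning training (`train_agent` is hypothetical and is not a DRL algorithm), and a quadratic least-squares refit with `numpy.polyfit` stands in for the process transfer model (`fit_model` is hypothetical). The "policy" is reduced to a single scalar operating parameter.

```python
import numpy as np


def fit_model(U, Y):
    """Hypothetical stand-in for the process transfer model: quadratic least-squares fit."""
    coeffs = np.polyfit(U, Y, deg=2)
    return lambda u: float(np.polyval(coeffs, u))


def train_agent(model, rng, budget, bounds=(0.0, 4.0)):
    """Hypothetical stand-in for within-batch DRL training: random search on the
    model. A small budget mimics a poorly trained agent. Not a DRL algorithm."""
    candidates = rng.uniform(bounds[0], bounds[1], size=budget)
    scores = [model(u) for u in candidates]
    return float(candidates[int(np.argmax(scores))])


def batch_to_batch(plant, U, Y, n_batches, seed=0):
    """Run n_batches of batch-to-batch optimization; return (policy, plant output) pairs."""
    rng = np.random.default_rng(seed)
    U, Y = list(U), list(Y)                # data inherited from the upper level
    model = fit_model(U, Y)
    policy = train_agent(model, rng, budget=5)
    log = []
    for _ in range(n_batches):
        y = plant(policy)                  # run the actual batch with the selected agent
        log.append((policy, y))
        U.append(policy)
        Y.append(y)
        model = fit_model(U, Y)            # update the model after the batch ends
        # train a new agent in the idle time before the next batch
        candidate = train_agent(model, rng, budget=int(rng.integers(1, 20)))
        # keep whichever adjacent-batch agent scores better on the updated model
        policy = max((policy, candidate), key=model)
    return log


# Toy usage (hypothetical noise-free plant with its optimum at u = 2)
plant = lambda u: 5.0 - (u - 2.0) ** 2
U0 = [0.5, 1.0, 3.0]
log = batch_to_batch(plant, U0, [plant(u) for u in U0], n_batches=10)
```

Because the selection compares the previous and the newly trained agent on the updated model, a gap with a small training budget cannot displace a better agent from the adjacent batch.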
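The deviation compensation in method (3) can be illustrated with a short sketch. Writing y_p for the plant output, y_m for the model output, and u_k for the operating point at the start of batch k (notation assumed here), the first-order Taylor expansion gives the zero-order term eps0 = y_p(u_k) - y_m(u_k), the first-order term lam = grad y_p(u_k) - grad y_m(u_k), and the corrected reward r(u) = y_m(u) + eps0 + lam·(u - u_k). The toy plant and model below are hypothetical, and the plant gradient is obtained by finite differences on the toy function; in a real batch process that gradient would have to be estimated from measurements.

```python
import numpy as np


def finite_diff_grad(f, u, h=1e-5):
    """Central-difference gradient of a scalar function f at point u."""
    u = np.asarray(u, dtype=float)
    grad = np.zeros_like(u)
    for i in range(u.size):
        step = np.zeros_like(u)
        step[i] = h
        grad[i] = (f(u + step) - f(u - step)) / (2.0 * h)
    return grad


def deviation_terms(plant, model, u_k):
    """Zero-order (bias) and first-order (gradient) model-plant deviations at u_k.
    Assumption: the plant gradient is available; in practice it is estimated."""
    eps0 = plant(u_k) - model(u_k)
    lam = finite_diff_grad(plant, u_k) - finite_diff_grad(model, u_k)
    return eps0, lam


def compensated_reward(model, u, u_k, eps0, lam):
    """Model output corrected by the first-order Taylor deviation terms."""
    u = np.asarray(u, dtype=float)
    return model(u) + eps0 + lam @ (u - np.asarray(u_k, dtype=float))


# Toy example (hypothetical): plant and model share curvature but differ in optimum.
plant = lambda u: 5.0 - np.sum((u - 2.0) ** 2)
model = lambda u: 4.5 - np.sum((u - 1.5) ** 2)

u_k = np.array([1.0])                      # operating point at the start of the batch
eps0, lam = deviation_terms(plant, model, u_k)
grid = np.linspace(0.0, 3.0, 301)
best_u = max(grid, key=lambda v: compensated_reward(model, np.array([v]), u_k, eps0, lam))
```

Because the toy plant and model have the same curvature, the corrected reward coincides with the plant and its maximizer moves from the model optimum u = 1.5 to the plant optimum u = 2; for a general nonlinear plant the correction is exact only to first order around u_k.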