
The Study of Data-Based ADP Off-line Value Iteration Algorithm and On-line Q-Learning Algorithm

Posted on: 2013-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: X J Zhou
Full Text: PDF
GTID: 2230330374998192
Subject: Control theory and control engineering
Abstract/Summary:
Adaptive Dynamic Programming (ADP) is an effective approach to solving optimal control problems for nonlinear systems. When the internal dynamics of the controlled system are known, both the value iteration (VI) algorithm and the policy iteration (PI) algorithm can be used to solve the optimal control problem. Combining data-based control theory with adaptive dynamic programming yields data-based ADP, which can solve the optimal control problem of a nonlinear system even when the system's internal dynamics are unknown.

The data-based ADP off-line algorithm solves for the optimal control law from collected historical data of the system and requires building a model of the system. Because off-line data reflect the internal dynamics of the system more comprehensively, the off-line algorithm can obtain a relatively global optimal control solution. However, off-line data are updated more slowly than on-line data, the off-line algorithm runs slowly, and, since it requires a system model, modeling error and system uncertainty often leave the algorithm with poor adaptive capability. The data-based ADP on-line algorithm uses on-line data to obtain the optimal control law; although it runs fast and has good adaptive capability, it easily falls into local optima.

In view of these problems, and weighing the advantages and disadvantages of on-line and off-line adaptive dynamic programming, this thesis studies data-based ADP and proposes a new adaptive optimal control method: first, build a neural network model of the system from off-line data; second, obtain an optimized control law with the off-line value iteration algorithm of ADP; finally, improve that control law with on-line policy iteration via Q-learning. This approach makes full use of the advantages of both the off-line and the on-line data-based ADP algorithms.

Cane sugar production is a complex industrial process with many characteristics, including nonlinearity, multiple inputs, dynamic continuity, and uncertainty. When traditional control theory, which relies on a mathematical model of the system, is used to control the pH value of the cane sugar manufacturing process, the control effect is not ideal because an accurate mathematical model is difficult to establish. In this thesis, we built the system's neural network model from sample data collected in real time at the sugar factory site and then used the proposed algorithm to control the neutralization pH of the clarification process. A better control result was achieved, verifying the validity of the proposed method.
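The combination of off-line value iteration with on-line Q-learning refinement described above can be illustrated on a toy problem. The sketch below is a minimal, purely illustrative assumption: a five-state chain stands in for the plant, tabular updates stand in for the thesis's neural-network approximators, and all parameter values (discount factor, learning rate, exploration rate, step counts) are invented for the example, not taken from the thesis.

```python
import random

N_STATES = 5          # toy chain: states 0..4, with the goal at state 4
ACTIONS = [-1, +1]    # move left / right, clipped at the chain ends
GAMMA = 0.9           # discount factor (illustrative value)

def step(s, a):
    """Identified model of the toy system: deterministic transition and reward."""
    s2 = max(0, min(N_STATES - 1, s + a))
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

# --- Stages 1-2 (off-line): value iteration on the identified model ---
V = [0.0] * N_STATES
for _ in range(100):
    V = [max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS)
         for s in range(N_STATES)]

# --- Stage 3 (on-line): Q-learning refines the policy from interaction ---
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
random.seed(0)
s = 0
for _ in range(20000):
    # epsilon-greedy action selection (exploration rate 0.2, illustrative)
    if random.random() < 0.2:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda u: Q[(s, u)])
    s2, r = step(s, a)
    # standard Q-learning update with learning rate 0.1
    Q[(s, a)] += 0.1 * (r + GAMMA * max(Q[(s2, u)] for u in ACTIONS) - Q[(s, a)])
    s = 0 if s2 == N_STATES - 1 else s2   # restart the episode at the goal

# greedy policy over the non-terminal states
policy = {s: max(ACTIONS, key=lambda u: Q[(s, u)]) for s in range(N_STATES - 1)}
```

In this toy setting the off-line value iteration converges to the true values (states nearer the goal are worth more), and the on-line Q-learning stage recovers the same "move right" policy from interaction alone, mirroring how the proposed method lets on-line learning confirm or improve the off-line solution.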
Keywords/Search Tags: Adaptive Dynamic Programming, On-line training, Off-line training, Policy Iteration, Value Iteration, pH value optimal control