| The reflector surface of an antenna in geostationary orbit deforms under the complex thermal environment of space, which seriously degrades the antenna's pointing accuracy. This paper studies an active control method for adjusting the shape of the reflector panel. A piezoceramic actuator array is installed beneath the back plane of the antenna reflector surface. Based on the mathematical model, the problems of model uncertainty, model change, and multiple working conditions are considered, and several learning control algorithms are designed; simulations show that they achieve high-precision control of the antenna surface. The controllers are designed as follows: 1. For a system with uncertain model disturbances, and in view of the limited fault tolerance of the traditional LQR controller with respect to the model, this paper proposes a reinforcement learning method that obtains the reward of each action through trial and error. The method divides the disturbance into a discrete state space and continuously fits the reference model according to a given policy. The value function is learned by continuously reducing the error between the actual displacement and the reference displacement; the learned value function is then reused to evaluate the current disturbance error, and the disturbance error is finally corrected accurately. 2. Because the model changes while the antenna operates in orbit, which alters the state transition relationship in reinforcement learning, a reference-model Q-learning algorithm based on an RBF fuzzy neural network is proposed to dynamically transfer the experience stored in the value function. In this algorithm, fuzzy inference builds the overall framework by converting precise inputs into fuzzy outputs, reinforcement learning explores the logical relationship between the fuzzy inputs and outputs, and the RBF neural network adjusts the fuzzy parameters and helps to obtain an adaptive dynamic mapping function. The algorithm clearly 
improves robustness and can accommodate disturbance variations within a certain range. 3. In multi-condition temperature fields, a single value function for shape regulation cannot apply to all conditions; therefore, this paper designs three ways to optimize the experience gained by the exploration strategy. The first method expands the breadth of experience coverage by increasing the variety of operating conditions used in training. The second method hierarchically divides the input conditions according to their controllability and performs policy iteration on each level separately. The third method replaces the ε-greedy strategy with the Boltzmann strategy to balance the probabilities of exploration and exploitation, which expands the depth of exploration. Simulations show that these three methods can correct the deformation caused by the temperature field at various times under typical conditions. |
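
The contrast between the ε-greedy and Boltzmann strategies mentioned in item 3 can be illustrated with a minimal Python sketch. It is not the controller described in this work: the function names, the Q-value list, and the `temperature` parameter are assumptions introduced only for illustration. ε-greedy explores uniformly at random with probability ε, whereas the Boltzmann strategy samples each action with probability proportional to exp(Q/T), so actions with higher estimated value are explored more often.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values, temperature):
    """Softmax of Q-values; a higher temperature gives flatter probabilities."""
    q_max = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_values, temperature, rng):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    q = [1.0, 2.0, 0.5]  # hypothetical Q-values for three actuator actions
    print(boltzmann_probs(q, temperature=1.0))
    print(epsilon_greedy(q, epsilon=0.1, rng=rng), boltzmann(q, 1.0, rng))
```

In such a sketch, lowering the temperature during training moves the Boltzmann strategy from near-uniform exploration toward greedy exploitation, which is one common way to trade the two off.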