In the real world, there is a kind of production line built around production stations, with a conveyor that carries workpieces to these stations. Such a system is therefore called a Conveyor-Serviced Production Station (CSPS) system. In a multi-procedure CSPS system, the production line consists of more than one procedure, each composed of multiple general stations. Between an upstream procedure and a downstream procedure, a flexible station is installed that can switch between the two procedures, in order to balance the operating loads of the procedures and improve the productivity of the system. For the optimal control problem of the multi-procedure CSPS system, each general station performs only Look-ahead coordinated control, whereas the flexible station divides its decision control into two layers: the procedure-switching decision in the upper layer and the Look-ahead decision control in the lower layer. The goal of the optimization is to maximize the processing rate of the entire system over an infinite horizon by choosing a suitable switching control policy for the upper layer and a suitable Look-ahead control policy for the lower layer.

During the switching control process, the state space of the flexible station is huge. To cope with this, this thesis first introduces a Cerebellar Model Articulation Controller (CMAC) neural network to approximate the state-action values and proposes a unified neuro-dynamic programming algorithm, applicable under either the discounted or the average performance criterion, to solve the coordination problem among different procedures. Then, for the Look-ahead coordinated control problem, a multi-agent reinforcement learning method based on local information interaction is adopted to solve the coordination problem within each procedure.
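The CMAC approximation and the unified discounted/average update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the state encoding (a tuple of floats scaled to [0, 1]), the tiling sizes, the step size, and the names `CMACQ`, `td_step`, and `epsilon_greedy` are all hypothetical, and each decision epoch is treated as one step rather than modeling any continuous-time (for example, semi-Markov) timing the real system may require. CMAC is implemented here as tile coding: several slightly offset grids each activate one weight cell, and Q(s, a) is the sum of the active weights. Storing only visited cells keeps memory small, and overlapping cells let nearby states share what they learn.

```python
import random

# Hypothetical sketch: names, state encoding, and parameters are illustrative
# assumptions, not taken from the thesis.


class CMACQ:
    """CMAC (tile-coding) approximation of Q(s, a) over a discrete action set.

    Assumes states are tuples of floats already scaled to [0, 1] and actions
    are integer indices (e.g. which procedure a flexible station serves).
    """

    def __init__(self, n_actions, n_tilings=8, tiles_per_dim=10, alpha=0.1):
        self.n_actions = n_actions
        self.n_tilings = n_tilings
        self.tiles = tiles_per_dim
        # Each update touches n_tilings cells, so the step size is split among them.
        self.cell_alpha = alpha / n_tilings
        # Sparse table: memory is used only for cells that have been visited.
        self.w = {}

    def _active_cells(self, state, action):
        """Return the single cell that (state, action) activates in each tiling."""
        cells = []
        for t in range(self.n_tilings):
            offset = t / (self.n_tilings * self.tiles)  # each tiling is shifted slightly
            idx = tuple(int((x + offset) * self.tiles) for x in state)
            cells.append((t, action, idx))
        return cells

    def q(self, state, action):
        """Q estimate = sum of the weights of the active cells."""
        return sum(self.w.get(c, 0.0) for c in self._active_cells(state, action))

    def update(self, state, action, target):
        """Move Q(state, action) toward target; return the TD error."""
        error = target - self.q(state, action)
        for c in self._active_cells(state, action):
            self.w[c] = self.w.get(c, 0.0) + self.cell_alpha * error
        return error

    def greedy(self, state):
        """Action with the largest estimated Q value (first one on ties)."""
        return max(range(self.n_actions), key=lambda a: self.q(state, a))


def td_step(approx, s, a, r, s_next, criterion="discounted", gamma=0.95, rho=0.0):
    """One Q-learning-style update under either performance criterion.

    discounted: target = r + gamma * max_b Q(s', b)
    average:    target = r - rho + max_b Q(s', b)  (relative values; rho is
                the caller's estimate of the average reward per step)
    """
    best_next = max(approx.q(s_next, b) for b in range(approx.n_actions))
    if criterion == "discounted":
        target = r + gamma * best_next
    elif criterion == "average":
        target = r - rho + best_next
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return approx.update(s, a, target)


def epsilon_greedy(approx, state, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(approx.n_actions)
    return approx.greedy(state)
```

The same `td_step` serves both criteria, which is one simple way to read "unified": only the target changes. Under the average criterion the caller must also supply `rho` (for example, a running average of observed rewards); how the thesis estimates it is not stated in this abstract.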
Simulation results show that the proposed hierarchical coordinated optimization method based on the CMAC network requires little storage space and achieves high optimization accuracy and fast optimization speed.

In addition, the state variable of the upper layer comprises multiple elements, so the state space is very complex and the system suffers from the curse of dimensionality. To address this, state aggregation is first applied to reduce the state space, and a radial basis function (RBF) network is then introduced to approximate the state-action values and solve the coordination problem among different procedures. Simulation results demonstrate that, compared with applying the RBF network directly, using the RBF network after state aggregation to approximate the state-action values gives higher optimization accuracy and faster optimization speed.
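A minimal sketch of the two-stage scheme (aggregate the upper-layer state, then approximate Q over the aggregated state with an RBF network) is shown below. The abstract does not say how states are aggregated or how the RBF network is laid out, so everything here is an assumption for illustration: the raw state is taken to be a tuple of integer queue lengths, each bucketed into `n_levels` coarse levels; the RBF network uses normalized Gaussian units on a regular grid with one linear output per action; and learning is a one-step Q-learning target with a gradient step on the output weights. The helper names `aggregate`, `RBFQ`, and `q_learning_step` are hypothetical.

```python
import itertools
import math

# Hypothetical sketch: the aggregation rule, RBF layout, and parameters are
# illustrative assumptions, not taken from the thesis.


def aggregate(raw_state, max_len=20, n_levels=4):
    """Map each integer component (e.g. a queue length in 0..max_len) to one of
    n_levels coarse levels, then scale the level index to [0, 1]."""
    agg = []
    for x in raw_state:
        x = min(max(x, 0), max_len)  # clip to the assumed range
        level = min(x * n_levels // (max_len + 1), n_levels - 1)
        agg.append(level / (n_levels - 1))
    return tuple(agg)


class RBFQ:
    """Q(s, a) = sum_j w[a][j] * phi_j(s) with normalized Gaussian RBF units."""

    def __init__(self, n_actions, dim, centers_per_dim=4, width=0.3, alpha=0.1):
        grid = [i / (centers_per_dim - 1) for i in range(centers_per_dim)]
        self.centers = list(itertools.product(grid, repeat=dim))  # regular grid of centers
        self.width = width
        self.alpha = alpha
        self.n_actions = n_actions
        # One linear output layer (weight vector) per action.
        self.w = [[0.0] * len(self.centers) for _ in range(n_actions)]

    def features(self, s):
        """Gaussian activation of every center, normalized to sum to 1."""
        acts = [
            math.exp(-sum((si - ci) ** 2 for si, ci in zip(s, c)) / (2 * self.width ** 2))
            for c in self.centers
        ]
        total = sum(acts)
        return [v / total for v in acts]

    def q(self, s, a):
        return sum(wj * fj for wj, fj in zip(self.w[a], self.features(s)))

    def update(self, s, a, target):
        """Gradient step on the output weights of action a toward target."""
        phi = self.features(s)
        error = target - sum(wj * fj for wj, fj in zip(self.w[a], phi))
        for j, fj in enumerate(phi):
            self.w[a][j] += self.alpha * error * fj
        return error


def q_learning_step(model, raw_s, a, r, raw_next, gamma=0.95):
    """Aggregate both states, then apply a one-step Q-learning update."""
    s, s_next = aggregate(raw_s), aggregate(raw_next)
    best_next = max(model.q(s_next, b) for b in range(model.n_actions))
    return model.update(s, a, r + gamma * best_next)
```

With two components in 0..20, this aggregation collapses 441 raw states into 16 aggregated ones, so the RBF network only has to cover the unit square; raw states that land in the same buckets, such as (3, 15) and (4, 14), share a single Q estimate. Applying `RBFQ` directly to raw states would instead require centers spread over the full raw range.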