The intelligent construction of fully mechanized caving mining is an important step in the transformation and upgrading of China's coal industry. At present, fully mechanized caving mining in China still relies on manual coal caving, and intelligent coal caving technology has become the bottleneck restricting the intelligent development of fully mechanized caving mining. The hydraulic support is one of the most important pieces of equipment in a fully mechanized top coal caving face, and the design of its control system is central to making fully mechanized top coal caving intelligent. Intelligent construction generally requires an intelligent decision-making model that decides the action of each caving window from the real-time environmental state of the face, as obtained by an intelligent perception system, so that the caving windows adjust automatically to changes in that state and the caving effect improves. It is therefore necessary to learn the mapping between the environmental state of top coal caving and the actions of the support caving windows in order to make optimal caving-window decisions. During top coal caving, control of the caving windows is a sequential process: each decision depends on the current caving state of the face, which is itself the result of the preceding caving actions, so the process has the Markov property. The decision-making of the support caving windows is thus a typical Markov decision process, and reinforcement learning can be used to find the optimal caving-window decisions and so improve the caving effect. Accordingly, this thesis uses the Q-learning and DQN algorithms to establish an intelligent decision-making model for top coal caving in fully mechanized caving mining, optimize the caving process, and improve the caving effect of the fully mechanized caving face. The main contributions of the thesis are as follows.

1) Based on the discrete element development environment YADE under Linux, this thesis develops a three-dimensional simulation system for continuous cutting and coal caving in a fully mechanized caving face. With caving parameters set from the real geological conditions of the coal mine, the system effectively simulates the swing of the tail beam, the advance of the hydraulic supports, and the dynamic mixing of coal and gangue above the tail beam during caving. On this basis, comparative experiments of single-round and double-round coal caving were carried out. The results show that the three-dimensional cutting-and-caving model realistically simulates the top coal caving process and provides a new way to study top coal caving behavior in fully mechanized caving mining from a three-dimensional perspective. Under continuous cutting and caving, double-round sublevel interval caving performs best, with an average top coal recovery rate of 86.64% and a gangue rate of 4.06%.

2) This thesis formulates top coal caving as a Markov decision process based on the spatial layout of the hydraulic supports and the characteristics of the caving-window actions. A reinforcement learning framework is then used to determine the optimal caving-window actions, in which Q-learning learns online the mapping between the top coal state and the caving-window actions without requiring a large set of pre-collected training samples. A new reward function based on mean deviation is designed for Q-learning to keep the coal-rock boundary descending uniformly during caving. Throughout the dynamic caving process, the reward function guides the agents to learn how to control the shape of the coal-rock boundary, which strengthens the coordination of their actions and improves the effectiveness of top coal caving. In addition, this thesis proposes a multi-experience-pool storage and extraction method to improve the agent's learning efficiency. The experimental results show that the coal-rock boundary under our method stays flatter as the coal falls, and the agent's average top coal caving reward reaches 13467.8, which is 8.8% higher than that of the Q-learning method and 10% higher than that of single-round sequential caving.

3) Because Q-learning requires the observed top coal state to be discretized before it can learn the caving-window control strategy, the control accuracy of the caving windows is reduced. This thesis therefore uses the DQN algorithm to learn the caving control strategy and reformulates the mean-deviation reward function in a continuous form suitable for DQN. In addition, the multi-experience-pool storage and extraction method is used to improve the agent's learning efficiency. Simulation results on the three-dimensional cutting-and-caving simulation platform for the fully mechanized caving face show that the DQN-based intelligent top coal caving control model further improves the caving effect.

The thesis includes 27 figures, 17 tables and 76 references...
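The abstract names a reward based on mean deviation but does not give its formula. The following is only a minimal sketch under stated assumptions: the state is taken to be the height of the coal-rock boundary above each caving window, the flatness penalty is the mean absolute deviation of those heights, and the coal and gangue terms and the weight `w_flat` are hypothetical. Because it works on raw heights, it is closer to the continuous form described for DQN; a Q-learning variant would apply the same formula to discretized heights.

```python
from statistics import fmean


def mean_deviation(heights):
    """Mean absolute deviation of the coal-rock boundary heights (m), one
    height per caving window; 0.0 means a perfectly flat boundary."""
    heights = list(heights)  # allow any iterable; it is traversed twice
    mu = fmean(heights)
    return fmean(abs(h - mu) for h in heights)


def caving_reward(coal_mass, gangue_mass, heights, w_flat=1.0):
    # Hypothetical reward shape (not the thesis's formula): credit recovered
    # coal, penalize drawn gangue, and penalize an uneven boundary in
    # proportion to its mean deviation.
    return coal_mass - gangue_mass - w_flat * mean_deviation(heights)
```

For boundary heights of 1.0, 2.0 and 3.0 m the mean deviation is 2/3, so with identical coal and gangue masses a flatter boundary always earns the larger reward, which is the behavior the abstract attributes to the mean-deviation term.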
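To illustrate why Q-learning needs a discretized state, here is a hedged sketch of tabular Q-learning with epsilon-greedy action selection. The binary open/close action set, the bin width `bin_width`, the hyperparameters and the single-agent setup are assumptions for illustration, not the thesis's configuration. The update is the standard rule Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)].

```python
import random
from collections import defaultdict

# Hypothetical action set for one caving window: 0 = keep closed, 1 = open.
ACTIONS = (0, 1)


def discretize(heights, bin_width=0.5):
    """Map continuous coal-rock boundary heights (m), one per window, to a
    hashable discrete state. bin_width is an illustrative assumption."""
    return tuple(int(h // bin_width) for h in heights)


class QTable:
    """Tabular Q-learning agent with epsilon-greedy exploration."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> value, default 0.0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next, done):
        # TD target: r at episode end, else r + gamma * max_a' Q(s', a').
        best_next = 0.0 if done else max(self.q[(s_next, b)] for b in ACTIONS)
        target = r + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])


agent = QTable(alpha=0.5, epsilon=0.0)
s, s_next = discretize([1.2, 0.7]), discretize([1.0, 0.4])
agent.update(s, 1, 10.0, s_next, done=False)
print(agent.q[(s, 1)], agent.act(s))  # 5.0 1
```

Coarser bins shrink the table but map distinct boundary shapes to the same key; that loss of resolution is the reduced control accuracy that motivates the move to DQN.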
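The abstract does not describe how the multiple experience pools are organized. One plausible reading, sketched below purely as an assumption, routes each transition to a pool chosen by a key function (for example, by the sign of the reward) and draws every training batch from all pools in fixed proportions, so that rare but informative transitions are not crowded out by common ones. The class name `MultiPoolReplay`, the routing key and the ratios are all hypothetical.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class MultiPoolReplay:
    """Hypothetical multi-pool experience replay.

    key:    function mapping a Transition to a pool name (must be in ratios)
    ratios: pool name -> fraction of each sampled batch drawn from that pool
    """

    def __init__(self, capacity_per_pool, key, ratios, seed=0):
        self.key = key
        self.ratios = dict(ratios)
        # One bounded FIFO per pool: the oldest transitions are evicted first.
        self.pools = {name: deque(maxlen=capacity_per_pool) for name in self.ratios}
        self.rng = random.Random(seed)

    def store(self, transition):
        self.pools[self.key(transition)].append(transition)

    def sample(self, batch_size):
        batch = []
        for name, frac in self.ratios.items():
            pool = self.pools[name]
            k = min(len(pool), round(batch_size * frac))
            batch.extend(self.rng.sample(list(pool), k))  # without replacement
        self.rng.shuffle(batch)
        return batch

    def __len__(self):
        return sum(len(p) for p in self.pools.values())
```

If a pool holds fewer transitions than its share, the batch is simply smaller; a fuller version might top it up from the other pools instead.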
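Moving from Q-learning to DQN lets the value function take continuous boundary heights directly, without binning. The sketch below shows the core DQN update: a temporal-difference target computed from a frozen target network, followed by a gradient step on the squared error. To stay short it uses NumPy with a linear approximator standing in for the deep network; the thesis's network architecture, learning rate and discount factor are not given in the abstract, so every value and name here (`LinearQ`, `td_targets`, `train_step`) is illustrative.

```python
import numpy as np


class LinearQ:
    """Linear stand-in for the DQN's Q-network: Q(s, a) = W[a] . s + b[a],
    evaluated directly on continuous (undiscretized) boundary heights."""

    def __init__(self, n_features, n_actions, lr=0.01):
        self.W = np.zeros((n_actions, n_features))
        self.b = np.zeros(n_actions)
        self.lr = lr

    def q_values(self, states):
        # states: (batch, n_features) -> Q-values: (batch, n_actions)
        return states @ self.W.T + self.b

    def clone(self):
        # Independent frozen copy, used as the target network.
        other = LinearQ(self.W.shape[1], self.W.shape[0], self.lr)
        other.W, other.b = self.W.copy(), self.b.copy()
        return other


def td_targets(target_net, rewards, next_states, dones, gamma=0.9):
    # y = r + gamma * (1 - done) * max_a' Q_target(s', a')
    return rewards + gamma * (1.0 - dones) * target_net.q_values(next_states).max(axis=1)


def train_step(online, states, actions, targets):
    """One gradient step on 0.5 * mean((Q(s, a) - y)^2); returns the loss
    measured before the step."""
    n = len(actions)
    err = online.q_values(states)[np.arange(n), actions] - targets
    grad_W = np.zeros_like(online.W)
    grad_b = np.zeros_like(online.b)
    # Only the rows of the actions actually taken receive gradient;
    # np.add.at accumulates correctly when an action repeats in the batch.
    np.add.at(grad_W, actions, err[:, None] * states)
    np.add.at(grad_b, actions, err)
    online.W -= online.lr * grad_W / n
    online.b -= online.lr * grad_b / n
    return 0.5 * float(np.mean(err ** 2))
```

In a full DQN loop the target network would be re-cloned from the online network every fixed number of steps, and the batches would come from the experience pools rather than a fixed array.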