| As living standards improve, personal health and well-being have drawn growing public attention, and moxibustion has become a mainstay of health care and a first choice for many people today. When applied on a large scale, however, traditional moxibustion suffers from a shortage of expert practitioners and is time-consuming, while existing moxibustion devices cannot reliably target the acupoint or properly adjust the distance between the moxa stick and the skin at the acupoint. To address these issues, this thesis studies moxibustion decision-making with a robotic arm in place of the expert, using imitation learning and reinforcement learning. The work covers three aspects. First, based on an analysis of the moxibustion decision-making task, this thesis models the problem as a Markov decision process, which captures the full decision-making process. The modeling defines the state, action, and reward function of the task, which are its key concepts. Second, the objective of the moxibustion decision-making task is studied, and a method for solving the decision model is proposed. This thesis constructs teaching data samples, builds a Strategy Learning Network (SLN), and learns a teaching strategy, which serves as the experts' prior knowledge of moxibustion. To learn the optimal moxibustion strategy, the energy absorbed by the skin at the acupoint is combined with this expert prior knowledge to construct a reward function based on imitation learning, and this reward function guides the agent's reinforcement learning. Finally, an experimental platform was
built for the proposed method, and experimental verification and error analysis were carried out. Using imitation learning and a novel deep reinforcement learning algorithm, this thesis conducts experiments and analysis on data sets collected in real settings and establishes an evaluation metric, namely the effectiveness of the strategy. Several other algorithms were selected for comparative experiments. The proposed algorithm achieves the best result among those compared, with a strategy effectiveness ratio of 78.5%. To accelerate reinforcement learning and improve agent training performance, this thesis also proposes a fusion model of XGBoost and LSTM in a simulated environment. Comparative experiments show that the fusion model gives the best predictions among the models compared, with a mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) of 0.014, 0.092, and 25.3%, respectively. |
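
The three prediction-error metrics named in the abstract have standard definitions. The following is a minimal sketch of those definitions in Python, assuming plain sequences of true and predicted values. The function names and the sample values are illustrative assumptions and are not taken from the thesis.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes no true value is zero, since each error is divided by it.
    """
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the thesis.
    y_true = [1.0, 2.0, 4.0]
    y_pred = [1.5, 2.0, 3.0]
    print(mse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

MSE and MAE are expressed in the units of the predicted quantity (squared for MSE), whereas MAPE is relative to the true value, which is why it is reported as a percentage.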