In clinical practice, patients often suffer from multiple long-term diseases that require continuous treatment with multiple drugs. Doctors with different expertise and disciplinary backgrounds must then consult together and make intervention decisions according to the patient's time-varying states and characteristics. Reinforcement learning has been widely applied to continuous treatment decision-making. However, medical problems are complex, and traditional reinforcement learning struggles to handle the continuous treatment of multiple diseases efficiently. Cooperative multi-agent reinforcement learning introduces multiple agents that complete team tasks through interaction and cooperation, and it has significant advantages in complex decision-making problems. Therefore, driven by medical data, this thesis designs two auxiliary decision-making models for the continuous treatment of multiple diseases. The models use reinforcement-learning agents to simulate a consultation among multiple doctors and make the agents cooperate to learn a medication policy. Clinicians can use the policy output by the models to support their medication decisions. Specifically, the research contents and contributions are as follows.

First, reflecting consultations in which doctors cooperate as equals, this thesis proposes a multi-agent multi-disease continuous treatment decision-making model based on parallel cooperation. The model learns a joint action-value function with a nonlinear representation. It uses a monotonicity constraint to keep each agent's individual decisions consistent with the team treatment strategy, and it introduces additional global knowledge to give the treatment decision more information, thereby producing a collaborative team dosing policy. The model helps overcome the challenge of credit assignment among agents, achieves adaptive credit assignment, and improves generalization in complex continuous treatment decision-making tasks.

Then, reflecting consultations in which doctors cooperate hierarchically, this thesis proposes a multi-agent multi-disease continuous treatment decision-making model based on hierarchical cooperation. The model uses a hierarchical structure to learn a nested medication strategy, with agents making decisions on different time scales. Goal decomposition subdivides the long-term treatment goal into short-term decision-making goals, and the cooperative team treatment policy is generated through goal-oriented learning. To balance long-term and short-term treatment goals, an intrinsic incentive mechanism provides immediate feedback during learning. The model helps overcome sparse rewards in long-term medical decision-making: it avoids the dimensionality explosion caused by over-exploration, and it explains the agents' dosing behavior more reliably.

Finally, for the continuous treatment of diabetes and kidney disease, this thesis applies the two models above to data modeling and reinforcement-learning experiments. The results show that both models achieve better treatment effects than the baselines and the clinicians: they yield the lowest patient mortality while keeping blood glucose well controlled. Of the two, the model based on parallel cooperation has the more significant therapeutic effect, whereas the model based on hierarchical cooperation learns faster and explores more efficiently.

This thesis builds multi-agent cooperative decision-making models from the interactive learning behavior of clinicians, which offers a new perspective on the inherent challenges of multi-agent reinforcement learning. The models can support clinical treatment decisions for multiple diseases, reducing decision errors and improving treatment effects, and they can also be extended to other complex continuous decision-making problems.
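The monotonicity constraint in the parallel-cooperation model can be illustrated with a minimal NumPy sketch. It assumes a QMIX-style mixing network, which the abstract does not name explicitly. Under that assumption, hypernetworks driven by the global state (the "additional global knowledge") produce non-negative mixing weights, so the team value Q_tot never decreases when any single agent's Q_i increases. The names `monotonic_mix` and `init_params`, the layer sizes, and the random untrained weights are all hypothetical. The usage at the end checks the property the abstract relies on: each agent's own greedy dose matches the best joint dose under Q_tot.

```python
import itertools

import numpy as np


def elu(x):
    # ELU is strictly increasing, so it cannot break monotonicity
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def init_params(n_agents, state_dim, hidden, seed=0):
    """Random, untrained hypernetwork weights (hypothetical, for illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "hyper_w1": rng.normal(size=(state_dim, n_agents * hidden)),
        "hyper_b1": rng.normal(size=(state_dim, hidden)),
        "hyper_w2": rng.normal(size=(state_dim, hidden)),
        "hyper_b2": rng.normal(size=(state_dim,)),
    }


def monotonic_mix(agent_qs, state, params):
    """Mix per-agent values Q_i into a team value Q_tot.

    The global state conditions the mixing weights; taking abs() of the
    weights makes Q_tot non-decreasing in every Q_i.
    """
    n_agents = agent_qs.shape[0]
    w1 = np.abs(state @ params["hyper_w1"]).reshape(n_agents, -1)  # (n_agents, hidden)
    b1 = state @ params["hyper_b1"]                                 # (hidden,)
    h = elu(agent_qs @ w1 + b1)                                     # nonlinear layer
    w2 = np.abs(state @ params["hyper_w2"])                         # (hidden,)
    b2 = state @ params["hyper_b2"]                                 # scalar
    return float(h @ w2 + b2)


# Toy setting (hypothetical): 2 agents, e.g. one per disease, 3 dose levels each
n_agents, n_actions, state_dim, hidden = 2, 3, 4, 8
rng = np.random.default_rng(1)
params = init_params(n_agents, state_dim, hidden)
state = rng.normal(size=state_dim)
q_tables = rng.normal(size=(n_agents, n_actions))  # Q_i(s, a_i) for each agent


# Decentralized: every agent greedily picks its own best dose
greedy = tuple(int(a) for a in q_tables.argmax(axis=1))


def team_value(joint):
    # Q_tot for one joint dose combination
    return monotonic_mix(q_tables[np.arange(n_agents), list(joint)], state, params)


# Centralized check: exhaustively search every joint dose combination by Q_tot
joint_best = max(itertools.product(range(n_actions), repeat=n_agents), key=team_value)
print(greedy, joint_best)  # they agree because Q_tot is monotonic in each Q_i
```

Because the constraint lets each agent act on its own Q_i at treatment time, the exhaustive joint search appears here only as a check. Its cost grows exponentially with the number of agents.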
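The goal decomposition and intrinsic feedback in the hierarchical model can likewise be sketched on a toy problem. This is a minimal illustration under stated assumptions: the state is a single scalar, and the high-level rule (every `k` steps, set a sub-goal halfway to the long-term target), the low-level rule (a proportional step toward the sub-goal), the dynamics, and both reward shapes are all hypothetical fixed rules rather than learned policies. The sketch shows the two time scales. The low-level agent receives dense intrinsic reward at every step, while the environment's extrinsic reward stays sparse and arrives only at the end of the episode.

```python
def intrinsic_reward(x, goal, tol=0.5):
    """Dense low-level reward: negative distance to the current sub-goal,
    plus a bonus of 1.0 once the sub-goal is reached (hypothetical shaping)."""
    dist = abs(x - goal)
    return -dist + (1.0 if dist <= tol else 0.0)


def extrinsic_reward(t, horizon, x, target, tol=0.5):
    """Sparse environment reward: only the final step scores the long-term outcome."""
    if t < horizon - 1:
        return 0.0
    return 1.0 if abs(x - target) <= tol else -1.0


def rollout(x0, target, horizon=16, k=4, gain=0.8):
    """Two time scales: a high-level agent sets a sub-goal every k steps and a
    low-level agent acts every step toward it. Policies and dynamics are toy rules."""
    x = x0
    goal = x0
    goals, r_int, r_ext = [], [], []
    for t in range(horizon):
        if t % k == 0:
            # Goal decomposition: the next sub-goal closes half of the remaining gap
            goal = x + 0.5 * (target - x)
            goals.append(goal)
        action = gain * (goal - x)  # low-level step toward the current sub-goal
        x = x + action              # toy deterministic dynamics
        r_int.append(intrinsic_reward(x, goal))
        r_ext.append(extrinsic_reward(t, horizon, x, target))
    return x, goals, r_int, r_ext


final_x, goals, r_int, r_ext = rollout(10.0, 5.0)
print(len(goals), r_ext.count(0.0), r_ext[-1])  # 4 15 1.0
```

Without the intrinsic term, the low-level agent would see a non-zero signal on only one of sixteen steps. This illustrates the sparse-reward problem that the intrinsic incentive mechanism is meant to ease.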