
Research On Key Techniques Of Clinical Assisted Decision-Making For Structured Data

Posted on: 2023-06-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Luo
GTID: 1524306914477744
Subject: Software engineering
Abstract/Summary:
Clinical decision-making is the process by which doctors integrate large amounts of complex clinical data to reach the best diagnostic conclusion and choose a treatment plan. The process is strongly influenced by the decision-maker's theoretical grounding, clinical experience and value orientation. Machine learning algorithms are now widely applied in clinical medicine, but combining them with medical data is still held back by uneven data standardization and quality, weak model interpretability, and models that are too complex to transfer to other settings. To extract useful information quickly from large volumes of medical data, and to improve the quality and efficiency of doctors' work in disease assessment, diagnosis and treatment, this dissertation uses structured medical data to study the key technologies of data preprocessing, model interpretation and feature selection involved in building clinical assisted decision models. Three clinical decision support models with strong generalization ability were established for different clinical problems: a missing-data reconstruction prediction model based on a generative adversarial network, a TreeSHAP-based interpretable prediction model, and a transferable Gain-Lasso-XGBoost interpretable prediction framework. The main work and innovations are as follows:

1. A prognostic prediction model based on a generative adversarial network is studied. Comparative experiments after missing-value filling show that the generative adversarial network algorithm resists the impact of a high missing rate more effectively than the other methods compared. On that basis, a missing-value reconstruction prediction model based on a generative adversarial network is proposed. The work shows for the first time that the generative adversarial network algorithm has an advantage over other missing-value filling methods in processing medical structured data, and it addresses the slow computation and poor generalization of most current missing-value filling algorithms. The model was validated on the tumor follow-up database designed and developed by the National Cancer Institute of the United States. The results show that the generative adversarial network algorithm can reconstruct and fill missing values quickly and effectively, and that the generated data set is closer to the true distribution than those produced by other methods. The five-year survival prognosis model for colorectal cancer built on this basis generalizes well, with an area under the receiver operating characteristic curve of 0.848, and can help clinicians evaluate disease progression in colorectal cancer patients and determine the optimal treatment plan. Finally, the random forest algorithm was used to rank and analyze the importance of the features used in the model, so that clinicians can understand the correlation between each feature and the target variable, uncover potential factors that affect patient survival time, and obtain evidence to support diagnosis and treatment decisions.

2. Methods for making disease severity prediction models interpretable are studied. The contribution of each feature to the prediction model is quantified with the TreeSHAP method, and an interpretable model based on the SHAP method is proposed. The model can analyze the positive or negative relationship between each feature and the target variable and clarify the influence of each feature on the prediction results. The model outperformed the APACHE II disease severity assessment scale (areas under the receiver operating characteristic curve of 0.76 and 0.69, respectively), addressing the difficult comprehension, complex calculation and poor repeatability of the traditional APACHE II scale. To further improve accuracy and stability, the prediction model was then optimized; the final model's accuracy and area under the receiver operating characteristic curve rose to 0.87 and 0.81, respectively. The TreeSHAP method can, to a certain extent, reduce the decision-making risk of machine learning models in practical applications, address the weak interpretability of current clinical assisted decision-making models, increase the transparency and credibility of the model, and enable medical workers to assess disease severity in critically ill patients more accurately and in a more timely way. Finally, the model was verified on a data set extracted from another, completely independent multi-center ICU database, which shows that the model generalizes well (area under the receiver operating characteristic curve of 0.79).

3. An interpretable predictive model based on Lasso regression is studied. The dissertation first simplifies the feature variables in the training data set with the Lasso feature selection method and improves model accuracy by filling missing values with the generative adversarial network algorithm, and then proposes a simple, general, explainable and portable prediction model based on Gain-Lasso-XGBoost. To verify the model, data covering all disease types from the intensive care medical information database were used, and an early prediction model of acute kidney injury was built on a data set composed of eight selected features. Compared with previous studies, the proposed model is both simpler and significantly better in performance (the areas under the receiver operating characteristic curves are 0.849 and 0.830, respectively). The model can predict the risk of acute kidney injury early for all critically ill adult patients after they enter the intensive care unit, and it addresses the problem that feature extraction in most current medical prediction models is performed manually, relying on the domain knowledge of medical workers. Because the model is simple, easier to interpret, and built on feature variables that are easy to obtain, it is more practical clinically, helping ICU medical workers identify patients at high risk of acute kidney injury early and avoid delays in treatment. Finally, to further verify the effectiveness of the Gain-Lasso-XGBoost clinical assisted decision-making framework, and to address the weak generalization and poor transferability of most current clinical assisted decision-making models, the framework was verified on the Surveillance, Epidemiology, and End Results public database. The results show that after removing 3 variables, the area under the receiver operating characteristic curve still rises from 0.844 to 0.846, which shows that the framework can improve model performance while simplifying the model.

In conclusion, from the perspectives of data preprocessing, feature selection and model interpretability, this dissertation makes effective progress in research on key technologies for clinical decision-making based on structured medical data.
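Every result in the abstract is reported as an area under the receiver operating characteristic curve (AUC). As a reading aid, the following minimal sketch shows one standard way to compute that metric, using the rank-sum (Mann-Whitney U) identity: the AUC equals the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the dissertation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 outcomes.
    scores: predicted risks, where a higher score means "more likely 1".
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Walk through groups of tied scores; each group shares the average
    # of the 1-based ranks it occupies.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # U counts the (positive, negative) pairs ranked in the right order,
    # with ties counted as half a pair.
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy data (hypothetical): 3 positive and 3 negative cases.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.50, 0.90]
print(round(roc_auc(toy_labels, toy_scores), 3))  # 7 of 9 pairs ordered correctly
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the context for comparisons such as 0.76 versus 0.69 against the APACHE II scale.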
Keywords/Search Tags:Machine learning, Clinical assisted decision making, Feature extraction, Model interpretation, Data preprocessing