
Study On Drug Recommendation Based On Improved Random Forest Model

Posted on: 2024-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Li
Full Text: PDF
GTID: 2544306935983789
Subject: Electronic information
Abstract/Summary:
The scientific use of drugs is important for improving both therapeutic outcomes and drug safety. However, because chronic diseases have complex etiologies, different patients respond differently to the same drug, so patients need personalized drug regimens that take their individual characteristics into account. In recent years, machine learning methods have been widely used in the medical field. By analyzing the medical examination results and medication records of previous patients and finding the correspondence between them, such methods can reveal regularities in drug treatment plans and give doctors, especially junior doctors, a reference for quickly developing personalized medication plans. With machine learning as its theoretical foundation, this paper builds a drug classification prediction model that follows a visual analysis of the medical data features and takes each patient's biochemical index results as its input features, with the aim of predicting drug type and providing more in-depth support for personalized drug treatment. The main research contents of the paper are as follows.

First, the attributes of the dataset are visualized and the data are pre-processed. Datasets usually contain outliers, which may reduce the effectiveness of the subsequent classifier models. This paper therefore uses an outlier detection algorithm based on isolation forest to detect and remove the outliers in the dataset. Then, through visual analysis of the attributes and of the correspondences between them, new classification attributes are constructed from the existing ones, which improves classification accuracy and provides a sound data basis for subsequent modeling.

Second, because the classes in the medical dataset are imbalanced, a sampling algorithm is used to make the class distribution as balanced as possible and to construct a new, balanced dataset. This prevents the prediction results from being biased toward the majority class while the minority class is ignored. To address the problem that the traditional oversampling algorithm SMOTE is prone to marginalizing the data distribution, this paper proposes a KMeans-SMOTE algorithm, based on the idea of clustering, to balance the dataset. The method generates samples according to the cluster distribution, which effectively avoids generating new samples that blur class boundaries and thereby addresses this drawback of SMOTE. To evaluate the impact of sampling algorithms on the model results, this paper conducts comparison experiments with three sampling algorithms, each applied separately. The results show that the KMeans-SMOTE algorithm effectively improves the class distribution of the dataset, which in turn balances the whole dataset and improves the accuracy of the model for drug classification prediction.

Finally, traditional and ensemble drug classification prediction models are constructed based on machine learning algorithms. Logistic regression, K-nearest neighbor, support vector machine, random forest, XGBoost, and Balanced Random Forest algorithms are first used to model the classification prediction of patient drug types. To further improve classification accuracy, this paper proposes three ensemble learning models for drug classification prediction, based respectively on voting, the TPOT method, and the Stacking algorithm, each using the random forest model as the base model. Compared with the six traditional machine learning classification models above, the three ensemble models improve the accuracy of patient drug type prediction and show more stable performance, which gives them some reference value in practical applications.
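
The clustering idea behind the oversampling step can be illustrated with a small sketch. This is not the thesis's implementation, and it simplifies the published KMeans-SMOTE procedure in several labeled ways: it clusters only the minority samples, it interpolates between a random pair inside one cluster rather than between a sample and one of its nearest neighbors, and it does not weight clusters by sparsity. The function names (`cluster_smote`, `kmeans`, `init_centers`) and the farthest-first initialization are assumptions made for the illustration. What the sketch does show is why restricting interpolation to one cluster keeps synthetic samples out of the gaps between minority groups.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points given as lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic farthest-first initialization (an assumption for this sketch)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = init_centers(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers)


def cluster_smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating only between
    two minority points that fall in the same cluster (simplified sketch)."""
    rng = random.Random(seed)
    labels = kmeans(minority, k)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c]
                for c in range(k)]
    clusters = [c for c in clusters if len(c) >= 2]  # a pair is needed
    if not clusters:
        raise ValueError("no cluster has two or more minority samples")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rng.choice(clusters), 2)
        t = rng.random()  # position along the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical minority class made of two well-separated groups.
    minority = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
    for sample in cluster_smote(minority, n_new=5):
        print(sample)
```

Interpolating between arbitrary minority pairs, without the cluster restriction, could place new points anywhere between the two groups, a region that may belong to another class. Keeping each pair within one cluster means every synthetic point lies on a segment between two nearby minority samples.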
Keywords/Search Tags: Drug prediction, Multi-classification, Random forest, Sampling algorithm, KMeans-SMOTE, Ensemble learning