Selective Ensemble Model And Its Application In Diabetes Prediction

Posted on: 2021-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Li
Full Text: PDF
GTID: 2494306458990809
Subject: Computer application technology
Abstract/Summary:
In recent years, ensemble learning has become a research hotspot in machine learning for medical data prediction. As an extension of ensemble learning, selective ensemble learning can reduce the size of the ensemble while maintaining high prediction accuracy. The key to selective ensemble learning lies in its selection strategy, but previous static selection methods did not fully consider the differences among the samples to be tested. Therefore, this paper designs and establishes a dynamic selective ensemble prediction model that selects the best set of base learners for each sample to be tested, so as to improve the prediction accuracy of regression and classification.

Diabetes has become one of the chronic diseases threatening human health. The latest survey shows that China has the largest number of diabetes patients in the world, with a total of 116 million. In addition, the quality of life of some patients is seriously affected by the lack of early treatment. Therefore, the selective ensemble model is used for regression and classification prediction of diabetes, which provides a basis for early screening and prevention of diabetes. The main research content of this paper includes the following three aspects:

(1) A new nearest neighbor similarity measure based on feature importance weighting is proposed. Because the prediction performance of a learner differs across the samples to be tested, this paper evaluates the prediction accuracy of each learner on the nearest neighbor samples of the sample to be tested. However, existing nearest neighbor similarity measures usually use the Euclidean distance, which does not take the importance of individual sample features into account. Based on this, this paper proposes a nearest neighbor similarity measure based on feature importance weighting, exploiting the advantages of random forests in evaluating feature importance, such as strong interpretability and little parameter tuning. Experimental results show that this similarity measure improves the prediction accuracy of regression and classification.

(2) A model called DSEP-KNNPAE (Dynamic Selective Ensemble Prediction Model Based on K-Nearest Neighbors Prediction Accuracy Evaluation) is designed and established. In this model, the nearest neighbor similarity measure based on feature importance weighting is used to find the best nearest neighbor samples of each sample to be tested. Comparison experiments against different algorithms and parameter sensitivity analysis experiments verify that the model established in this paper is superior to existing ensemble learning algorithms in the prediction accuracy of regression and classification.

(3) The DSEP-KNNPAE model is applied to the regression prediction of blood glucose in diabetes and to the genetic risk classification of gestational diabetes, in order to assist decision-making in early screening. Compared with existing ensemble learning algorithms, DSEP-KNNPAE achieves higher prediction accuracy in diabetes prediction, which effectively improves the effectiveness of diabetes screening.
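The abstract does not give the formulas behind the weighted similarity measure or the selection rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's implementation. It assumes that the feature importance weights are already available (for example, taken from a trained random forest), that the distance is a Euclidean distance with each squared feature difference scaled by its weight, that learners are ranked by mean absolute error on the k nearest validation samples, and that the selected learners are combined by simple averaging (regression case). All function and parameter names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# A base learner is modelled as any function from a feature vector to a number.
Learner = Callable[[Sequence[float]], float]


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Euclidean distance with each squared difference scaled by its weight.

    Assumption: this weighting form is illustrative; the thesis may define
    its similarity measure differently.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def nearest_neighbors(query: Sequence[float], X_val: List[Sequence[float]],
                      weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k validation samples closest to the query."""
    order = sorted(range(len(X_val)),
                   key=lambda i: weighted_distance(query, X_val[i], weights))
    return order[:k]


def select_learners(query: Sequence[float], learners: List[Learner],
                    X_val: List[Sequence[float]], y_val: Sequence[float],
                    weights: Sequence[float], k: int,
                    n_select: int) -> List[Learner]:
    """Keep the n_select learners with the lowest error on the query's neighbors."""
    idx = nearest_neighbors(query, X_val, weights, k)

    def local_error(f: Learner) -> float:
        # Mean absolute error on the neighborhood (assumed criterion).
        return sum(abs(f(X_val[i]) - y_val[i]) for i in idx) / len(idx)

    return sorted(learners, key=local_error)[:n_select]


def predict(query: Sequence[float], learners: List[Learner],
            X_val: List[Sequence[float]], y_val: Sequence[float],
            weights: Sequence[float], k: int = 3, n_select: int = 2) -> float:
    """Average the predictions of the learners selected for this query."""
    chosen = select_learners(query, learners, X_val, y_val, weights, k, n_select)
    return sum(f(query) for f in chosen) / len(chosen)


# Toy usage: only the first feature matters, so its weight dominates.
X_val = [[0.0, 0.0], [0.1, 5.0], [1.0, 0.0], [0.9, 5.0]]
y_val = [0.0, 1.0, 10.0, 9.0]
weights = [1.0, 0.0]  # hypothetical importances
learners = [lambda x: 0.0, lambda x: 10.0, lambda x: 5.0]
result = predict([0.0, 5.0], learners, X_val, y_val, weights, k=2, n_select=1)
```

Because selection is repeated for every query, different test samples can end up with different subsets of base learners, which is what distinguishes dynamic selection from a static, once-only pruning of the ensemble. For classification, the averaging step would typically be replaced by a vote among the selected learners.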
Keywords/Search Tags: Dynamic selective ensemble, Nearest neighbor similarity measurement, Feature importance weighting, Diabetes prediction