With the rapid development of big data technology and the acceleration of medical and health informatization, medical big data has gradually become a research hotspot. More and more studies apply machine learning methods to medical data analysis to reduce the burden on doctors. In the medical field, however, sample imbalance is a persistent problem that degrades the predictive performance of models. Undersampling is a common way to address sample imbalance, but existing undersampling techniques cannot guarantee precision and recall at the same time. Designing a classification method for imbalanced medical samples is therefore of great theoretical and practical significance. A new ensemble classification method based on undersampling with iterative boosting (USIB) is proposed for imbalanced medical samples. First, the method undersamples the majority-class samples to construct a group of base classifiers. Each base classifier is then combined with the classifier output by the previous iteration to form a new set of ensemble classifiers. The ensemble classifier with the best classification performance is selected and compared with the previously output classifier; if the improvement meets the stopping threshold, the iteration stops. Otherwise, the sampling probabilities are updated according to how the current output classifier performs on the majority-class samples, and the next iteration begins. Through this targeted sampling of majority-class samples, the precision of the model is improved while recall is preserved. The effectiveness of the algorithm is verified on two imbalanced medical datasets. The experimental results show that the USIB method is superior to existing undersampling algorithms on multiple metrics for highly imbalanced medical datasets. Compared with the best-performing of the other six undersampling methods, the F1 score of USIB increases by 9.77% and 11.69% on the two datasets, and the AUC increases by 13.88% and 1.51%, respectively.
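
The following is a minimal sketch of the iterative undersampling-and-boosting loop described above, not the authors' implementation. It assumes decision trees as base learners, simple probability averaging as the ensemble combination, F1 on a held-out validation split as the "classification effect", a stop when the improvement falls below a tolerance, and doubling the sampling probability of misclassified majority samples as the update rule; all of these choices are illustrative assumptions.

```python
# Sketch of a USIB-style loop (assumed details, see lead-in above).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

def usib_sketch(X_tr, y_tr, X_val, y_val, n_samplings=10, max_iter=20, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    maj_idx = np.where(y_tr == 0)[0]              # majority class assumed labeled 0
    min_idx = np.where(y_tr == 1)[0]              # minority class assumed labeled 1
    prob = np.full(len(maj_idx), 1 / len(maj_idx))  # sampling probabilities over majority samples
    ensemble, best_f1 = [], 0.0

    for _ in range(max_iter):
        candidates = []
        for _ in range(n_samplings):
            # Undersample the majority class according to the current probabilities.
            pick = rng.choice(maj_idx, size=len(min_idx), replace=False, p=prob)
            idx = np.concatenate([pick, min_idx])
            clf = DecisionTreeClassifier(random_state=seed).fit(X_tr[idx], y_tr[idx])
            # Integrate the new base classifier with the previously output ensemble.
            cand = ensemble + [clf]
            votes = np.mean([c.predict_proba(X_val)[:, 1] for c in cand], axis=0)
            candidates.append((f1_score(y_val, (votes >= 0.5).astype(int)), cand))

        # Keep the candidate ensemble with the best validation F1.
        f1, cand = max(candidates, key=lambda t: t[0])
        if f1 - best_f1 < tol:                    # improvement below threshold: stop (assumed criterion)
            break
        best_f1, ensemble = f1, cand
        # Raise the sampling probability of majority samples the ensemble misclassifies.
        votes = np.mean([c.predict_proba(X_tr[maj_idx])[:, 1] for c in ensemble], axis=0)
        prob = np.where(votes >= 0.5, prob * 2.0, prob)
        prob /= prob.sum()
    return ensemble
```

Predictions on new data could then be obtained by averaging `predict_proba` over the returned classifiers and thresholding at 0.5, mirroring the combination used inside the loop.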