Data imbalance is prevalent in medical datasets. Standard machine learning classifiers assume a balanced class distribution, so they perform poorly on imbalanced data classification tasks. In the medical field, misclassifying minority class samples often causes more harm than misclassifying majority class samples, so the imbalanced data classification problem is particularly important there. In this dissertation, we propose new imbalanced data processing methods from three perspectives: combining resampling with a single classifier, combining resampling with ensemble learning, and cost-sensitive learning without resampling. We apply them to medical-aided decision making to improve decision support and to provide theoretical and technical support for medical-aided decision making on imbalanced data. The main work and innovations are as follows.

First, for the hybrid approach of resampling and a single classifier, we propose a hybrid technique for imbalanced data classification that combines an improved SMOTE with an adaptive SVM. In the preprocessing stage, because SMOTE cannot identify noise in the data, a noise filter based on ensemble learning is proposed to remove noisy samples, which effectively reduces the risk of setting the noise threshold incorrectly and improves the sampling effect. In the classification stage, because the classification performance of SVM is easily affected by parameter settings, an adaptive SVM optimized by the fuzzy self-tuning particle swarm algorithm is proposed. Finally, the hybrid method is applied to survival prediction after lung cancer surgery, where it improves prediction accuracy.

Second, for the combined approach of resampling and ensemble learning, we propose an imbalanced data classification method that combines hybrid sampling with improved dynamic ensemble selection. First, SMOTE-ENN is used to preprocess the data, balancing the data distribution and removing noisy samples. Then, to address the insufficient local ability and diversity of the candidate classifiers generated by existing dynamic ensemble selection techniques, a candidate classifier generation method that mixes multiple clustering with Bagging is proposed, which effectively improves both properties. Finally, the combined method is applied to detecting COVID-19 from routine blood data, which improves detection accuracy and provides a new methodological idea for computer-aided diagnosis of COVID-19.

Third, for cost-sensitive learning without resampling preprocessing, a self-organizing cost-sensitive RBF neural network is proposed to address two problems: misclassification costs are hard to set, and standard RBF neural networks are not cost-sensitive. The method adopts the imbalance ratio as the misclassification cost for minority samples, and optimizes the structure and parameters of the RBF neural network using GA and an improved PSO, respectively, achieving self-organized optimization of the cost-sensitive RBF neural network under a dynamic structure. Finally, the proposed algorithm is validated on several medical datasets, such as breast cancer and diabetes, and improves the performance of medical-aided diagnosis.
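
The idea of using the imbalance ratio as the misclassification cost for minority samples can be illustrated with a minimal sketch. This is not the dissertation's RBF network or its loss function. It assumes binary labels, defines the imbalance ratio as the majority count divided by the minority count, and fixes the cost of a majority-class error at 1. The function names `imbalance_ratio` and `cost_weighted_error` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def imbalance_ratio(labels, minority_label=1):
    """Majority count divided by minority count (assumed definition)."""
    counts = Counter(labels)
    n_min = counts[minority_label]
    n_maj = len(labels) - n_min
    if n_min == 0:
        raise ValueError("no minority samples in labels")
    return n_maj / n_min


def cost_weighted_error(y_true, y_pred, minority_label=1):
    """Error rate in which each minority sample carries a cost equal to
    the imbalance ratio and each majority sample carries a cost of 1
    (an illustrative assumption, not the dissertation's exact loss)."""
    ir = imbalance_ratio(y_true, minority_label)
    incurred = 0.0
    possible = 0.0
    for truth, pred in zip(y_true, y_pred):
        cost = ir if truth == minority_label else 1.0
        possible += cost
        if truth != pred:
            incurred += cost
    return incurred / possible


# Toy data: 8 majority samples (label 0) and 2 minority samples (label 1).
y_true = [0] * 8 + [1] * 2
y_pred_all_majority = [0] * 10             # misses both minority samples
y_pred_one_false_alarm = [0] * 7 + [1] * 3  # one majority sample flagged

print(imbalance_ratio(y_true))
print(cost_weighted_error(y_true, y_pred_all_majority))
print(cost_weighted_error(y_true, y_pred_one_false_alarm))
```

On this toy data the imbalance ratio is 4, so a classifier that always predicts the majority class has a plain error rate of 0.2 but a cost-weighted error of 0.5, while a single false alarm on a majority sample costs only 0.0625. Weighting by the imbalance ratio avoids hand-tuning a cost value, which is the difficulty the cost-sensitive approach is meant to address.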