| Class imbalance learning is an important branch of data mining and machine learning, with a wide range of applications in fields such as financial markets, government services, medical diagnosis, and scientific research. Traditional classification algorithms designed for balanced data often perform poorly in practice, because real-world data are frequently imbalanced, as in credit card fraud detection, cancer cell diagnosis, and face recognition. When such algorithms are applied to imbalanced data, the large disparity between the number of majority class samples and the number of minority class samples biases the classifier toward the majority class, which lowers the classification accuracy on minority class samples. Yet in practical problems the correct classification of minority class samples matters most, because misclassifying them has more serious consequences. Improving the classification accuracy of traditional classifiers on imbalanced data has therefore become a research hotspot. Ensemble learning is widely favored because it can significantly improve classification accuracy by fusing multiple base classifiers, but it may still generalize poorly on highly imbalanced data. To address this, this thesis proposes two models that improve the generalization ability of ensemble learning. The main research contents and contributions are as follows: (1) To solve the problem of classifying highly imbalanced data, this thesis proposes an adaptive selective balance ensemble model (ASBE) based on distance fusion rules. Subsets are processed with a hybrid sampling method based on SMOTE and RkNN, base classifiers are then trained on the processed subsets, and distance-based integration rules improve the overall output by adjusting the probabilities generated by the sub-classifiers. To verify the generalization performance of the ensemble model, 40 datasets with different degrees of imbalance were selected from the KEEL and UCI public databases for experiments. The experimental results show that the performance of ASBE is comparable to, or even better than, that of current common methods. (2) From a practical application perspective, an adaptive dynamic ensemble selection model (ADES) is proposed for stroke data screening and disease prediction in the medical field. ADES targets the characteristics of complex imbalanced data, such as overlapping categories and small separation. The original data are split into multiple balanced subsets, which rarely exhibit these problems, and base classifiers are built on them. Through heterogeneous integration, six different base classifiers are trained on each processed balanced subset, and the base classifier with the highest AUC value on each subset is selected for the fused output. Experimental results on imbalanced datasets collected from the KEEL and UCI public databases show that the proposed algorithm outperforms similar methods. |
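
The subset-and-select step described for ADES can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`balanced_subsets`, `auc`, `select_best`) are hypothetical, the two scoring functions stand in for the six trained base classifiers, the toy data are invented for illustration, and the hybrid sampling and final fusion steps are omitted. It only shows how a majority class might be split into balanced subsets and how the candidate with the highest AUC could be chosen on each one.

```python
import random


def balanced_subsets(majority, minority, seed=0):
    """Split the majority class into chunks of minority size and pair
    each chunk with all minority samples (assumed splitting scheme)."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    k = len(minority)
    subsets = []
    for start in range(0, len(shuffled) - k + 1, k):
        chunk = shuffled[start:start + k]
        X = chunk + minority
        y = [0] * len(chunk) + [1] * len(minority)
        subsets.append((X, y))
    return subsets


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a
    random negative; ties count as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best(candidates, X, y):
    """Return (name, auc) of the candidate scorer with the highest AUC."""
    results = {name: auc([f(x) for x in X], y)
               for name, f in candidates.items()}
    best = max(results, key=results.get)
    return best, results[best]


# Toy data (illustrative only): 30 majority and 10 minority samples.
# The first feature separates the classes; the second is uninformative.
majority = [(i / 10.0, ((i * 7) % 10) / 10.0) for i in range(30)]
minority = [(3.0 + i / 10.0, ((i * 3) % 10) / 10.0) for i in range(10)]

# Hypothetical stand-ins for trained base classifiers: each maps a
# sample to a score for the minority (positive) class.
candidates = {
    "first_feature": lambda x: x[0],
    "second_feature": lambda x: x[1],
}

subsets = balanced_subsets(majority, minority)
selected = [select_best(candidates, X, y)[0] for X, y in subsets]
```

In a fuller version, each candidate would be a classifier fitted on the subset, and the selected classifiers' outputs would be fused into a single prediction as the abstract describes.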