| An imbalanced data set is one in which the number of samples differs greatly between classes, and the minority class is often the one of greater research interest. Traditional classification algorithms pursue overall accuracy, so when applied to imbalanced data sets they tend to misclassify minority-class samples, which carry the higher misclassification cost. Imbalanced data set classification is therefore one of the main difficulties in data mining. This paper preprocesses imbalanced data sets with an improved RBO algorithm and classifies the data with an improved SVM algorithm. The main work is as follows: (1) An improved RBO algorithm based on data cleaning is studied. Building on the RBO oversampling algorithm, it introduces a data cleaning technique to delete noise samples produced by oversampling. A visual comparison of sampling results on artificial data sets shows that the improved algorithm improves sample quality, and statistical tests on the KEEL data sets verify its effectiveness. (2) An SVM algorithm optimized by an improved WOA is studied. To address the tendency of WOA to converge prematurely, the SGO optimization algorithm is introduced to improve WOA's search process. The improved WOA is then applied to select the SVM kernel function parameters and penalty factor. Simulation experiments on the KEEL data sets verify the classification performance of the algorithm. (3) An ensemble algorithm based on RBO-ENN and SWOA-SVM is studied. Within the AdaBoost ensemble framework, the RBO-ENN algorithm is used to sample the training set and the SWOA-SVM algorithm serves as the weak classifier. Simulation experiments on the KEEL data sets verify the effectiveness of the algorithm. |
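The data cleaning step in part (1) can be illustrated with a small sketch. The name RBO-ENN suggests that the cleaning technique is Edited Nearest Neighbours (ENN); that reading is an assumption, as are the function name `enn_clean`, the parameter `k`, and the choice to test every sample rather than only the synthetic ones. The sketch shows plain ENN applied after oversampling, not the paper's exact procedure: a sample is dropped when its label disagrees with the majority label of its `k` nearest neighbours.

```python
import numpy as np

def enn_clean(X, y, k=3):
    """Plain Edited Nearest Neighbours (illustrative, hypothetical helper).

    Drops every sample whose label differs from the majority label of its
    k nearest neighbours. The paper's variant may differ, for example by
    testing only oversampled points.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        nn = np.argsort(dist[i])[:k]  # indices of the k nearest neighbours
        labels, counts = np.unique(y[nn], return_counts=True)
        majority = labels[np.argmax(counts)]
        keep[i] = (majority == y[i])
    return X[keep], y[keep], keep
```

In a pipeline of the kind the abstract describes, such a filter would run on the training set after oversampling, so that synthetic minority samples that land deep inside the majority region are removed before the classifier is trained.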