| Imbalanced data are common in everyday applications, yet classifiers often perform poorly on such data sets. Improving classification performance on imbalanced data sets is therefore an active research topic. This paper studies classification algorithms for imbalanced data sets. First, the ideas and procedures of several single classifiers and several ensemble learning classifiers are introduced. Second, the advantages and disadvantages of the SMOTE algorithm are analyzed. As a classic oversampling algorithm, SMOTE can effectively avoid over-fitting and thus improve the generalization ability of the model. However, SMOTE takes into account neither the distribution of the data nor the influence of the majority class on the minority class, and it ignores the specific characteristics of individual sample points. Existing improved algorithms address some of these aspects and improve classification performance to some extent, but they introduce other problems. This paper analyzes the advantages and disadvantages of SMOTE and its improved variants in detail. To further improve classification performance on imbalanced data sets, a DC-SMOTE algorithm based on the Euclidean distance ratio is proposed. DC-SMOTE not only avoids over-fitting but also considers the distribution of the sample points, assigning different coefficient values to sample points at different positions so that interpolation is tailored to each point. Compared with SMOTE, the algorithm improves the quality of the synthesized sample points and alleviates some of SMOTE's shortcomings. Finally, five imbalanced data sets from UCI and a real data set from a company's broker model project are selected. Using four classical single classifiers and four classical ensemble learning algorithms, SMOTE, Borderline-SMOTE, Kmeans-SMOTE, and DC-SMOTE are compared in terms of F-value, G-mean, and AUC. The experiments show that the proposed DC-SMOTE algorithm effectively improves the classification ability of the classification models. The algorithm was also applied to the company's broker model project, where it achieved good classification results and was ultimately put into use successfully at the company, which demonstrates the research significance and practical value of this work. |
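
For readers unfamiliar with the interpolation step that SMOTE-style methods share, the following is a minimal sketch of classic SMOTE only. It is not the DC-SMOTE algorithm: the abstract does not specify how the Euclidean distance ratio determines the coefficient, so the sketch draws the coefficient `lam` uniformly at random, as in the original SMOTE. The function name `smote_sample`, its parameters, and the toy data are illustrative assumptions.

```python
import math
import random


def smote_sample(minority, k=3, n_new=5, seed=0):
    """Generate n_new synthetic points by classic SMOTE interpolation.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest minority neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate: base + lam * (neighbour - base), lam uniform in [0, 1).
        # A distance-ratio method would replace this uniform draw with a
        # position-dependent coefficient (details not given in the abstract).
        lam = rng.random()
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (assumed data).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority, k=2, n_new=3)
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, which is why the choice of coefficient controls where along that segment new samples are placed.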