| With the rapid development of machine learning, deep learning, artificial intelligence and related fields, the volume of generated data keeps growing and the data in various fields continues to improve, yet naturally occurring datasets are generally imbalanced. Imbalanced datasets are highly damaging to machine learning algorithms: at best, model performance drops sharply; at worst, the model fails outright. Research on dataset imbalance still has considerable room for development. In recent years, the Generative Adversarial Network (GAN) has emerged and made major contributions in imaging and medicine, for example in style transfer, face generation and animation image generation. This paper proposes a GAN-based method for classifying imbalanced datasets. A GAN consists of a generator and a discriminator. The generator tries to fit the distribution of the real input data as closely as possible, while the discriminator tries to judge whether a sample comes from the generator or from the real data. The two compete with and improve each other until a Nash equilibrium is reached. The strong generative ability of a GAN can be used to augment the minority-class samples in an imbalanced dataset. This paper first introduces the GAN model and the components and methods it builds on, including the single-layer perceptron, multi-layer perceptron, forward propagation, back propagation, convolution layers, pooling layers and activation functions. However, there is still a large gap between the generated data and the original data, which degrades model performance in terms of accuracy, recall and precision. This paper therefore makes the following improvements to the model: 1) an energy loss is used as the loss function of the model; 2) on top of the energy loss, a KL divergence term is added so that the distribution of the generated data stays close to that of the original data; 3) noisy data is used in the generator, and a denoising function is incorporated to further improve the objective function.
Logistic regression, support vector machines and AdaBoost are commonly used machine learning algorithms. These three algorithms are used to model both the original imbalanced datasets and the datasets balanced by the improved GAN, and the resulting models are compared on accuracy, recall, precision, F1 score and other indicators. The following conclusions are drawn: (1) On the datasets balanced by the improved GAN model, accuracy, recall, precision and F1 score all improve to some extent, with recall improving most markedly. Using the improved GAN model can effectively alleviate the impact of imbalanced datasets on the performance of machine learning algorithms. (2) On some datasets or algorithms, balancing with the improved GAN model degrades model performance, because the new samples it generates still contain some noise. (3) Experiments with four datasets, three machine learning models and four evaluation metrics show that the improved GAN model can effectively alleviate the impact of imbalanced datasets on machine learning performance. |
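
The abstract compares models on accuracy, recall, precision and F1 score. The following minimal sketch illustrates why accuracy alone can mislead on an imbalanced dataset and why recall on the minority class is the metric to watch. The labels and predictions are hypothetical toy values chosen for illustration; they are not results from the paper, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 95 majority-class samples (0), 5 minority-class samples (1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100

# A hypothetical classifier that finds 4 of the 5 minority samples
# at the cost of 6 false alarms on the majority class.
minority_aware = [1] * 6 + [0] * 89 + [1] * 4 + [0]

baseline = classification_metrics(y_true, always_majority)
improved = classification_metrics(y_true, minority_aware)
print("always majority:", baseline)
print("minority aware: ", improved)
```

In this toy case the always-majority classifier has the higher accuracy even though it never detects a minority sample, which is why the comparison in the abstract reports recall, precision and F1 alongside accuracy.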