| A common problem in real-world applications is how to handle imbalanced data. When the target variable is nominal, this is the classical imbalanced classification problem. When the target variable is continuous, it is an imbalanced regression problem. Imbalanced regression poses three main difficulties. First, the user's goal is to predict rare values, yet rare-value samples are few. If traditional methods (such as least squares regression) are used to predict rare values, the model error is large and the predictions are extremely inaccurate. Second, traditional evaluation metrics are not sufficient to measure model performance. Finally, in regression tasks it is also difficult to identify accurately which values are rare. Consequently, applying traditional methods to imbalanced regression tasks may lead to unreasonable or even wrong conclusions. This paper proposes a sample-weighted oversampling method for imbalanced regression data sets. The main work and innovations are as follows. First, an adaptive correlation function is proposed that not only identifies rare values correctly but also avoids missing them. Second, a new oversampling algorithm is proposed. It determines the class of the target variable and the sampling weight of each rare-value sample according to the scarcity of the target variable, then determines the number of synthetic samples to draw from each rare sample, and thereby balances the data. Third, a new algorithm is proposed to overcome the covariance problem in oversampling. The basic idea is to avoid confining each new synthetic sample to the region between the seed sample and its immediate neighbors. This weakens the covariance between the predictor variables, which improves the prediction accuracy for rare values and yields synthetic samples that are more faithful to the real domain. To validate the predictive performance of the methods, 15 real and 7 simulated datasets were used to evaluate the two algorithms on the imbalanced regression task using two metrics, recall and precision. The experimental results show that both algorithms improve the accuracy of rare-value prediction. |
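
The weighting and allocation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, because the abstract does not give the formulas. The sketch assumes that scarcity is measured by the inverse occupancy of a histogram bin of the target, that synthetic samples are allocated in proportion to that weight, and that drawing a separate interpolation coefficient for each feature is one possible way to avoid placing a synthetic sample on the segment between the seed and its neighbor. The names `scarcity_weights`, `allocate_counts`, and `synthesize` are hypothetical.

```python
import numpy as np


def scarcity_weights(y, rare_mask, bins=10):
    """Weight each rare sample by the inverse occupancy of its target bin.

    Assumption: bin occupancy stands in for the paper's scarcity measure.
    """
    edges = np.histogram_bin_edges(y, bins=bins)
    # Bin index in 0..bins-1 for every sample (interior edges only).
    bin_idx = np.digitize(y, edges[1:-1])
    occupancy = np.bincount(bin_idx, minlength=bins)
    inverse = 1.0 / occupancy[bin_idx[rare_mask]]
    return inverse / inverse.sum()


def allocate_counts(weights, n_new):
    """Split n_new synthetic samples across rare samples by weight."""
    raw = weights * n_new
    counts = np.floor(raw).astype(int)
    # Give the leftover samples to the largest fractional parts.
    remainder = n_new - counts.sum()
    order = np.argsort(raw - counts)[::-1]
    counts[order[:remainder]] += 1
    return counts


def synthesize(x_seed, y_seed, x_nb, y_nb, rng):
    """Create one synthetic sample from a seed and one of its neighbors.

    Assumption: an independent coefficient per feature places the sample
    inside the box spanned by the two points, not on the line between them.
    """
    gaps = rng.uniform(0.0, 1.0, size=x_seed.shape)
    x_new = x_seed + gaps * (x_nb - x_seed)
    # Target: distance-weighted average, closer point gets more weight.
    d_seed = np.linalg.norm(x_new - x_seed)
    d_nb = np.linalg.norm(x_new - x_nb)
    total = d_seed + d_nb
    if total == 0:
        return x_new, y_seed
    return x_new, (d_nb * y_seed + d_seed * y_nb) / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 10.0])
    rare = y > 8.0  # simple threshold as a stand-in for the rarity test
    w = scarcity_weights(y, rare)
    print(w, allocate_counts(w, 5))
```

In this sketch the rare samples are picked with a fixed threshold only to keep the example short. The adaptive function mentioned in the abstract would replace that step.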