Research On Rare Value Prediction For Imbalanced Regression

Posted on:2024-02-24

Degree:Master

Type:Thesis

Country:China

Candidate:N Huang

GTID:2530307112954079

Subject:Probability theory and mathematical statistics

Abstract/Summary:

Imbalanced data is a common problem in real-world applications. When the target variable is nominal, this is the classical imbalanced classification problem; when the target variable is continuous, it is an imbalanced regression problem. Imbalanced regression tasks pose three main problems. First, the user’s goal is to predict rare values, but rare-value samples are few; if traditional methods (such as least squares regression) are used to predict rare values, the model error is large and the predictions are highly inaccurate. Second, traditional evaluation metrics are not sufficient to measure model performance. Finally, in regression tasks it is also difficult to accurately distinguish rare values. Consequently, applying traditional methods to imbalanced regression tasks may lead to unreasonable or even wrong conclusions. This thesis proposes a sample-weighted oversampling method for imbalanced regression data sets. The main work and innovations are as follows. First, an adaptive correlation function is proposed that correctly identifies rare values while avoiding missed rare values. Second, a new oversampling algorithm is proposed that determines the class of the target variable and the sampling weight of each rare-value sample according to the scarcity of the target variable, then determines how many times each rare sample is sampled, and thereby balances the data. Third, a new algorithm is proposed to overcome the problem of collinearity in oversampling. The basic idea is to use an improved method that avoids confining each new synthetic sample to the segment between the seed sample and its nearest neighbors, which weakens the collinearity among the predictor variables, improves the prediction accuracy for rare values, and keeps the synthetic samples closer to the real domain. To validate the prediction performance of the methods, 15 real and 7 simulated datasets were used to evaluate the two algorithms on the imbalanced regression task with two metrics, recall and precision. The experimental results show that the two algorithms improve the accuracy of rare value prediction.
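
The general idea of rarity-weighted oversampling can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not give its weighting function or its improved synthesis step, so the sketch assumes a simple inverse-frequency rarity score over histogram bins of the target and uses plain SMOTE-style interpolation between a seed and one of its nearest neighbors (the very confinement the thesis says it avoids). The names `rarity_weights` and `weighted_oversample`, and the parameters `bins` and `k`, are hypothetical.

```python
import numpy as np

def rarity_weights(y, bins=10):
    """Assumed rarity score: samples in sparsely populated target bins get more weight."""
    edges = np.histogram_bin_edges(y, bins=bins)
    idx = np.digitize(y, edges[1:-1])           # bin index in 0..bins-1 for every sample
    counts = np.bincount(idx, minlength=bins)   # number of samples per bin
    w = 1.0 / counts[idx]                       # inverse frequency of each sample's bin
    return w / w.sum()                          # normalise to a sampling distribution

def weighted_oversample(X, y, n_new, k=3, bins=10, seed=0):
    """Draw seeds in proportion to rarity, then interpolate SMOTE-style (assumption)."""
    rng = np.random.default_rng(seed)
    p = rarity_weights(y, bins)
    seeds = rng.choice(len(y), size=n_new, p=p)
    X_new = np.empty((n_new, X.shape[1]))
    y_new = np.empty(n_new)
    for i, s in enumerate(seeds):
        d = np.linalg.norm(X - X[s], axis=1)    # distances in predictor space
        d[s] = np.inf                           # exclude the seed itself
        j = rng.choice(np.argsort(d)[:k])       # one of the k nearest neighbors
        t = rng.random()                        # interpolation factor in [0, 1)
        X_new[i] = X[s] + t * (X[j] - X[s])
        y_new[i] = y[s] + t * (y[j] - y[s])     # target interpolated the same way
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

Because each seed's sampling probability grows as its target bin gets sparser, rare samples are drawn as seeds more often, which shifts the balance of the augmented data toward the rare region of the target.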

Keywords/Search Tags:

Rare value, Imbalanced regression, Oversampling, Multicollinearity, Adaptive weighting

Related items

1	The Diagnosis And Process Solutions In Multicollinearity Of Multiple Regression Model
2	Multicollinearity In Multilinear Regression Models And Partial Least Squares Regression
3	Data Based Oversampling In Imbalanced Data Classification
4	Dynamic Time Warping Oversampling Methods For Imbalanced Time Series
5	Impact of multicollinearity on small sample hydrologic regional regression models
6	Study On Multicollinearity In Linear Regression Model
7	Adaptive Weighting for Flexible Estimation in Nonparametric Regression Models
8	The Countermeasures And Several Improvements Of Multicollinearity In Linear Regression Model
9	Statistical Inference And Application For Rubin Causal Models And Regression Models With Some Types Of Data
10	Statistical Analysis Of Massive Imbalanced Data With Multiclass Logistic Regression