
The Coefficient Of Determination Corrected Based On Empirical Modeling

Posted on: 2022-04-23 | Degree: Master | Type: Thesis
Country: China | Candidate: L Peng
GTID: 2480306557464304 | Subject: Applied Statistics
Abstract/Summary:
Linear regression is one of the most widely used statistical models for describing the relationship between a dependent variable and a set of independent variables. The coefficient of determination (R²) measures how well the dependent variable is linearly related to the independent variables. Unfortunately, R² is a biased estimator of its population counterpart (ρ²), and the bias grows as the number of independent variables (p) increases. Researchers have made considerable efforts to correct R²; the best-known result is the adjusted R² (R²adj). However, R²adj is still biased, and when the population distribution of the observed variables is unknown, an unbiased estimator of ρ² is either hard to compute or does not exist. Using empirical modeling and statistical learning, this thesis develops new formulas for estimating the population ρ² under normal and non-normal distributions, respectively. The main research contents and contributions are as follows:

1. When the data follow a normal distribution, empirical modeling and statistical learning are used to obtain a corrected R² by removing its empirical bias. First, the empirical bias is obtained by Monte Carlo simulation under 3258 different conditions defined by the sample size (N), p, ρ², and the observed value of R². Then best-subset regression identifies the most important variables for predicting the empirical bias, which yields an empirical bias formula for R². Correcting R² for this bias gives the corrected expression of R². Cross-validation results show that the empirically corrected estimators have little bias and outperform both R² and R²adj in mean squared error (MSE) and variance.

2. When the data follow a normal distribution, empirical modeling and statistical learning are used to correct the mean and the variance of R² simultaneously, yielding another corrected expression for the coefficient of determination. Cross-validation results show that it also outperforms R² and R²adj.

3. When the data do not follow a normal distribution, the influence of kurtosis is taken into account, and empirical modeling and statistical learning are again used to correct the empirical bias. The most important variables for predicting the empirical bias are identified first, and the corrected expression of R² is then derived. Cross-validation results show that the corrected coefficients of determination outperform both R² and R²adj in bias, MSE, and variance.
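The empirical bias in item 1 is the gap between the average value of an estimator over many simulated samples and the true ρ². The Python sketch below estimates that gap for R² and for the standard adjusted R², 1 − (1 − R²)(N − 1)/(N − p − 1), under normally distributed data. It is a minimal illustration under assumed settings, not the thesis's simulation design: the helper name `simulate_r2_bias`, the standard-normal predictors, the equal regression weights, and the default replication count are all hypothetical choices.

```python
import numpy as np


def simulate_r2_bias(n, p, rho2, reps=2000, seed=0):
    """Monte Carlo estimate of the bias of R^2 and adjusted R^2 as estimators of rho^2.

    Hypothetical design: X ~ N(0, I_p), y = X @ beta + e, with beta and Var(e)
    chosen so that Var(y) = 1 and the population squared multiple correlation is rho2.
    """
    if not 0.0 <= rho2 < 1.0:
        raise ValueError("rho2 must lie in [0, 1)")
    rng = np.random.default_rng(seed)
    beta = np.full(p, np.sqrt(rho2 / p))  # equal weights, so Var(X @ beta) = rho2
    sigma = np.sqrt(1.0 - rho2)           # error SD, so Var(y) = 1
    r2 = np.empty(reps)
    for i in range(reps):
        X = rng.standard_normal((n, p))
        y = X @ beta + sigma * rng.standard_normal(n)
        design = np.column_stack([np.ones(n), X])  # intercept + predictors
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = np.sum((y - design @ coef) ** 2)
        sst = np.sum((y - y.mean()) ** 2)
        r2[i] = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # Empirical bias = Monte Carlo mean of the estimator minus the true rho^2.
    return r2.mean() - rho2, r2_adj.mean() - rho2


bias_r2, bias_adj = simulate_r2_bias(n=30, p=5, rho2=0.0)
print(f"bias of R^2: {bias_r2:.3f}, bias of adjusted R^2: {bias_adj:.3f}")
```

With N = 30, p = 5 and ρ² = 0, the R² bias should land near its theoretical value p/(N − 1) ≈ 0.17, while the adjusted R² bias should be close to zero. Repeating the call over a grid of (N, p, ρ²) values, and recording the observed R² alongside, would produce a table of empirical biases of the kind the abstract describes.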
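Best-subset regression, used in item 1 to find the strongest predictors of the empirical bias, fits an ordinary least-squares model for every subset of candidate predictors and keeps the best one under a selection criterion. The sketch below uses BIC as that criterion; the abstract does not say which criterion the thesis uses, so BIC, the function name `best_subset`, and the synthetic data in the example are assumptions. In the thesis setting the columns of `X` would be candidate terms built from N, p and the observed R² (hypothetical here), and `y` would be the simulated bias.

```python
import itertools

import numpy as np


def best_subset(X, y):
    """Exhaustive best-subset OLS selection; returns the column indices with the lowest BIC.

    Assumption: BIC = n*log(RSS/n) + (size + 1)*log(n), counting the intercept
    as a parameter. All 2**k subsets are fitted, so k should stay small.
    """
    n, k = X.shape
    best_bic, best_cols = np.inf, ()
    for size in range(k + 1):
        for cols in itertools.combinations(range(k), size):
            # Design matrix: intercept column plus the chosen predictors.
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ coef) ** 2))
            bic = n * np.log(rss / n) + (size + 1) * np.log(n)
            if bic < best_bic:
                best_bic, best_cols = bic, cols
    return best_cols


# Synthetic example: only columns 0 and 2 carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.standard_normal(200)
print(best_subset(X, y))
```

Exhaustive search is exact but costs 2^k model fits, which is why it suits a modest list of candidate terms; for larger lists, stepwise or penalized selection is the usual substitute.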
Keywords/Search Tags: Coefficient of determination, Empirical modeling, Best-subset regression, Monte Carlo simulation, Empirical bias