A Comparative Study Of Classical Statistics And Bayesian Statistics In Regression Model

Posted on: 2019-08-31  Degree: Master  Type: Thesis
Country: China  Candidate: H M Gu  Full Text: PDF
GTID: 2370330542997336  Subject: Epidemiology and Health Statistics
Abstract/Summary:
[Objectives] There are two major schools of statistics: classical statistics and Bayesian statistics. Regression analysis is the most widely studied method in statistics. A regression model is built from a sample to explore and screen factors related to the outcome and to predict new samples. There are many kinds of regression analysis, such as multiple linear regression, Poisson regression, logistic regression, and the Cox proportional and non-proportional hazards regression models often used for survival data in medicine; a suitable model can be selected according to the type of data. Classical statistics encounters many difficulties in regression analysis, such as detecting and eliminating outliers and diagnosing, screening, and eliminating collinearity among the independent variables. Although classical methods have corresponding solutions, these may sometimes reduce the accuracy of the model. The purpose of this paper is to start from different types of data, build regression models using both classical and Bayesian statistics, and compare the advantages and disadvantages of the two approaches, so that researchers can select the appropriate statistical method based on the type of data and the prerequisites.

[Contents] This article introduces and summarizes the principles of regression analysis in classical and Bayesian statistics by reviewing, organizing, and analyzing the domestic and international literature on both methods. The data analyzed include good-quality data, collinear data, heteroscedastic data, missing data, and binary data. The corresponding regression models were built using classical statistics and Bayesian statistics.

[Methods] When the outcome is a quantitative variable and the sample size is small, only the fitting effect is evaluated, using the absolute error (Abserror), the sum of squared residuals (SSress), and the coefficient of determination (R²) as evaluation indices. When the sample size is large, all samples are first used to build the model and evaluate its fit, with the same indices as for small samples. Part of the sample is then used to build the model: the fit on the training set is evaluated with Abserror, the residual mean square (MSE), and R², and performance on the test set is evaluated with Abserror, MSE, and the standardized mean square error (NMSE). This article uses 10-fold cross-validation to split the large-sample data. When the outcome is a binary variable, the fitted model is evaluated with sensitivity, specificity, accuracy, and the area under the ROC curve.

[Results and Conclusions] When the data are of good quality, with no collinearity and no outliers, classical statistics has the best fitting effect, and the Bayesian method performs best when prior information is specified. Overall, however, the evaluation indices differ little, and both methods are acceptable. If there is not enough prior information, a noninformative prior can be used instead. When there is collinearity among the independent variables, principal component regression and ridge regression were chosen as the classical methods and compared with the Bayesian method. Judged by the criterion "smallest Abserror and SSress, largest R²", the Bayesian model with a noninformative prior fits best, the Bayesian model with a specified prior is second, principal component regression is third, and ridge regression is last. The Bayesian method is thus less affected by collinearity, and when no suitable prior distribution is available, a noninformative prior can be used instead. When the data are heteroscedastic, quantile regression is used, with models built at the lower quartile (q1), median (q2), and upper quartile (q3) of the dependent variable. Because the parameter trace plots did not converge when prior information was specified for the Bayesian model, a noninformative prior was used. The Bayesian method gives better evaluation indices at each quantile, and its fitting and prediction results are better than those of the classical method. When the data contain missing values, whether the missing values are removed or filled in, the model based on classical methods is better than the Bayesian model in both fitting effect and prediction effect. In logistic regression with a binary dependent variable and a cutoff value of 0.5, there is no difference in prediction accuracy. When sensitivity and specificity are calculated at each cutoff value, the results of the two methods differ little. For the data analyzed in this paper, the largest area under the ROC curve among the Bayesian models, 0.93474, was obtained with a noninformative prior, but it is still smaller than the area of 0.9386 produced automatically by the classical Logistic procedure. In addition, the ROC curve for the Bayesian logistic regression in this paper was built from a selection of cutoff values and the area under it was calculated manually; the resulting area is relatively large, but the procedure is relatively tedious. In actual use, the statistical method should be chosen according to specific needs.
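
As an illustration of the evaluation scheme for quantitative outcomes, the following Python sketch computes the named indices and applies 10-fold cross-validation to an ordinary least squares fit. The abstract does not give formulas, so the definitions here are assumptions: Abserror is taken as the mean absolute residual, MSE as the sum of squared residuals divided by the sample size, and NMSE as MSE divided by the variance of the observed values. The function names `fit_metrics` and `kfold_ols` are hypothetical, and the thesis itself may have used different software and definitions.

```python
import numpy as np

def fit_metrics(y_true, y_pred):
    """Evaluation indices for a quantitative outcome (definitions assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = float(np.sum(resid ** 2))                      # sum of squared residuals
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total sum of squares
    mse = ss_res / len(y_true)                              # assumed: divide by n
    return {
        "abs_error": float(np.mean(np.abs(resid))),         # assumed: mean |residual|
        "ss_res": ss_res,
        "mse": mse,
        "r2": 1.0 - ss_res / ss_tot,
        "nmse": mse / float(np.var(y_true)),                # assumed: MSE / variance
    }

def kfold_ols(X, y, k=10, seed=0):
    """Fit OLS on k-1 folds and evaluate on the held-out fold, k times."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    X1 = np.column_stack([np.ones(len(y)), X])              # add intercept column
    results = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta, *_ = np.linalg.lstsq(X1[train_idx], y[train_idx], rcond=None)
        results.append({
            "train": fit_metrics(y[train_idx], X1[train_idx] @ beta),
            "test": fit_metrics(y[test_idx], X1[test_idx] @ beta),
        })
    return results
```

Averaging the `"test"` entries over the ten folds gives one summary per index, which is one common way to compare two modeling approaches on the same split.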
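
For the binary outcome, sensitivity, specificity, and accuracy at a given cutoff, and a manually calculated area under the ROC curve, can be sketched as follows. This is a minimal illustration and not the thesis's own procedure: the function names `binary_metrics` and `roc_auc` are hypothetical, the predicted probabilities are assumed to come from any fitted classical or Bayesian logistic model, and the area is computed with the trapezoidal rule over every observed cutoff.

```python
import numpy as np

def binary_metrics(y_true, prob, cutoff=0.5):
    """Sensitivity, specificity, and accuracy at one cutoff value."""
    y_true = np.asarray(y_true).astype(bool)
    pred = np.asarray(prob, dtype=float) >= cutoff
    tp = np.sum(pred & y_true)      # true positives
    tn = np.sum(~pred & ~y_true)    # true negatives
    fp = np.sum(pred & ~y_true)     # false positives
    fn = np.sum(~pred & y_true)     # false negatives
    return {
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
        "accuracy": float((tp + tn) / len(y_true)),
    }

def roc_auc(y_true, prob):
    """Area under the ROC curve by the trapezoidal rule over all cutoffs."""
    y_true = np.asarray(y_true).astype(bool)
    prob = np.asarray(prob, dtype=float)
    # Cutoffs from +inf (nothing predicted positive) down to the smallest score.
    cutoffs = np.concatenate(([np.inf], np.sort(np.unique(prob))[::-1]))
    tpr = np.array([np.mean(prob[y_true] >= c) for c in cutoffs])   # sensitivity
    fpr = np.array([np.mean(prob[~y_true] >= c) for c in cutoffs])  # 1 - specificity
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Using only a subset of cutoffs instead of all observed scores changes the shape of the piecewise-linear curve and therefore the computed area, which is worth keeping in mind when comparing a manually computed area with one reported by a software procedure.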
Keywords/Search Tags: Classical statistics, Bayesian statistics, Regression analysis, Fitting effect, Prediction effect