
Comparative Study Of Nonparametric And Parametric Regression Models For Data With One Independent Variable And Intelligent Realization Of The Optimal Model

Posted on: 2013-02-24 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Jia | Full Text: PDF
GTID: 2210330374461001 | Subject: Epidemiology and Health Statistics
Abstract/Summary:
【Objective】 In practical research, investigators often encounter data whose distribution does not meet the assumptions of a parametric model, or data drawn from a population whose distribution is unknown. In such cases the regression equation fitted by a parametric model is usually unsatisfactory. This paper compares the fitting performance of nonparametric and parametric regression models for data with one independent variable; seeks and recommends the optimal fitting method for linearizable curves with one independent variable; discusses the application of nonparametric and parametric regression both when the data satisfy and when they do not satisfy the strict parametric assumptions; extends the application of nonparametric regression in order to correct stereotyped views of it; and uses SAS software to achieve intelligent realization of the optimal model.

【Content】 This study involves four common monotone curves, one non-monotone curve, and two types of models: parametric regression and nonparametric regression. Parametric regression commonly uses curve linearization to fit one data set with several models, compares their fitting performance, and chooses the best as the final model. Nonparametric regression achieves a good fit by selecting the best bandwidth (window width) according to a bandwidth selection criterion.

The study covers five common linearizable curves: the logarithmic, hyperbolic, power, exponential, and logistic curves. The first four are monotone, while the logistic curve is treated as non-monotone.
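For illustration, the five curve families can be written down directly. The thesis itself works in SAS; this Python sketch assumes standard textbook parameterizations, since the exact functional forms and parameter values used in the thesis are not given in the abstract:

```python
import numpy as np

# Common textbook forms of the five linearizable curves; the exact
# parameterizations used in the thesis are not stated in the abstract,
# so these forms and default parameters are assumptions for illustration.
curves = {
    "logarithmic": lambda x, a=1.0, b=2.0: a + b * np.log(x),
    "hyperbolic":  lambda x, a=1.0, b=2.0: x / (a * x + b),   # from 1/y = a + b/x
    "power":       lambda x, a=2.0, b=1.5: a * x ** b,
    "exponential": lambda x, a=1.0, b=0.3: a * np.exp(b * x),
    "logistic":    lambda x, k=10.0, a=2.0, b=1.0: k / (1 + a * np.exp(-b * x)),
}

x = np.linspace(0.5, 10.0, 10)   # e.g. ten sample points from (0, 10]
for name, f in curves.items():
    y = f(x)                     # noise-free response from each curve
    print(name, y[:3])
```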
All five curves can be fitted by curve linearization. The main idea is to transform the function so that the two resulting variables are linearly related, run a linear regression to estimate the model parameters, and then transform back to the original variables to obtain the final regression equation. The logistic curve can also be fitted by introducing several dummy independent variables and using binomial or trinomial regression, which is still curve linearization in essence, or by nonlinear regression starting from rough initial parameter values. All of these methods belong to parametric regression. A parametric regression model achieves a satisfactory fit only when the distribution of the data is clear, or when the model is tailored to particular data; once the distribution is unclear, or the chosen model is unsuited to the data (even when the distribution is clear), parametric regression fails to yield a satisfactory model.

This paper selects the four most commonly used nonparametric regression models: kernel regression, spline regression, local polynomial regression, and the additive model. SAS does not provide a ready-made procedure for kernel regression estimation.
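The linearization procedure just described can be sketched for the power curve y = a·x^b, whose logarithm gives the linear relation log y = log a + b·log x. This is an illustrative Python sketch, not the thesis's SAS code:

```python
import numpy as np

def fit_power_curve(x, y):
    """Fit y = a * x**b by linearization: log(y) = log(a) + b * log(x)."""
    # Linear regression of log(y) on log(x); polyfit returns [slope, intercept].
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    # Transform the intercept back to the original scale to recover a.
    return np.exp(log_a), b

x = np.linspace(1.0, 10.0, 10)
y = 2.0 * x ** 1.5            # noise-free data from a known power curve
a_hat, b_hat = fit_power_curve(x, y)
print(a_hat, b_hat)           # recovers a ≈ 2.0 and b ≈ 1.5
```

The same pattern applies to the other linearizable curves: transform, fit a straight line, then back-transform the coefficients.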
Because SAS has no built-in kernel regression procedure, the IML module is programmed from the underlying calculation principle so that SAS can perform kernel regression analysis and output predicted values.

Data are generated from each curve's formula by SAS programs. For each curve, four parametric and four nonparametric regression models are selected and written into one program to achieve automatic judgment, automatic comparison, and automatic output.

【Methods】 Monte Carlo techniques are used for sampling in four intervals: (0,10], [10,100], [100,1000], and (0,10000]. From the first three intervals, 10, 100, and 1000 samples are drawn respectively, each sample containing 10 points since these interval ranges are not wide; thus 10, 100, and 1000 data sets are generated, each with 10 sample points. From the last interval, 100 points are drawn per sample because its range is considerably larger. The X values in each data set are substituted into the given function to obtain the corresponding Y values. The four nonparametric and four parametric methods are applied to the data generated from each interval to fit the curve regression equation, and then: 1. the fitting results of the four nonparametric regression models are compared; 2. the fitting results of the four parametric regression models are compared; 3. the best nonparametric model and the best parametric model are chosen, and hypothesis testing examines whether the difference between their fits is statistically significant; 4.
Rank the eight methods by fitting performance and select the best-fitting model.

For evaluating fit, since the models contain only one independent and one dependent variable, the coefficient of determination R² and the mean square error (MSE) serve as the evaluation criteria; other criteria are essentially equivalent when only one independent variable is involved. The smaller the residual sum of squares (RSS), the better the fit.

Regarding the software implementation, SAS offers many procedures; this paper uses the three most commonly used nonparametric regression procedures, LOESS, TPSPLINE, and GAM. Because SAS has no ready-made kernel regression procedure, the IML module is programmed from the calculation principle. The four parametric curve regression models obtained by linearization are fitted with the REG procedure. The eight regression methods are written into one program that outputs the residual sum of squares, the error degrees of freedom, R², and MSE. For model testing, the P value is computed by the program so as to achieve automatic judgment, automatic comparison, and automatic output.

【Results】 For every sample in every interval of every curve, nonparametric regression fits better than parametric regression, especially for the monotone curves: hypothesis testing shows that the difference in fit between the nonparametric and parametric regressions is statistically significant in every case.
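The kernel regression programmed in IML, the bandwidth selection, and the R²/MSE criteria can be sketched together. This Python version assumes a Nadaraya-Watson estimator with a Gaussian kernel and leave-one-out bandwidth selection, which may differ in detail from the thesis's implementation; MSE is computed simply as RSS/n here rather than RSS divided by the error degrees of freedom:

```python
import numpy as np

def nw_kernel_regression(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimator with a Gaussian kernel and bandwidth h."""
    # Weight matrix: one row per evaluation point, one column per training point.
    u = (x_eval[:, None] - x_train[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ y_train) / w.sum(axis=1)

def loo_cv_bandwidth(x, y, candidates):
    """Pick the bandwidth with the smallest leave-one-out squared error."""
    best_h, best_err = None, np.inf
    for h in candidates:
        err = 0.0
        for i in range(len(x)):
            mask = np.arange(len(x)) != i          # drop the i-th point
            pred = nw_kernel_regression(x[mask], y[mask], x[i:i + 1], h)[0]
            err += (y[i] - pred) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.5, 10.0, 100))
y = 2.0 * x ** 1.5 + rng.normal(0.0, 1.0, 100)     # power curve plus noise

h = loo_cv_bandwidth(x, y, [0.1, 0.3, 1.0, 3.0])
y_hat = nw_kernel_regression(x, y, x, h)

# Fit criteria used in the thesis: R^2 and MSE (here MSE = RSS / n).
rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - rss / tss
mse = rss / len(x)
print(h, round(r2, 3), round(mse, 3))
```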
For the monotone curves, the parametric regression models are less stable than the nonparametric ones: for particular data whose distribution is clear, a parametric model fits well only when it is tailored to those data, while other parametric models cannot achieve a satisfactory fit. This again shows that parametric regression models are valuable only when the data meet their strict assumptions. The nonparametric models, by contrast, fit satisfactorily no matter how the distribution of the data changes. For the non-monotone logistic curve, nonlinear regression and trinomial regression fit better than the linearized logistic model. When the differences among the y values at the sample points are very small, the fits of parametric and nonparametric regression do not differ noticeably, yet R² of the nonparametric regression is still greater, and its MSE smaller, than those of the parametric regression. Moreover, except when the simulated data perfectly meet the strict parametric assumptions, the nonparametric models fit better than the parametric models and the difference is statistically significant. In practical applications, nonparametric regression also has advantages over parametric regression in data description, exploration, and fitting.

【Conclusions】 Parametric regression imposes strict preconditions on the data, whereas nonparametric regression imposes almost none.
Nonparametric regression makes no assumption about the population distribution because it obtains the needed information from the sample itself, fully uses that information to build the model, and makes the estimate at each point approach the observed value as closely as possible. It therefore offers high efficiency, a good fit, and robust results. Among the four nonparametric regression models, local polynomial regression is more effective than the other three.

Nonparametric regression models reflect the real pattern of change in the data better than parametric models and can therefore better discover and reveal the underlying factors that drive that change. Thus, nonparametric regression models are superior to parametric models for the description and fitting of data.
Keywords/Search Tags: nonparametric regression, parametric regression, local polynomial fitting, comparison of fitting effects