A Comparative Study On Integrated Projection Pursuit Regression Analysis And Comprehensive Traditional Regression Analysis

Posted on: 2018-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: W Hu
GTID: 2310330518465280
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Statistical analysis of high-dimensional data is becoming increasingly common in medical research. High-dimensional data challenge traditional multivariate statistical methods in several respects, such as the large amount of computation, the curse of dimensionality, and the poor robustness of methods that perform well on low-dimensional data. Traditional methods, most of which were established on the basis of the normal distribution, can no longer meet the needs of high-dimensional data, especially when the data are not normally distributed. Under these circumstances, the projection pursuit technique began to emerge in the 1960s and 1970s. The basic idea of projection pursuit is to project high-dimensional data onto a low-dimensional (1 to 3 dimensions) subspace in order to find a projection that reflects the characteristics or structure of the original high-dimensional data. A projection index measures the amount of information contained in the projected distribution. The key step of projection pursuit is therefore to find the projection direction that maximizes or minimizes the projection index, and a genetic algorithm is usually used to search for the optimal direction. Projection pursuit regression is formed by combining projection pursuit with regression analysis.

This study aims to compare projection pursuit regression with traditional regression in terms of fitting and predicting effect on complex data, and to identify the better method. The significance of the study is that it makes the applicability of projection pursuit regression more specific and helps to draw medical statisticians' attention to projection pursuit, which helps researchers choose the best regression method for future studies of complex data.

The projection pursuit regression techniques used in this paper include the methods available in R software (the spline, gcvspline and supsmu smoothing methods provided by ppr) and a method based on Hermite polynomials implemented by the author. The traditional regression analysis in this paper refers mainly to multiple linear regression, principal component regression, ridge regression, partial least squares regression and robust regression. In this study, “complex data” are defined as follows. Firstly, collinearity exists among the independent variables. For such data, principal component regression, ridge regression and partial least squares regression were used as the traditional regression methods and were carried out with the REG, PRINCOMP and PLS procedures of SAS software. Secondly, outliers exist in the data. For such data, robust regression was used as the traditional regression method and was carried out with the ROBUST procedure of SAS software. In addition to these two kinds of ill-conditioned data, the two techniques were also compared on good-quality data, that is, data without collinearity or outliers for which multiple linear regression already gives good fitting and predicting effects.

The coefficient of determination and the mean of absolute relative error were the two main indexes used to evaluate the fitting effect, and the absolute relative error and the mean square of prediction error were used to evaluate the predicting effect. The fitting sample was the actual sample data, and the predicting sample consisted of six statistical values (mean, maximum, minimum, median, first quartile and third quartile) computed from the corresponding variables of the fitting sample.

The results based on the actual data with good quality indicated that projection pursuit regression outperformed multiple linear regression in both fitting and predicting effect, but the difference between the two methods was insignificant. For the projection pursuit regression models, the coefficient of determination ranged from 0.9703 to 0.9988, the mean of absolute relative error ranged from 0.0039 to 0.0187, and the mean square of prediction error ranged from 12.91 to 16.77. For the multiple linear regression model, the coefficient of determination was 0.9639, the mean of absolute relative error was 0.0224, and the mean square of prediction error was 18.80. For the simulated data, the difference between the two models was negligible in both fitting and predicting effect, and the fitting effect of both models was above 0.9942.

This paper also analyzed three actual data sets with collinearity among the independent variables. For the first data set, the traditional regression models had coefficients of determination ranging from 0.9351 to 0.9386 and means of absolute relative error ranging from 0.0497 to 0.0582, and the mean squares of prediction error for the principal component regression, ridge regression and partial least squares regression models were 1.18, 0.66 and 1.14, respectively. The projection pursuit regression models had coefficients of determination ranging from 0.9756 to 0.9846, means of absolute relative error ranging from 0.0316 to 0.0363, and mean squares of prediction error ranging from 0.69 to 0.86. For the second data set, the traditional regression models had coefficients of determination ranging from 0.9039 to 0.9820, means of absolute relative error ranging from 0.0174 to 0.0383, and mean squares of prediction error of 126.59, 208.40 and 215.82, respectively. The projection pursuit regression models had coefficients of determination ranging from 0.9823 to 0.9927, means of absolute relative error ranging from 0.0104 to 0.0175, and mean squares of prediction error ranging from 11.00 to 27.25. For the third data set, the traditional regression models had coefficients of determination ranging from 0.8023 to 0.8924, means of absolute relative error ranging from 0.0450 to 0.0642, and mean squares of prediction error of 0.61, 0.36 and 0.23, respectively. The projection pursuit regression models had coefficients of determination ranging from 0.8851 to 0.9980, means of absolute relative error ranging from 0.0046 to 0.0481, and mean squares of prediction error ranging from 0.03 to 0.65.

In addition, this paper analyzed two actual data sets with outliers. For the first data set, both robust regression and projection pursuit regression fitted poorly: the highest coefficient of determination for robust regression was 0.3641, while that for projection pursuit regression ranged from 0.1857 to 0.6650. For the second data set, the robust regression model had a coefficient of determination of 0.8982, a mean of absolute relative error of 0.1377, and a mean square of prediction error of 3.3919. The projection pursuit regression models had coefficients of determination ranging from 0.9423 to 0.9563, means of absolute relative error ranging from 0.0899 to 0.1138, and mean squares of prediction error ranging from 2.3604 to 3.0308.

Based on the above results, we draw the following conclusions:
(1) For data with good quality, the difference in fitting effect between multiple linear regression and projection pursuit regression is insignificant, and both methods have a coefficient of determination above 0.95. Because projection pursuit regression is computationally more complex than multiple linear regression, multiple linear regression is recommended.
(2) For data with collinearity, projection pursuit regression is recommended over the traditional regression methods, including principal component regression, ridge regression and partial least squares regression.
(3) For data with outliers, projection pursuit regression is also recommended over robust regression.
(4) The quality of the data itself is very important in scientific research. More attention should be paid to research design, especially to identifying as many independent variables as possible and to guaranteeing a sufficiently large and representative sample. If important independent variables are neglected before data collection, it is hard to make up for this through statistical analysis.
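
The evaluation indexes named in the abstract can be sketched in a few lines of Python. The abstract does not give their formulas, so the definitions below are assumptions based on the usual textbook forms; the function names and the use of NumPy are also illustrative and are not taken from the thesis, which used R and SAS. The last helper builds a six-row predicting sample from the column-wise mean, maximum, minimum, median and quartiles of a fitting sample, as the abstract describes.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mean_abs_relative_error(y, y_hat):
    """Mean of |y - y_hat| / |y| (assumed form; requires nonzero y)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)))


def mean_square_prediction_error(y, y_hat):
    """Mean of squared prediction errors (assumed form, divisor n)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


def six_statistic_sample(fitting_sample):
    """Six rows per variable: mean, max, min, median, first and third quartile."""
    data = np.asarray(fitting_sample, dtype=float)
    return np.vstack([
        data.mean(axis=0),
        data.max(axis=0),
        data.min(axis=0),
        np.median(data, axis=0),
        np.percentile(data, 25, axis=0),  # first quartile, linear interpolation
        np.percentile(data, 75, axis=0),  # third quartile, linear interpolation
    ])
```

With helpers like these, each fitted model (projection pursuit or traditional) would be scored on the fitting sample with `r_squared` and `mean_abs_relative_error`, and on the six-row predicting sample with `mean_square_prediction_error`. Quartile conventions differ between software packages, so values from R or SAS may not match NumPy's default exactly.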
Keywords/Search Tags: Projection pursuit, Integrated projection pursuit analysis, Comprehensive traditional regression analysis, Fitting effect, Predicting effect