
Quantile Regression Method For Large-scale Data With Applications

Posted on: 2018-02-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Cai
GTID: 1310330542961957
Subject: Business Administration
Abstract/Summary:
In the age of Big Data, with the development of data generation, collection and storage technology, large-scale data characterized by large samples and high dimension will become mainstream. This not only provides opportunities to explore the laws of the real world, but also presents challenges for statistical analysis. Among statistical methods, quantile regression (QR) is often used to capture the heterogeneous effects of explanatory variables on the conditional distribution of the response variable and to describe the tail behavior of the response variable in detail. It is one of the important methods for exploring the laws of the real world. QR is computed by standard algorithms or statistical packages; however, it is difficult to implement QR on large-scale data owing to the limitations of computer primary memory and running time. Therefore, against the background of big data, studying QR for large-scale data and solving the technical problems in the modeling process is of great theoretical and practical value for the application and dissemination of the method, and for revealing the complex patterns of the economy and society. We selected "quantile regression method for large-scale data with applications" as the topic of this dissertation. By integrating the disciplines of statistics and econometrics and combining theoretical analysis, numerical simulation and applied research, we aim to extend classic quantile regression from small or moderate-size data to large-scale data. The main work and innovations are as follows:

(1) QR for large-scale data based on the sparse exponential transform algorithm (SETQR) is established. The proposed SETQR method has the advantage of simultaneously handling the size issue of large-scale data in quantile regression and obtaining accurate results. First, we introduce the modeling steps of the SETQR method and establish theoretically the error range of its parameter estimates. Then, the fitting effect, forecasting ability and running time of the SETQR method are studied by numerical simulation and compared with QR on the entire data and with the SPC2 and SPC3 methods. It is found that SETQR is similar to these methods in fitting effect and forecasting ability, but has a shorter running time. Finally, we apply the SETQR method to investigate the relationship between order imbalance and stock returns on the Chinese stock market. The empirical results show that one-period lagged order imbalance has a positive and increasing effect on stock returns at higher quantiles, while it has a negative effect at lower quantiles. Furthermore, the lagged order imbalance has a negative effect on stock returns when the current order imbalance is controlled for, and this negative effect shows a downward trend as the quantile increases. These results may help investors understand how future returns signaled by order imbalances vary, and guide them in devising risk-prevention measures and investment strategies for different stocks.

(2) Lasso QR for large-scale data based on a randomized sampling algorithm (SLQR) is established. The proposed SLQR method has the advantage of simultaneously handling the size issue of large-scale data in quantile regression and performing variable selection and accurate prediction. First, we introduce the modeling steps of the SLQR method and establish theoretically the error range of its parameter estimates. Then, the numerical simulation results show that the SLQR method is very similar to Lasso QR on the entire data in fitting effect and forecasting ability, but has a shorter running time. The SLQR method also performs reasonably well in variable selection. Finally, we apply the SLQR method to a real-world data set, the Greenhouse Gas Observing Network Data Set, to investigate the contribution of the GHG concentration of each tracer to the synthetic GHG concentration. The empirical results show that the SLQR method is close to Lasso QR on the entire sample in fitting effect, variable selection and forecasting ability. The weights of some tracers are zero, which means that the GHG concentrations of those tracers have no effect on the synthetic GHG concentration. The weights of most other tracers gradually decrease as the synthetic GHG concentration grows, so attention can be directed to different tracers at different levels of the synthetic GHG concentration.

(3) QR for large-scale data based on a block estimation method (BAQR) is established. The proposed BAQR method can not only significantly reduce the required amount of primary memory and obtain more accurate and stable results, but also handle stream data and deliver results in time. First, we introduce the modeling steps of the BAQR method and establish theoretically its asymptotic properties, including consistency, convergence rate and asymptotic normality. Then, the numerical simulation results show that the BAQR method is very similar to QR on the entire data in fitting effect and forecasting ability, and that it is more precise and stable than the SETQR, SPC2 and SPC3 methods. Finally, we apply the BAQR method to investigate the determination mechanism of income on the Chinese labor market. The empirical results show that the return to schooling is positive and gradually decreases as income grows. The impact of experience on income follows an inverted U-shape, and the curvature of the curve gradually decreases as income grows. Furthermore, there is discrimination by sex and by household registration. Therefore, improving people's educational level and eliminating discrimination against females and rural residents in employment can help shrink income inequality.

(4) Lasso QR for large-scale data based on a block estimation method (BLQR) is established. The proposed BLQR method can not only perform variable selection and accurate prediction, but also analyze data in which the number of variables is greater than the sample size. First, we introduce the modeling steps of the BLQR method and establish theoretically its asymptotic normality. Then, the numerical simulation results show that the BLQR method is very similar to Lasso QR on the entire data in fitting effect and forecasting ability, and that it is better than the SLQR method. The BLQR method also performs reasonably well in variable selection. Most importantly, we find that BLQR is an updating method that can renew its estimation results as new data arrive, which makes it more effective when handling stream data. Finally, we apply the BLQR method to the Greenhouse Gas Observing Network Data Set to investigate the contribution of the GHG concentration of each tracer to the synthetic GHG concentration. The empirical results show that the BLQR method is close to Lasso QR on the entire sample in fitting effect, variable selection and forecasting ability, and is superior to the SLQR method. The BLQR method is also more accurate and stable than the SLQR method in terms of the estimated weights, which will help monitoring bodies design monitoring mechanisms and monitor GHG concentrations effectively.
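The abstract does not spell out the block estimation procedure, so the following is only a minimal sketch of the general idea behind a block-based QR estimator such as BAQR: split the sample into blocks, fit QR on each block, and average the coefficient vectors. Everything in it is an assumption made for illustration, not the dissertation's algorithm: the function names `fit_qr_irls` and `block_average_qr` are hypothetical, the per-block solver is a simple iteratively reweighted least-squares approximation swapped in to keep the example self-contained, and the data are simulated.

```python
import numpy as np


def fit_qr_irls(X, y, tau, n_iter=100, eps=1e-6):
    """Approximate the tau-th quantile regression fit by IRLS (illustrative solver).

    The check loss rho_tau(r) equals w * r**2 with w = |tau - 1(r < 0)| / |r|,
    so each iteration solves a weighted least-squares problem with those weights.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # eps guards against division by zero for near-zero residuals
        w = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    return beta


def block_average_qr(X, y, tau, n_blocks):
    """Fit QR on each block separately and average the coefficient vectors."""
    X_blocks = np.array_split(X, n_blocks)
    y_blocks = np.array_split(y, n_blocks)
    betas = [fit_qr_irls(Xb, yb, tau) for Xb, yb in zip(X_blocks, y_blocks)]
    return np.mean(betas, axis=0)


# Simulated data (assumption): y = 1 + 2x + standard normal noise,
# so the true median-regression coefficients are (1, 2).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

beta_full = fit_qr_irls(X, y, tau=0.5)
beta_block = block_average_qr(X, y, tau=0.5, n_blocks=10)
print("full data :", beta_full)
print("10 blocks :", beta_block)
```

In this sketch each block is fitted independently, so in principle only one block needs to be held in memory at a time, and a running average of the block estimates could be updated as new blocks arrive. That is one plausible reading of how a block method handles stream data, not a statement of the author's procedure.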
Keywords/Search Tags: Quantile regression, Large-scale data, Randomized sampling, Block, Variable selection, Lasso