
Feature Screening Method SEVIS And Its Application

Posted on: 2018-11-10  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y M Lia  Full Text: PDF
GTID: 1310330512982694  Subject: Probability theory and mathematical statistics
Abstract/Summary:
Advances in technology and data-mining capability have made ultrahigh-dimensional data, in which the number of predictors p far exceeds the sample size n (p >> n), increasingly common in fields such as finance and genetics. In this era of Big Data, extracting the truly important explanatory variables from ultrahigh-dimensional data plays a key role in scientific research in those fields. In this setting, traditional penalty function methods often perform poorly in terms of efficiency, accuracy, and stability (see Fan et al. [13]). Unlike penalty function methods, the core idea of feature screening is to exclude features that are clearly unrelated to the response variable, thereby reducing the dimension of the explanatory variables.

In Chapter 2, we propose a new feature screening approach: sure explained variability and independence screening (SEVIS). Unlike existing work, which is mainly based on central tendency, SEVIS focuses on variability, a characteristic that is as important as central tendency in statistical inference. SEVIS therefore has advantages over traditional feature screening methods in handling asymmetric and nonlinear data. In the same chapter, we propose a nonparametric kernel estimation approach and prove that, under these estimates, SEVIS possesses the sure screening property and the ranking consistency property, as other feature screening methods do. In addition, SEVIS is model-free: it does not need to assume a dependence structure between the explanatory variables and the response variable, so it avoids the errors from model misspecification that model-based methods may suffer. We also show that, compared with several representative methods, SEVIS performs better on data containing interactions, heteroscedastic terms, or censored observations. A real example from ovarian cancer genetics also shows that most genes chosen by SEVIS have stronger explanatory power for the response variable than the genes chosen by other methods.

The kernel estimation of SEVIS still has room for improvement. In Chapter 3, we replace the kernel estimation with local estimation, which is known to be more accurate and effective. Simulations of feature screening in special situations support this view and show that the new algorithm is more accurate and efficient than the kernel-based one.

Because the number and variety of assets available for investment in the market keep increasing, traditional estimation methods based on the Mean-Variance model face new challenges with this kind of ultrahigh-dimensional data. In Chapter 4, we apply SEVIS to the asset selection process and propose a new portfolio construction method. Briefly, we use intraday high-frequency data to create a new intraday high-frequency Sharpe Ratio index and use SEVIS to select the assets that are highly correlated with this index. It is worth mentioning that existing feature screening methods, including SEVIS, all assume i.i.d. samples, whereas financial data are clearly serially correlated rather than i.i.d. In this chapter, we therefore also prove that SEVIS still satisfies the sure screening property and the ranking consistency property under stationary ?-mixing series. Several simulations are provided to test the performance of SEVIS on this kind of data. In the real-data part, we show that our method earns excess returns in the Chinese stock market during 2014 to 2015.
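The general screening idea described in the abstract (score each feature marginally, rank the scores, keep the top few) can be sketched as follows. This is only an illustration under stated assumptions, not the SEVIS statistic itself, which the abstract does not define: the utility used here (the share of the response variance explained by a Nadaraya-Watson kernel fit on one feature), the rule-of-thumb bandwidth, and the names `nw_fit`, `marginal_utility`, and `screen` are all hypothetical choices made for the example.

```python
import numpy as np


def nw_fit(x, y, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at the sample points (Gaussian kernel)."""
    diffs = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    return weights @ y / weights.sum(axis=1)


def marginal_utility(x, y):
    """Hypothetical utility: share of Var(y) explained by a kernel fit on x alone.

    This is an assumed stand-in for a screening statistic, not the one
    proposed in the dissertation.
    """
    std = x.std()
    if std == 0.0:
        return 0.0  # a constant feature explains nothing
    bandwidth = 1.06 * std * len(x) ** (-0.2)  # rule-of-thumb bandwidth (assumption)
    fitted = nw_fit(x, y, bandwidth)
    return float(np.var(fitted) / np.var(y))


def screen(X, y, keep):
    """Rank features by marginal utility and return the indices of the top `keep`."""
    utilities = np.array([marginal_utility(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-utilities)  # descending utility
    return order[:keep], utilities


# Simulated p >> n example: only features 0 and 1 drive the response,
# one through a quadratic term and one through a linear term.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] ** 2 + 3.0 * X[:, 1] + 0.5 * rng.normal(size=n)

selected, utilities = screen(X, y, keep=10)
print("selected features:", sorted(selected.tolist()))
```

Because the utility is computed nonparametrically, the quadratic effect of feature 0 can be picked up even though its linear correlation with the response is close to zero in expectation. This is the kind of nonlinear dependence that model-free screening is meant to handle.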
Keywords/Search Tags: Ultrahigh-dimensional data, feature screening, SEVIS, nonparametric estimation, high frequency Sharpe Ratio