
Research And Application Of Variable Method For Ultrahigh Dimensional Data

Posted on: 2018-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: J W Ji
GTID: 2310330518997503
Subject: Mathematics
Abstract/Summary:
With the arrival of the big data era, ultrahigh-dimensional data are frequently encountered in fields such as meteorological forecasting, pattern recognition, and genetic research. In ultrahigh-dimensional data, typically only a small number of covariates are related to the response, so the model is sparse. Because the dimension is so high, traditional robust statistical methods and variable selection methods designed for high-dimensional data are no longer applicable. To analyze ultrahigh-dimensional data effectively, the dimension must first be reduced. In recent years, many researchers have proposed convenient feature screening methods for ultrahigh-dimensional data. An effective and well-founded strategy works in two steps: first, a fast and efficient feature screening procedure reduces the dimension to a moderate size below the sample size while retaining all the important variables; then, well-established variable selection methods are applied to the reduced set. This thesis proposes two ultrahigh-dimensional feature screening methods. The first is a robust feature screening method based on the interval conditional quantile, intended for complex ultrahigh-dimensional data exhibiting heterogeneity, heavy tails, and similar features.
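The abstract gives no formulas, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the estimator used in the thesis. For each covariate X_k it averages a quantile-based marginal utility over a grid of quantile levels in an interval [τ_L, τ_U] instead of relying on one preset τ, then keeps the d = ⌊n / log n⌋ covariates with the largest utilities (a cutoff commonly used in the screening literature). As a crude, swapped-in stand-in for a conditional quantile estimate, Q_τ(Y | X_k) is approximated by the empirical τ-quantile of Y within equal-size slices of X_k. The function names, the default interval [0.25, 0.75], the number of τ values, and the slice count are all hypothetical choices.

```python
import numpy as np


def interval_quantile_utility(x, y, taus, n_slices=5):
    """Average over taus of a slice-based discrepancy between Q_tau(Y | X_k) and Q_tau(Y)."""
    order = np.argsort(x)
    # Split observations into n_slices groups of (nearly) equal size by the value of x.
    slices = np.array_split(order, n_slices)
    n = len(y)
    total = 0.0
    for tau in taus:
        q_all = np.quantile(y, tau)  # unconditional tau-quantile of Y
        # Crude stand-in for Q_tau(Y | X_k): the tau-quantile of Y within each slice.
        total += sum(len(s) * (np.quantile(y[s], tau) - q_all) ** 2 for s in slices) / n
    return total / len(taus)


def interval_quantile_screen(X, y, tau_low=0.25, tau_high=0.75, n_tau=11, d=None):
    """Keep the d covariates with the largest interval-averaged quantile utility."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))  # common screening cutoff, assumed here
    taus = np.linspace(tau_low, tau_high, n_tau)
    utilities = np.array([interval_quantile_utility(X[:, k], y, taus) for k in range(p)])
    keep = np.sort(np.argsort(utilities)[::-1][:d])
    return keep, utilities


# Example: heavy-tailed (Cauchy) noise whose scale depends on X_2 (heterogeneous errors).
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] + 2 * X[:, 1] + (1 + np.abs(X[:, 2])) * rng.standard_cauchy(n)
keep, utilities = interval_quantile_screen(X, y)
print(len(keep), keep[:10])
```

The retained indices would then be handed to an established variable selection method (for example, a penalized regression such as the LASSO) for the second step. Averaging over an interval of τ values means the ranking no longer hinges on a single preset quantile level, and empirical quantiles are less sensitive to heavy-tailed noise than moment-based utilities such as Pearson correlation.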
The second is a feature screening method based on inverse probability weighting for responses missing at random. The main work of this master's thesis is as follows.

In Chapter 1, we summarize the history and current state of research on feature screening for ultrahigh-dimensional data, and systematically review the literature on quantiles and missing data.

In Chapter 2, we propose robust interval conditional quantile screening for ultrahigh-dimensional heterogeneous data. Most existing studies of conditional quantile screening are based on a single quantile level, so the screening result depends on the preset quantile, and perturbing that quantile level may make the screening unstable. Drawing on global quantile regression, we propose a robust feature screening method based on the interval conditional quantile, which makes the screening criterion more accurate. Theoretical proofs, simulation studies, and real-data examples indicate that the improved method is more stable.

In Chapter 3, we present an ultrahigh-dimensional feature screening method for responses missing at random. Existing feature screening research focuses mainly on complete data. However, in market research, social surveys, and medical research, responses are often missing at random. We therefore propose a marginal screening procedure based on inverse probability weighting, and establish its validity through theoretical proofs, simulation studies, and real-data examples.

In Chapter 4, we summarize the two feature screening methods proposed in the thesis and suggest directions for further research.
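A minimal Python sketch of inverse probability weighted marginal screening follows, again under assumptions the abstract does not state. It assumes the response is missing at random given a small, known set of fully observed covariates (the hypothetical `z_cols`), fits the propensity π(Z) = P(δ = 1 | Z) by logistic regression, and uses as marginal utility the IPW estimate of |Cov(X_k, Y)| for standardized X_k, namely |(1/n) Σ δ_i X_ik Y_i / π̂_i|. Complete cases are reweighted by 1/π̂_i so that they also represent the units whose responses are missing. The function names, the propensity model, and the lower clip of π̂ at 0.05 are illustrative choices, not the procedure of Chapter 3.

```python
import numpy as np


def fit_logistic(Z, delta, n_iter=25):
    """Newton-Raphson fit of P(delta = 1 | Z) with an intercept; returns fitted probabilities."""
    A = np.column_stack([np.ones(len(delta)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (delta - prob)                      # score vector
        hess = A.T @ (A * (prob * (1 - prob))[:, None])  # Fisher information
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-A @ beta))


def ipw_screen(X, y, delta, z_cols, d=None, pi_min=0.05):
    """Rank covariates by an IPW estimate of |Cov(X_k, Y)| when Y is missing at random."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))  # common screening cutoff, assumed here
    pi_hat = fit_logistic(X[:, z_cols], delta)
    # Inverse probability weights delta_i / pi_hat_i; zero whenever Y_i is missing.
    w = delta / np.maximum(pi_hat, pi_min)
    # Replace missing responses by 0 first, because NaN * 0 would still be NaN.
    y0 = np.where(delta == 1, y, 0.0)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # covariates are fully observed
    # For centered Xs_k, E[Xs_k Y] = Cov(Xs_k, Y); estimate it for every k at once.
    utilities = np.abs(Xs.T @ (w * y0)) / n
    keep = np.sort(np.argsort(utilities)[::-1][:d])
    return keep, utilities


# Example: the probability of observing Y depends only on X_0 (missing at random).
rng = np.random.default_rng(1)
n, p = 300, 1000
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] + 1.5 * X[:, 3] + rng.standard_normal(n)
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0])))
delta = (rng.uniform(size=n) < pi_true).astype(float)
y_obs = np.where(delta == 1, y, np.nan)
keep, utilities = ipw_screen(X, y_obs, delta, z_cols=[0])
print(delta.mean(), len(keep), keep[:10])
```

Dropping the weights (a complete-case analysis) would estimate the covariances on a subsample whose covariate distribution is shifted by the missingness mechanism; the weighting is what corrects for that shift.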
Keywords/Search Tags: Ultrahigh-dimensional data, feature screening, missing data, screening consistency