Investigation of topics in U-statistics and their applications in risk estimation and cross-validation

Posted on:2013-04-23

Degree:Ph.D

Type:Dissertation

University:The Pennsylvania State University

Candidate:Wang, Qing

Full Text:PDF

GTID:1450390008490372

Subject:Statistics

Abstract/Summary:

The primary goal of my dissertation has been to develop new methods, covering both theory and practical implementation, in the area of U-statistics. This area is well established, with many important results first appearing in Hoeffding (1948), and U-statistics have found many applications in nonparametric statistics. Cross-validation and risk estimation form a modern and active area that has not traditionally been viewed as a U-statistic area; the applications of my research focus on it.

The first objective of my research is to devise the best unbiased variance estimator for a general U-statistic. The estimator can be written as a quadratic form of the kernel function and is applicable as long as the kernel size k satisfies k ≤ n/2, where n is the sample size. It can also be represented in a familiar ANOVA form, as a contrast of between-class and within-class variation. To make the proposed variance estimator more practical, we developed a partition resampling scheme that realizes the U-statistic and its variance estimator simultaneously with high computational efficiency.

We then turn to the use of U-statistics in risk estimation for the nonparametric kernel density estimator. We propose U-statistic-form estimates of the risks that arise from the L2 distance and the Kullback-Leibler distance, respectively. In addition, we consider a two-stage "subsampling+extrapolation" bandwidth selection procedure that can dramatically reduce the variability of the conventional cross-validation bandwidth selector. It is equivalent to Hall and Robinson's (2009) [27] rescaled "bagging cross-validation" bandwidth selector if one sets the fictional sample size equal to the bootstrap size. However, the simple form of our U-statistic risk estimator enables us to calculate the aggregated risk much more efficiently than bootstrapping.

Moreover, a real data example in the context of model selection is considered. We construct a U-statistic cross-validation tool, akin to the BIC criterion for model selection. The U-estimator for the likelihood risk is more generally applicable than the AIC and BIC methods. In addition, with our proposed variance estimator for a general U-statistic, we can test which model has the smallest risk.

Finally, we explore extrapolation and interpolation techniques with applications in bandwidth selection, variance estimation, and quantile estimation. Some preliminary results are discussed at the end of the dissertation.
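
To make the variance-estimation idea concrete, the following is a minimal Python sketch and not the dissertation's implementation. It rests on one standard fact: for two disjoint size-k subsets the kernel values are independent, so the average of their products is unbiased for θ², and U² minus that average is unbiased for Var(U) whenever k ≤ n/2. The function names, the example kernel h(x1, x2) = (x1 - x2)²/2, and the random-partition routine are assumptions made for illustration; the partition routine in particular is only a rough Monte Carlo stand-in for a partition resampling scheme.

```python
from itertools import combinations
import random


def variance_kernel(x1, x2):
    """Illustrative symmetric kernel of size k = 2 (an assumption, not from
    the source); its U-statistic equals the usual sample variance."""
    return 0.5 * (x1 - x2) ** 2


def u_stat_and_variance(data, h, k):
    """Complete U-statistic and an unbiased estimate of its variance.

    Enumerates all size-k subsets, so it is only practical for small n.
    """
    n = len(data)
    if 2 * k > n:
        raise ValueError("need kernel size k <= n/2")
    subsets = list(combinations(range(n), k))
    vals = {s: h(*(data[i] for i in s)) for s in subsets}
    u = sum(vals.values()) / len(subsets)

    # Average of h(S1) * h(S2) over ordered pairs of disjoint subsets:
    # an unbiased estimate of theta^2.
    total, count = 0.0, 0
    for s1 in subsets:
        set1 = set(s1)
        for s2 in subsets:
            if set1.isdisjoint(s2):
                total += vals[s1] * vals[s2]
                count += 1
    q0 = total / count
    # The difference is unbiased for Var(U) but can be negative in a sample.
    return u, u * u - q0


def partition_estimates(data, h, k, n_partitions=200, seed=0):
    """Monte Carlo approximation using random partitions into disjoint
    blocks of size k (hypothetical scheme; assumes a symmetric kernel)."""
    rng = random.Random(seed)
    n = len(data)
    m = n // k  # number of disjoint blocks per partition
    if m < 2:
        raise ValueError("need at least two disjoint blocks (k <= n/2)")
    idx = list(range(n))
    sum_u, sum_q0 = 0.0, 0.0
    for _ in range(n_partitions):
        rng.shuffle(idx)
        vals = [h(*(data[i] for i in idx[b * k:(b + 1) * k])) for b in range(m)]
        s = sum(vals)
        sum_u += s / m
        # Mean product over ordered pairs of distinct blocks.
        sum_q0 += (s * s - sum(v * v for v in vals)) / (m * (m - 1))
    u_hat = sum_u / n_partitions
    return u_hat, u_hat ** 2 - sum_q0 / n_partitions


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(u_stat_and_variance(sample, variance_kernel, 2))
    print(partition_estimates(sample, variance_kernel, 2))
```

As a sanity check on the construction, taking the size-1 kernel h(x) = x makes the U-statistic the sample mean, and the estimate reduces to the familiar s²/n. Each random partition reuses its kernel evaluations for both the U-statistic and the θ² term, which is the sense in which the two quantities can be obtained simultaneously.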

Keywords/Search Tags:

Estimation, Risk, U-statistic, Area, Variance, Applications, Selection, Bandwidth

Related items

1	The Positive Selection Estimation And Statistic Test For Two Kinds Of Codon Substitution Models
2	Research On Non-linear M-Estimates And Its Application
3	Heteroscedasticity Test And Variance Estimation For Single Index Model
4	Selection And Application Of Adaptive Bandwidth Matrices Of Spatially Varying Coefficient Model
5	The Study On Risk Criteria Of Semivariance And Its Application
6	Elastic Net-Based Linear Model Variance Inference And Its Applications
7	On nonparametric estimation and inference with censored data, bandwidth selection for local polynomial regression, and subset selection in explanatory regression analyses
8	Research On Expectation-Maximization Algorithm For Posterior Variance Estimation Of Nonlinear Adjustment And Its Applications
9	Study On The Applications Of Variable Selection In Ultrahigh Dimensional Regression
10	Robust Estimation Of The Gps Data Processing