
The Balanced RLT Cross-validation Method

Posted on: 2017-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2310330512951002
Subject: Probability theory and mathematical statistics
Abstract/Summary:
In statistical machine learning, Repeated Learning-Testing (RLT) cross-validation is one of the most commonly used methods. RLT cross-validation draws multiple random splits of a given dataset; for each split, a model is trained on the training set and its prediction error is measured on the test set, and the test errors are averaged to estimate the generalization error of the model.

However, the variance of the cross-validation estimator depends closely on the splits employed. Simulation and real-data experiments show that the covariance between two hold-out estimates depends on the number of samples shared by the two training sets: the more samples the training sets share, the larger the variance of the cross-validation estimator. Traditional RLT cross-validation draws its splits at random, so the number of overlapping samples between two training sets follows a hypergeometric distribution, and this random variation inflates the variance of the RLT estimator. This thesis therefore proposes a Balanced Repeated Learning-Testing cross-validation method (BRLT), which guarantees that every sample appears in the same number of training sets and that any two training sets share the same number of samples.

First, the thesis proves theoretically that if the covariance between any two hold-out estimates is a convex, monotonically increasing function of the number of overlapping training samples, then the BRLT estimator of the generalization error has the smallest variance. Second, based on two-level orthogonal tables, it introduces a construction method for BRLT designs and the corresponding algorithm for some special cases.

Extensive simulation and real-data experiments verify that the covariance between any two hold-out estimates is indeed a convex, monotonically increasing function of the number of overlapping training samples. For several regression and classification models, comparing the variance of the BRLT estimator with that of the RLT estimator shows that BRLT significantly reduces the variance of the generalization-error estimate. In particular, for half-half splits, a balanced m×2 cross-validation method is commonly used in the literature; experiments comparing the variance of the balanced m×2 cross-validation estimator with that of the BRLT estimator show that the BRLT estimator has the smaller variance.
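The hypergeometric-overlap claim above can be illustrated with a small simulation. This is a sketch for intuition, not code from the thesis; the dataset size `n`, training-set size `k`, and trial count are arbitrary illustrative choices. It draws pairs of independent half-half splits and checks that the number of samples shared by two random training sets scatters around its hypergeometric mean k²/n:

```python
import random

# Illustrative sketch (not the thesis's experiment): for a dataset of
# n samples and training sets of size k drawn uniformly at random, the
# overlap |A ∩ B| of two independent training sets A, B is
# hypergeometric with mean k*k/n. RLT inherits this random variation.
n, k = 20, 10        # dataset size and training-set size (illustrative)
trials = 2000
random.seed(0)

overlaps = []
for _ in range(trials):
    a = set(random.sample(range(n), k))   # training set of split 1
    b = set(random.sample(range(n), k))   # training set of split 2
    overlaps.append(len(a & b))           # samples shared by the two

mean_overlap = sum(overlaps) / trials
print(mean_overlap)        # close to the hypergeometric mean k*k/n = 5.0
print(min(overlaps), max(overlaps))  # overlap itself varies from split to split
```

The spread of `overlaps` is exactly the split-to-split randomness that BRLT removes by fixing every pairwise overlap to a constant.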
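The two-level orthogonal-table construction can be sketched as follows. This is a hypothetical illustration, not the thesis's algorithm: for half-half splits on n = 8 samples it derives a design from the non-constant columns of a Sylvester Hadamard matrix (a two-level orthogonal table), using both halves of each column as training sets. Orthogonality of the columns then forces every sample to appear in the same number of training sets and any two training sets from different splits to share exactly n/4 samples:

```python
def sylvester(n):
    """Sylvester Hadamard matrix of order n (n a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = ([row + row for row in H] +
             [row + [-x for x in row] for row in H])
    return H

n = 8
H = sylvester(n)
samples = set(range(n))

# Each non-constant column splits the n samples into two equal halves;
# both halves serve as training sets (the other half is the test set).
train_sets = []
for c in range(1, n):
    plus = {i for i in samples if H[i][c] == 1}
    train_sets.append(plus)
    train_sets.append(samples - plus)

# Balance check 1: every sample appears in the same number of training sets
# (one per split, across the n-1 = 7 splits).
counts = [sum(i in t for t in train_sets) for i in samples]

# Balance check 2: training sets from *different* splits always share
# exactly n/4 samples (same-split pairs are complements, overlap 0).
overlaps = {len(train_sets[a] & train_sets[b])
            for a in range(len(train_sets))
            for b in range(a + 1, len(train_sets))
            if a // 2 != b // 2}

print(counts)    # prints [7, 7, 7, 7, 7, 7, 7, 7]
print(overlaps)  # prints {2}: constant overlap n/4
```

The constant overlap replaces the hypergeometric overlap of random RLT splits, which is the design property the thesis's variance-optimality proof relies on.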
Keywords/Search Tags:Cross-validation, Balanced repeated learning test, generalization error, variance