
The Balanced RLT Cross-validation Method

Posted on: 2017-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2310330512951002
Subject: Probability theory and mathematical statistics
Abstract/Summary:
In statistical machine learning, Repeated Learning-Testing (RLT) cross-validation is one of the most commonly used methods. RLT cross-validation draws multiple random splits of a given dataset; for each split, a model is trained on the training set and its prediction error is measured on the test set, and the test errors are averaged to estimate the generalization error of the model.

However, the variance of the cross-validation estimator depends closely on the splits employed. Simulation and real-data experiments show that the covariance between two hold-out estimates depends on the number of samples shared by the two training sets: the more samples the training sets share, the larger the variance of the cross-validation estimator. Traditional RLT cross-validation draws its splits at random, so the number of overlapping samples between two training sets follows a hypergeometric distribution, and this random variation inflates the variance of the RLT estimator. This thesis therefore proposes a Balanced Repeated Learning-Testing cross-validation method (BRLT), which guarantees that every sample appears in the same number of training sets and that any two training sets share the same number of samples.

First, the thesis proves theoretically that if the covariance between any two hold-out estimates is a convex, monotonically increasing function of the number of overlapping training samples, then the BRLT estimator of the generalization error has the smallest variance. Second, based on two-level orthogonal tables, it introduces a construction method for BRLT designs and the corresponding algorithm for some special cases.

Extensive simulation and real-data experiments verify that the covariance between any two hold-out estimates is indeed a convex, monotonically increasing function of the number of overlapping training samples. For several regression and classification models, comparing the variance of the BRLT estimator with that of the RLT estimator shows that BRLT significantly reduces the variance of the generalization-error estimate. In particular, for half-half splits, a balanced m×2 cross-validation method is commonly used in the literature; experiments comparing the variance of the balanced m×2 cross-validation estimator with that of the BRLT estimator show that the BRLT estimator has the smaller variance.
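The hypergeometric-overlap claim above can be illustrated with a small simulation. This is a sketch for intuition, not code from the thesis; the dataset size `n`, training-set size `k`, and trial count are arbitrary illustrative choices. It draws pairs of independent half-half splits and checks that the number of samples shared by two random training sets scatters around its hypergeometric mean k²/n:

```python
import random

# Illustrative sketch (not the thesis's experiment): for a dataset of
# n samples and training sets of size k drawn uniformly at random, the
# overlap |A ∩ B| of two independent training sets A, B is
# hypergeometric with mean k*k/n. RLT inherits this random variation.
n, k = 20, 10        # dataset size and training-set size (illustrative)
trials = 2000
random.seed(0)

overlaps = []
for _ in range(trials):
    a = set(random.sample(range(n), k))   # training set of split 1
    b = set(random.sample(range(n), k))   # training set of split 2
    overlaps.append(len(a & b))           # samples shared by the two

mean_overlap = sum(overlaps) / trials
print(mean_overlap)        # close to the hypergeometric mean k*k/n = 5.0
print(min(overlaps), max(overlaps))  # overlap itself varies from split to split
```

The spread of `overlaps` is exactly the split-to-split randomness that BRLT removes by fixing every pairwise overlap to a constant.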
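The two-level orthogonal-table construction can be sketched as follows. This is a hypothetical illustration, not the thesis's algorithm: for half-half splits on n = 8 samples it derives a design from the non-constant columns of a Sylvester Hadamard matrix (a two-level orthogonal table), using both halves of each column as training sets. Orthogonality of the columns then forces every sample to appear in the same number of training sets and any two training sets from different splits to share exactly n/4 samples:

```python
def sylvester(n):
    """Sylvester Hadamard matrix of order n (n a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = ([row + row for row in H] +
             [row + [-x for x in row] for row in H])
    return H

n = 8
H = sylvester(n)
samples = set(range(n))

# Each non-constant column splits the n samples into two equal halves;
# both halves serve as training sets (the other half is the test set).
train_sets = []
for c in range(1, n):
    plus = {i for i in samples if H[i][c] == 1}
    train_sets.append(plus)
    train_sets.append(samples - plus)

# Balance check 1: every sample appears in the same number of training sets
# (one per split, across the n-1 = 7 splits).
counts = [sum(i in t for t in train_sets) for i in samples]

# Balance check 2: training sets from *different* splits always share
# exactly n/4 samples (same-split pairs are complements, overlap 0).
overlaps = {len(train_sets[a] & train_sets[b])
            for a in range(len(train_sets))
            for b in range(a + 1, len(train_sets))
            if a // 2 != b // 2}

print(counts)    # prints [7, 7, 7, 7, 7, 7, 7, 7]
print(overlaps)  # prints {2}: constant overlap n/4
```

The constant overlap replaces the hypergeometric overlap of random RLT splits, which is the design property the thesis's variance-optimality proof relies on.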
Keywords/Search Tags:Cross-validation, Balanced repeated learning test, generalization error, variance