Comparison Of Several Cross-validation Methods Based On Bioinformatics Data

Posted on: 2014-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Hu
GTID: 2250330401462302
Subject: Probability theory and mathematical statistics
Abstract/Summary:
In bioinformatics research, one often encounters high-dimensional data with small sample sizes. DNA microarray expression data, for example, often contain thousands of genes but only dozens of samples. Analyzing such data is currently both an active and a difficult research problem, with three main objectives. Feature selection (gene selection) chooses a subset of all features (gene expression levels) in order to construct a good classifier. Model (classifier) selection estimates the performance of different classifiers and chooses the best one. Model evaluation estimates the prediction error of the selected classifier on new data. Prediction error estimation therefore plays a key role in both model (classifier) selection and model assessment, because the prediction error is an important indicator of classification performance. When the amount of data is large enough, one can set aside a portion of the data as a test set and use the error on that test set as the estimate of the prediction error, but this approach does not apply when samples are this scarce. Instead, various forms of cross-validation are now commonly used to estimate the prediction error.

Based on the mean squared error criterion, we use bioinformatics data to compare the prediction error estimates of two-fold, five-fold, ten-fold, random 5×2, and balanced 3×2 cross-validation. The experiments show that balanced 3×2 cross-validation performs better than two-fold, five-fold, ten-fold, and random 5×2 cross-validation. In statistics, repeating an experiment more times should give more accurate results, so we also considered balanced m×2 cross-validation with m greater than 3. However, the experimental results show that as m increases, the performance of the prediction error estimate does not improve significantly.
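The cross-validation schemes compared above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's code: the nearest-centroid classifier and all function names are hypothetical stand-ins, and the balanced 3×2 construction shown (shuffle once, cut the sample into four equal blocks, then use the three distinct ways of pairing the blocks into two halves) is an assumed construction. With it, any half from one partition shares exactly one block, i.e. n/4 observations when n is divisible by 4, with any half from another partition, whereas the overlaps in random 5×2 cross-validation vary from split to split. Balanced m×2 designs with m > 3 are not covered by this sketch.

```python
import random
from statistics import mean


def nearest_centroid_error(X, y, train, test):
    """Misclassification rate of a nearest-centroid classifier (hypothetical stand-in)."""
    centroids = {}
    for label in {y[i] for i in train}:
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Assign x to the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return sum(predict(X[i]) != y[i] for i in test) / len(test)


def kfold_splits(n, k, rng):
    """Standard k-fold CV: each of k disjoint folds serves once as the test set."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def random_m2_splits(n, m, rng):
    """Random m x 2 CV: m independent random halvings, each half used once as the test set."""
    for _ in range(m):
        idx = list(range(n))
        rng.shuffle(idx)
        a, b = idx[: n // 2], idx[n // 2:]
        yield a, b
        yield b, a


def balanced_3x2_splits(n, rng):
    """Balanced 3 x 2 CV (assumed construction): shuffle, cut into blocks B0..B3,
    and use the three distinct pairings of blocks into two halves."""
    idx = list(range(n))
    rng.shuffle(idx)
    blocks = [idx[i::4] for i in range(4)]
    for (p, q), (r, s) in [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]:
        a, b = blocks[p] + blocks[q], blocks[r] + blocks[s]
        yield a, b
        yield b, a


def cv_estimate(X, y, splits):
    """Prediction error estimate: average test error over all (train, test) splits."""
    return mean(nearest_centroid_error(X, y, tr, te) for tr, te in splits)


if __name__ == "__main__":
    # Synthetic "wide" data: 40 samples, 50 features, two shifted classes.
    rng = random.Random(0)
    y = [i % 2 for i in range(40)]
    X = [[rng.gauss(float(label), 1.0) for _ in range(50)] for label in y]
    for name, splits in [
        ("10-fold", kfold_splits(40, 10, rng)),
        ("random 5x2", random_m2_splits(40, 5, rng)),
        ("balanced 3x2", balanced_3x2_splits(40, rng)),
    ]:
        print(name, cv_estimate(X, y, splits))
```

Each scheme produces a list of (train, test) index pairs and differs only in how those pairs are generated, so the classifier can be swapped without touching the splitting logic.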
Therefore, taking into account the variance, bias, and mean squared error of the estimates, together with computational complexity and other factors, we conclude that balanced 3×2 cross-validation may have an advantage for bioinformatics data.
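The comparison criteria above can be made concrete with a small helper. This is a sketch under assumptions: `bias_variance_mse` and `FITS` are hypothetical names, and `true_error` is assumed to be known or approximated (for example, by a large held-out sample in a simulation). Using the population variance (divisor equal to the number of repetitions) makes the decomposition MSE = bias² + variance hold exactly. The fit counts are simple arithmetic: each scheme needs one classifier fit per (train, test) split, so balanced 3×2 needs 6 fits against 10 for ten-fold and random 5×2.

```python
from statistics import mean, pvariance


def bias_variance_mse(estimates, true_error):
    """Summarize repeated prediction-error estimates against a reference value.

    estimates  : CV estimates from independent repetitions (e.g. fresh data sets)
    true_error : target prediction error, assumed known or approximated
                 (e.g. by a large held-out sample in a simulation)
    """
    centre = mean(estimates)
    bias = centre - true_error
    variance = pvariance(estimates)  # divisor len(estimates), so MSE = bias**2 + variance
    mse = mean((e - true_error) ** 2 for e in estimates)
    return bias, variance, mse


# Classifier fits needed per estimate (one per train/test split).
FITS = {"2-fold": 2, "5-fold": 5, "10-fold": 10, "random 5x2": 10, "balanced 3x2": 6}
```

In a simulation, one would repeat each cross-validation scheme on many independently drawn data sets, pass the resulting estimates to `bias_variance_mse`, and weigh the MSE against the entries in `FITS`.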
Keywords/Search Tags: Cross-validation, Balanced 3×2 cross-validation, Prediction error, MSE