
Improvement Of Confidence Set Method And Simulation Studies In Binary Classification

Posted on: 2022-04-06    Degree: Master    Type: Thesis
Country: China    Candidate: Z D Meng    Full Text: PDF
GTID: 2517306491960249    Subject: Statistics
Abstract/Summary:
As one of the basic problems in statistical learning, classification is widely used in many fields. A classical machine learning classifier learns from training samples and assigns a sample whose true category is unknown to a single class. In general, such classifiers cannot guarantee the accuracy of the classification and may carry a high risk of error.

This thesis mainly studies the confidence set method proposed by Liu in 2019. Its advantage can be stated as follows: based on an observed training data set, one constructs confidence sets to predict the categories of all test samples and guarantees that at least a 1−α proportion of these confidence sets contain the true categories, and this claim holds with γ·100% confidence with respect to the randomness of the training data set. This thesis concentrates on binary classification, examines the difference between the two types of confidence set classifiers, and compares the confidence set methods with classical machine learning methods under balanced and unbalanced samples.

In the study of balanced samples with different class proportions (π1, π2), the conservative confidence set method achieves a higher coverage level for the true categories of the test samples than the exact confidence set method, while the exact confidence set method achieves a higher single classification level. Under unbalanced samples, the critical value of the conservative confidence set may explode as the sample imbalance coefficient increases, which causes its single classification level to drop significantly; the exact confidence set method not only maintains a relatively stable critical value but also keeps the coverage level and single classification level stable.

This thesis also carries out simulation experiments comparing the confidence set methods with single classification methods (classical machine learning classifiers that output a single class) under balanced and unbalanced samples. The confidence set methods achieve a higher and more stable coverage level than the single classifiers, both for the overall sample and for each category (the advantaged sample and the disadvantaged sample), which shows that, compared with single classifiers, the confidence set method is a higher-coverage and more stable classification approach. Finally, this thesis presents a secondary classification method based on the confidence set method, which obtains a larger single classification level at the expense of a somewhat lower coverage level.
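The coverage level and single classification level discussed above can be illustrated with a small simulation. The Python sketch below is only an illustration, not the thesis's exact or conservative confidence set construction: it builds a split-conformal style set-valued binary classifier with per-class critical values (the logistic regression score, the Gaussian data-generating model, the level alpha = 0.05 and all names are illustrative assumptions) and then reports how often the confidence set contains the true label (coverage level) and how often the set is a singleton (single classification level).

# Minimal sketch, assuming a split-conformal style set-valued binary classifier.
# This is NOT Liu's (2019) exact/conservative construction; it only illustrates
# per-class critical values, the coverage level and the single classification level.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate(n, pi1=0.5):
    """Two Gaussian classes with mixing proportions (pi1, 1 - pi1)."""
    y = (rng.random(n) < pi1).astype(int)
    x = rng.normal(loc=2.0 * y[:, None], scale=1.5, size=(n, 2))
    return x, y

# Split the training data: one part fits the score, the other calibrates
# the per-class critical values.
x_fit, y_fit = simulate(2000)
x_cal, y_cal = simulate(2000)
x_test, y_test = simulate(5000)

model = LogisticRegression().fit(x_fit, y_fit)
alpha = 0.05  # target miscoverage proportion (illustrative choice)

def class_scores(x, k):
    # Nonconformity score: one minus the estimated probability of class k.
    return 1.0 - model.predict_proba(x)[:, k]

# Per-class critical value: (1 - alpha) empirical quantile of calibration scores.
crit = {k: np.quantile(class_scores(x_cal[y_cal == k], k), 1 - alpha) for k in (0, 1)}

# Confidence set for each test point: all classes whose score is below the critical value.
proba_test = model.predict_proba(x_test)
sets = [{k for k in (0, 1) if 1.0 - proba_test[i, k] <= crit[k]}
        for i in range(len(x_test))]

coverage = np.mean([y in s for y, s in zip(y_test, sets)])   # coverage level
single = np.mean([len(s) == 1 for s in sets])                # single classification level
print(f"coverage level: {coverage:.3f}, single classification level: {single:.3f}")

Increasing the class imbalance in simulate (for example pi1 = 0.95) makes it possible to observe, in this simplified setting, how the per-class critical values and the single classification level react as the sample imbalance coefficient grows.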
Keywords/Search Tags: confidence set, unbalanced samples, sample imbalance coefficient, disadvantaged sample, secondary classification method