Error Analysis of Distance Weighted Discrimination Based on Unbalanced Data

Posted on: 2021-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: J M Qi
GTID: 2427330611990530
Subject: Statistics
Abstract/Summary:
In recent years, with the development of science and information technology, researchers have paid increasing attention to high-dimension, low-sample-size (HDLSS) data and to unbalanced data. The support vector machine (SVM), one of the most popular classifiers, depends only on a subset of the training samples, the support vectors. In the HDLSS setting this leads to the so-called "data piling" problem, which in turn causes suboptimal SVM performance. Distance weighted discrimination (DWD) was designed to overcome the data piling inherent in SVM under the HDLSS setting; however, it does not handle unbalanced data well. Weighted distance weighted discrimination (WDWD) improves on standard DWD for unbalanced data by allowing a flexible choice of weights. DWD and WDWD have been widely applied to HDLSS problems, but to the best of our knowledge little is known about their mathematical theory; in particular, a quantitative convergence analysis is lacking.

This thesis focuses on WDWD. Our purpose is to establish a quantitative error analysis of the algorithm in the framework of statistical learning theory. First, we establish a weighted comparison theorem that relates the weighted misclassification error to the weighted generalization error; this theorem plays a key role in the error analysis. Next, we introduce a novel projection operator to overcome the difficulty caused by the unbounded objective function. Finally, we estimate the weighted generalization error by means of probability inequalities and covering numbers, and derive explicit convergence rates for kernel-based WDWD. Our work strengthens the mathematical foundation of WDWD.
Keywords: Weighted distance weighted discrimination, Reproducing kernel Hilbert space, Comparison theorem, Error analysis, Convergence rate
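The weighted loss and the weighted misclassification error described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm or proof machinery. It rests on several assumptions: (i) the standard DWD loss V(u) = 1 - u for u ≤ 1/2 and V(u) = 1/(4u) for u > 1/2; (ii) a regularized kernel objective (1/n) Σ w(y_i) V(y_i f(x_i)) + λ‖f‖²_K with no offset term; (iii) class weights inversely proportional to class size, one hypothetical choice among the flexible weights WDWD allows; (iv) a Gaussian kernel; and (v) plain functional gradient descent as the solver. All function names and parameter values are hypothetical. The last function computes the empirical weighted misclassification error, the quantity that the weighted comparison theorem relates to the weighted generalization error.

```python
import math
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
Decision = Callable[[Point], float]


def dwd_loss(u: float) -> float:
    """Standard DWD loss: 1 - u for u <= 1/2, and 1/(4u) for u > 1/2."""
    return 1.0 - u if u <= 0.5 else 1.0 / (4.0 * u)


def dwd_loss_grad(u: float) -> float:
    """Derivative of dwd_loss; both pieces equal -1 at u = 1/2."""
    return -1.0 if u <= 0.5 else -1.0 / (4.0 * u * u)


def class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Hypothetical choice: weight each class inversely to its size."""
    n = len(y)
    n_pos = sum(1 for label in y if label == 1)
    n_neg = n - n_pos
    return {1: n / (2.0 * n_pos), -1: n / (2.0 * n_neg)}


def gaussian_kernel(sigma: float) -> Callable[[Point, Point], float]:
    """Gaussian (RBF) kernel exp(-|a - b|^2 / (2 sigma^2))."""
    def k(a: Point, b: Point) -> float:
        sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return k


def fit_kernel_wdwd(X: List[Point], y: List[int],
                    kernel: Callable[[Point, Point], float],
                    lam: float = 0.01, lr: float = 0.1,
                    steps: int = 500) -> Decision:
    """Minimise (1/n) sum_i w(y_i) V(y_i f(x_i)) + lam * ||f||_K^2 over
    f = sum_j alpha_j K(x_j, .) by functional (RKHS) gradient descent."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    w = class_weights(y)
    alpha = [0.0] * n
    for _ in range(steps):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        # The RKHS gradient is sum_i c_i K(x_i, .); stepping f along it
        # amounts to stepping the coefficients alpha along c.
        c = [w[y[i]] * y[i] * dwd_loss_grad(y[i] * f[i]) / n + 2.0 * lam * alpha[i]
             for i in range(n)]
        alpha = [a - lr * ci for a, ci in zip(alpha, c)]

    def decision(x: Point) -> float:
        return sum(a * kernel(xj, x) for a, xj in zip(alpha, X))
    return decision


def weighted_misclassification(decision: Decision, X: List[Point],
                               y: List[int]) -> float:
    """Empirical weighted 0-1 error (1/n) sum_i w(y_i) 1{y_i f(x_i) <= 0}."""
    w = class_weights(y)
    return sum(w[yi] for xi, yi in zip(X, y) if yi * decision(xi) <= 0) / len(X)


if __name__ == "__main__":
    # Toy unbalanced data: six negatives, two positives, one feature.
    X = [[-3.0], [-2.5], [-2.0], [-1.5], [-1.0], [-0.5], [2.0], [2.5]]
    y = [-1, -1, -1, -1, -1, -1, 1, 1]
    f = fit_kernel_wdwd(X, y, gaussian_kernel(1.0))
    print(weighted_misclassification(f, X, y))
```

On the toy data (six negative points, two positive), the assumed weights are 2/3 for the majority class and 2 for the minority class, so each class contributes equally to the weighted risk. Because the DWD loss is continuously differentiable, plain gradient steps with a small fixed step size suffice for this example. The sketch does not implement the thesis's projection operator or any of its error bounds.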