Preserving the decision boundary through data selection for support vector machines

Posted on: 2008-07-28
Degree: Ph.D
Type: Dissertation
University: University of Houston
Candidate: Sun, Chaofan
GTID: 1448390005964661
Subject: Computer Science
Abstract/Summary:
As a state-of-the-art learning approach, support vector machines (SVMs) have been shown to have advantages over other learning approaches. Because of its high time complexity, however, conventional SVM training becomes intolerably slow, and sometimes impractical, for big data sets. In many cases the support vectors (SVs) are only a small subset of the data set, and this subset alone determines the decision boundary. The goal of this study is to develop data pre-processing procedures that efficiently reduce big data sets while preserving the decision boundary, without degrading SVM performance.

This study consists of three levels of data selection for SVMs. In the first level, data selection is carried out using the closest pairs (CPs) and the nearest neighbors of the opposite class (NNOs) approaches. These approaches select only boundary region vectors (BRVs), which preserve the decision boundary, so an SVM trained on the BRVs performs comparably to one trained on the full data set. Investigations show that BRV-based data selection works well for small data sets. In the second level, spatial approximation sample hierarchy (SASH) trees are used to speed up BRV-based data selection for big data sets. Investigations show that SASHs approximate the exact BRVs with 90% or higher accuracy, and the overall time saved at this level can be 60% or more for data sets larger than 30k vectors. In the third level, limited-size SASHs are used to further reduce the time spent on data selection for over-sized data sets. Analysis and experiments demonstrate that, at this level, data selection can be done in linear time. Throughout this study, we demonstrate that the proposed data selection approaches efficiently select BRVs that preserve the decision boundary well. The same idea is also applied to active learning.
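The abstract names the NNO approach but does not spell out its procedure. The following is a minimal sketch under one plausible reading: every vector "votes" for its k nearest neighbors of the opposite class, and the vectors that receive at least one vote are kept as boundary region vectors. The function name `select_nno`, the parameter `k`, and the brute-force distance computation are assumptions made for illustration; they are not taken from the dissertation, which uses SASH trees to avoid the quadratic search for big data sets.

```python
import numpy as np


def select_nno(X, y, k=1):
    """Hypothetical NNO-style selection (illustrative, not the author's code).

    Returns the sorted indices of vectors that are among the k nearest
    opposite-class neighbors of at least one vector in the data set.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    selected = set()
    for label in np.unique(y):
        same = np.flatnonzero(y == label)    # vectors of the current class
        other = np.flatnonzero(y != label)   # candidate opposite-class vectors
        if same.size == 0 or other.size == 0:
            continue
        # Brute-force pairwise squared Euclidean distances, O(n^2) time and memory.
        diff = X[same][:, None, :] - X[other][None, :, :]
        d2 = np.einsum("ijk,ijk->ij", diff, diff)
        kk = min(k, other.size)
        # Positions (within `other`) of the kk closest opposite-class vectors.
        nearest = np.argsort(d2, axis=1)[:, :kk]
        selected.update(other[nearest].ravel().tolist())
    return np.array(sorted(selected), dtype=int)


# Toy 1-D example: class 0 at x = 0..3, class 1 at x = 5..8.
X_toy = [[0], [1], [2], [3], [5], [6], [7], [8]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
brv_idx = select_nno(X_toy, y_toy, k=1)
```

On the toy data only the two vectors facing each other across the class gap (x = 3 and x = 5) are selected; raising `k` widens the retained boundary region. The reduced set `X[brv_idx], y[brv_idx]` would then be passed to any SVM trainer in place of the full data set.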
Keywords/Search Tags: Decision boundary, Data, Support, SVM, Approaches