Research Of Cascading Support Vector Machines Based On Spark

Posted on:2024-09-23

Degree:Master

Type:Thesis

Country:China

Candidate:R Zhao

Full Text:PDF

GTID:2568307073476644

Subject:Applied statistics

Abstract/Summary:

With the arrival of the big data era, how to process large data sets has become a focus of attention. The support vector machine (SVM) handles classification and regression problems well. However, because of its high space and time complexity, the memory needed to store the data and the training time grow rapidly as the data set gets larger. To address these problems, this thesis parallelizes the support vector machine model on the Spark distributed computing framework. The main work is as follows.

First, as a distributed model, the cascading support vector machine (Cascade SVM) can effectively reduce computation time, but its accuracy is lower than that of a single-machine SVM. We therefore replace the two-level, pairwise combination structure of Cascade SVM with a hybrid combination and implement the improved Cascade SVM on Spark, which improves the accuracy of the model to a certain extent.

Second, to further improve the accuracy of Cascade SVM, the whale optimization algorithm (WOA) is used to optimize the SVM penalty parameter C and kernel parameter g. Because WOA easily falls into a local optimum and fails to reach the global optimum, its linear convergence factor is replaced with a nonlinear convergence factor, and a Cauchy mutation factor is introduced to increase the chance of finding the global optimum. The improved WOA is then used to optimize C and g. Validated on 8 benchmark functions, the improved WOA outperforms both the basic WOA and particle swarm optimization (PSO).

Finally, data sets from the LibSVM website are used to compare the stand-alone SVM, Cascade SVM and improved Cascade SVM in terms of training time, accuracy and number of support vectors. The experimental results show that when the data size is large, the running time of the improved Cascade SVM is significantly shorter than that of the stand-alone SVM. Although the improved Cascade SVM runs slightly longer than the basic Cascade SVM, its accuracy is better than that of the other two models, and its number of global support vectors is also larger than theirs.
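The two WOA modifications described in the abstract can be illustrated with a minimal, standard-library Python sketch. The abstract does not give the exact formulas, so everything specific here is an assumption: the quadratic schedule for the convergence factor a, the multiplicative Cauchy mutation of the best whale, the sphere benchmark function, and the names improved_woa, sphere and clip are all hypothetical. The best position is also updated immediately rather than once per iteration, which is a simplification of the usual WOA loop.

```python
import math
import random


def sphere(x):
    """Benchmark objective: sum of squares, with minimum 0 at the origin."""
    return sum(v * v for v in x)


def clip(v, lo, hi):
    """Keep a coordinate inside the search bounds [lo, hi]."""
    return max(lo, min(hi, v))


def improved_woa(objective, dim, lo, hi, n_whales=20, n_iter=200, seed=0):
    """Minimize `objective` over [lo, hi]^dim; return (best_position, best_value)."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    best = list(min(whales, key=objective))
    best_val = objective(best)

    for t in range(n_iter):
        # ASSUMED nonlinear convergence factor: a falls from 2 towards 0 along a
        # quadratic curve (basic WOA uses the linear a = 2 - 2 * t / n_iter).
        a = 2.0 * (1.0 - (t / n_iter) ** 2)
        for i in range(n_whales):
            x = whales[i]
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale when |A| < 1, else explore around a random whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                # Spiral (bubble-net) move around the best whale, spiral constant b = 1.
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(l) * math.cos(2.0 * math.pi * l)
                new = [abs(best[j] - x[j]) * spiral + best[j] for j in range(dim)]
            whales[i] = [clip(v, lo, hi) for v in new]
            val = objective(whales[i])
            if val < best_val:
                best, best_val = list(whales[i]), val

        # ASSUMED Cauchy mutation: perturb the best whale with standard Cauchy
        # noise (heavy tails allow long jumps) and keep the mutant only if it is better.
        mutant = []
        for j in range(dim):
            cauchy = math.tan(math.pi * (rng.random() - 0.5))
            mutant.append(clip(best[j] * (1.0 + cauchy), lo, hi))
        mutant_val = objective(mutant)
        if mutant_val < best_val:
            best, best_val = mutant, mutant_val

    return best, best_val


if __name__ == "__main__":
    position, value = improved_woa(sphere, dim=2, lo=-5.0, hi=5.0, seed=1)
    print(position, value)
```

To tune an SVM in the way the abstract describes, the objective would presumably map a position (C, g) to a validation error instead of the sphere function; that part is not shown because the abstract gives no details of the fitness function.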

Keywords/Search Tags:

Support Vector Machine, Cascading Support Vector Machine, Spark Distributed Computing Framework, Whale Optimization Algorithm

Related items

1	Research On Some Problems Of Support Vector Machine Learning Algorithm
2	Research And Optimization On Semiparametric Support Vector Machine Under Spark Framework
3	Study And Application On Support Vector Machine Classification
4	The Doubly Regularized Support Vector Machine With A Globally Linearly Convergent Algorithm
5	Research On Multi-hyperplane Twin Support Vector Regression Algorithm And Its Optimization
6	Performance Improvement Based On Pinball Support Vector Machine
7	Research On Robust Support Vector Machines
8	Some Algorithms Research On Support Vector Machines
9	Support Vector Machine And Its Applications
10	Support Vector Machine Algorithm And Its Application To Intrusion Detection