
Research on Cascading Support Vector Machines Based on Spark

Posted on: 2024-09-23 | Degree: Master | Type: Thesis
Country: China | Candidate: R Zhao | Full Text: PDF
GTID: 2568307073476644 | Subject: Applied statistics
Abstract/Summary:
With the arrival of the big data era, how to process large-scale data has become a focus of attention. Support vector machines (SVMs) handle classification and regression problems well. However, the algorithm has high time and space complexity, so when the data set is large, the memory needed to store the data and the training time grow sharply (the full kernel matrix alone requires memory quadratic in the number of samples). To address these problems, this thesis parallelizes the SVM model on the Spark distributed computing framework. The main work is as follows.

First, the cascade SVM (Cascade SVM), as a distributed model, can effectively reduce computation time, but its accuracy is lower than that of a single-machine SVM. We therefore replace the two-level, pairwise combination structure of Cascade SVM with a hybrid combination structure and implement the improved Cascade SVM on Spark, which improves the accuracy of the model to a certain extent.

Second, to further improve the accuracy of Cascade SVM, the whale optimization algorithm (WOA) is used to optimize the SVM parameters C and g (the penalty parameter and the kernel parameter). Because WOA easily falls into local optima and may fail to reach the global optimum, its linear convergence factor is replaced with a nonlinear one, and a Cauchy mutation factor is introduced to increase the chance of finding the global optimum. The improved WOA is then used to optimize C and g. On 8 benchmark functions, the improved WOA outperforms both the basic WOA and particle swarm optimization (PSO).

Finally, data sets from the LibSVM website are used to compare the stand-alone SVM, the Cascade SVM and the improved Cascade SVM in terms of training time, accuracy and number of support vectors. The experimental results show that on large data sets the running time of the improved Cascade SVM is significantly shorter than that of the stand-alone SVM. Although it runs slightly longer than the basic Cascade SVM, its accuracy is higher than that of the other two models, and it also retains more global support vectors than either of them.
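For reference, the baseline that the first contribution modifies can be sketched as follows. This is a minimal single-machine illustration of the classic binary-tree Cascade SVM (train on data partitions, keep each partition's support vectors, merge the results pairwise, retrain until one model remains), not the thesis's hybrid structure, which the abstract does not specify. Several assumptions apply: a Pegasos-style linear SVM on 2-D toy data stands in for a kernel SVM solver, the loop runs locally rather than as Spark tasks, and the classic scheme's optional feedback of final support vectors to the first layer is omitted. All function names are hypothetical.

```python
import random


def train_linear_svm(data, lam=0.1, iters=2000, seed=0):
    """Pegasos-style stochastic subgradient training of a soft-margin linear SVM.

    data: list of ((x1, x2), y) with y in {-1, +1}. A constant feature 1.0 is
    appended, so the bias is learned (and regularised) as a third weight.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for t in range(1, iters + 1):
        (x1, x2), y = data[rng.randrange(len(data))]
        x = (x1, x2, 1.0)
        eta = 1.0 / (lam * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1 - eta * lam) * wi for wi in w]  # shrink step from the L2 regulariser
        if margin < 1:  # hinge loss is active for this point: move w towards y * x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def decision(w, point):
    """Signed decision value f(x) = w1*x1 + w2*x2 + b."""
    return w[0] * point[0] + w[1] * point[1] + w[2]


def support_vectors(w, data, tol=1e-6):
    """Keep points on or inside the margin, i.e. y * f(x) <= 1."""
    svs = [(p, y) for p, y in data if y * decision(w, p) <= 1 + tol]
    return svs if svs else data  # fall back to the whole subset if nothing qualifies


def cascade_svm(data, n_parts=8, seed=0):
    """Classic binary-tree Cascade SVM, single pass (no feedback loop)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    layer = [shuffled[i::n_parts] for i in range(n_parts)]
    while True:
        # In a distributed setting, each subset in a layer could be trained in parallel
        models = [train_linear_svm(part) for part in layer]
        svs = [support_vectors(w, part) for w, part in zip(models, layer)]
        if len(svs) == 1:
            return models[0], svs[0]
        # Pairwise merge: SV sets (0, 1), (2, 3), ... form the next layer
        layer = [svs[i] + (svs[i + 1] if i + 1 < len(svs) else [])
                 for i in range(0, len(svs), 2)]


def make_blobs(n, seed=1):
    """Toy data: two Gaussian blobs centred at (2, 2) and (-2, -2)."""
    rng = random.Random(seed)
    return [((rng.gauss(2 * y, 1), rng.gauss(2 * y, 1)), y)
            for y in (1 if i % 2 == 0 else -1 for i in range(n))]


def accuracy(w, data):
    """Fraction of points whose label matches the sign of f(x)."""
    return sum(1 for p, y in data if y * decision(w, p) > 0) / len(data)


if __name__ == "__main__":
    data = make_blobs(400)
    w, svs = cascade_svm(data)
    print(f"accuracy={accuracy(w, data):.3f}, final support vectors={len(svs)} of {len(data)}")
```

Because each layer passes on only support vectors, later layers train on much smaller sets. Points discarded early can never return in a single pass, which is one plausible source of the accuracy gap relative to a single-machine SVM that the abstract mentions.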
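The two WOA modifications described in the abstract can be illustrated with a short sketch. The abstract does not give the exact formulas, so both are assumptions here: the nonlinear convergence factor is taken as a(t) = 2·cos(πt / 2T), one common choice that replaces the original linear a(t) = 2 − 2t/T, and the Cauchy mutation is taken as x' = x + x·Cauchy(0, 1) applied to the current best whale with greedy acceptance. The remaining update rules (encircling, random search, and the spiral bubble-net move) follow the standard WOA. The function `improved_woa` is hypothetical and is shown minimizing the sphere benchmark rather than tuning C and g.

```python
import math
import random


def sphere(x):
    """Benchmark function f(x) = sum(x_i^2); global minimum 0 at the origin."""
    return sum(v * v for v in x)


def improved_woa(f, dim, lb, ub, n_whales=30, max_iter=200, b=1.0, seed=0):
    """Minimise f over the box [lb, ub]^dim with a modified whale optimization algorithm.

    Assumed modifications (illustrative, not the thesis's exact formulas):
    - nonlinear convergence factor a(t) = 2*cos(pi*t / (2*max_iter)) instead of
      the linear a(t) = 2 - 2*t/max_iter of the original WOA;
    - Cauchy mutation of the best whale every iteration, kept only if it improves f.
    """
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lb), ub)

    whales = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_whales)]
    best = list(min(whales, key=f))
    best_f = f(best)

    for t in range(max_iter):
        a = 2.0 * math.cos(math.pi * t / (2 * max_iter))  # decays from 2 towards 0, slowly at first
        for i, x in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: move towards the best whale; |A| >= 1: search around a random whale
                target = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [clip(tj - A * abs(C * tj - xj)) for tj, xj in zip(target, x)]
            else:
                # Spiral (bubble-net) update around the best whale
                spiral_l = rng.uniform(-1, 1)
                new = [clip(abs(bj - xj) * math.exp(b * spiral_l) * math.cos(2 * math.pi * spiral_l) + bj)
                       for bj, xj in zip(best, x)]
            whales[i] = new
            fx = f(new)
            if fx < best_f:
                best, best_f = list(new), fx
        # Cauchy mutation: x' = x + x * Cauchy(0, 1); the heavy tail allows occasional long jumps
        mutant = [clip(v + v * math.tan(math.pi * (rng.random() - 0.5))) for v in best]
        mutant_f = f(mutant)
        if mutant_f < best_f:
            best, best_f = mutant, mutant_f
    return best, best_f


if __name__ == "__main__":
    best, best_f = improved_woa(sphere, dim=5, lb=-10.0, ub=10.0)
    print(f"best f = {best_f:.3e}")
```

To tune C and g instead, `f` would be replaced by a function of the pair (C, g), for example one minus the cross-validation accuracy of an SVM trained with those parameters; that wiring is not shown here. The cosine schedule keeps `a` near 2 for longer than the linear one, which spends more early iterations exploring before the search contracts around the best solution.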
Keywords/Search Tags: Support Vector Machine, Cascading Support Vector Machine, Spark Distributed Computing Framework, Whale Optimization Algorithm