Optimization And Application Of SVM Algorithm Based On Spark

Posted on:2018-03-06

Degree:Master

Type:Thesis

Country:China

Candidate:J C Sai

GTID:2348330518996430

Subject:Computer Science and Technology

Abstract/Summary:

As large-scale data processing technology develops, the demand for extracting useful information from massive data keeps growing, and machine learning algorithms that perform well on small sample sets are gradually being introduced into big data processing scenarios. How to parallelize machine learning algorithms efficiently on large-scale data has therefore become a focus of research in recent years.

Support Vector Machine (SVM) is a machine learning method based on the VC theory of statistical learning and the structural risk minimization principle. It has many advantages over other machine learning algorithms for small sample sets, nonlinear data, and high-dimensional pattern recognition. When applied to large data sets, however, SVM is difficult to use effectively because of its high computational complexity and long running time. This thesis therefore studies the parallel optimization of the SVM algorithm for large-scale data environments, using Spark, a widely used parallel computing framework, as the implementation platform for parallel SVM.

On the Spark platform, this thesis uses IndexedRDD, developed at the University of California, Berkeley, to parallelize P-pack SVM. To address the limitations of this model, the BPPGD algorithm is proposed. Experimental results show that BPPGD achieves higher classification accuracy and faster execution than P-pack SVM on large-scale data.

Cascade SVM is a multi-level model training method designed for distributed systems. The last stage of the algorithm can run only on a single machine, which limits the overall efficiency of model training and lengthens the running time. This thesis implements Cascade SVM on the Spark platform and proposes the CSP-SVM algorithm, which addresses this shortcoming by drawing on the advantages of P-pack SVM. The resulting kernel SVM can make full use of a parallel distributed system, improve training speed, and effectively preserve classification accuracy.

Finally, the thesis describes how the parallel kernel SVM is integrated into BDAP, the big data analysis platform developed by the Communication Software Engineering Center of Beijing University of Posts and Telecommunications, and uses text data to test the performance of the two improved algorithms.
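
The cascade structure mentioned in the abstract can be illustrated with a rough sketch. This is not the thesis's implementation: it runs in a single Python process rather than on Spark, it substitutes a Pegasos-style linear SVM for a kernel SVM, and it approximates "support vectors" as the points whose margin is at most 1 + tol. All function names, parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(data, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM (no bias).

    data: non-empty list of (x, y), x a list of floats, y in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            violated = y * dot(w, x) < 1.0        # is the hinge loss active?
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def support_vectors(data, w, tol=0.1, min_keep=2):
    """Approximate support vectors: points on or inside the margin.

    Falls back to the min_keep lowest-margin points (an assumption of this
    sketch) so that the next cascade level never receives an empty set.
    """
    scored = sorted(data, key=lambda p: p[1] * dot(w, p[0]))
    kept = [p for p in scored if p[1] * dot(w, p[0]) <= 1.0 + tol]
    return kept if len(kept) >= min_keep else scored[:min_keep]

def cascade_svm(data, n_partitions=4):
    """Train on partitions, keep support vectors, merge pairwise, repeat."""
    size = -(-len(data) // n_partitions)          # ceiling division
    subsets = [data[i:i + size] for i in range(0, len(data), size)]
    while True:
        # On a cluster, each subset at a level could be trained in parallel.
        models = [train_linear_svm(s) for s in subsets]
        if len(subsets) == 1:
            # Final level: one merged set, trained in one place.
            return models[0]
        survivors = [support_vectors(s, w) for s, w in zip(subsets, models)]
        subsets = [
            survivors[i] + (survivors[i + 1] if i + 1 < len(survivors) else [])
            for i in range(0, len(survivors), 2)
        ]

def make_toy_data(n=200, seed=1):
    """Two well-separated Gaussian blobs centred at (2, 2) and (-2, -2)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        data.append(([rng.gauss(2.0 * y, 0.4), rng.gauss(2.0 * y, 0.4)], y))
    rng.shuffle(data)
    return data

def accuracy(data, w):
    return sum(1 for x, y in data if y * dot(w, x) > 0) / len(data)

if __name__ == "__main__":
    points = make_toy_data()
    model = cascade_svm(points, n_partitions=4)
    print("training accuracy:", accuracy(points, model))
```

In the sketch, every level except the last trains its subsets independently, while the final pass works on a single merged set. That final pass corresponds to the single-machine last stage the abstract identifies as the bottleneck.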

Keywords/Search Tags:

support vector machine, Spark, parallel computation, gradient descent

Related items

1	Spark-based SVM Algorithm Optimization And Application In Text Classification
2	Research On Support Vector Machine Based On Improved Loss Function
3	Research On Parallel Computing Of Support Vector Machines Based On Improved Stochastic Gradient Descent And Its Application
4	Methodologies And Applications For Solving Large-scale Support Vector Machines
5	Study Of Support Vector Machine Algorithms On Unbalanced Dataset
6	SVM Algorithm Optimization And Application In Text Classification Based On Hadoop
7	Imbalanced Stochastic Gradient Descent Online Algorithm For SVM
8	Research Of Cascading Support Vector Machines Based On Spark
9	A Study On Large Scale Nonlinear Support Vector Machines
10	Research On Methods For GPU Based Parallel Acceleration Of Matrix Computation