Research On Large-scale Traffic Classification Technology Based On Spark Performance Optimization

Posted on:2021-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:K Yang

Full Text:PDF

GTID:2428330611450310

Subject:Software engineering

Abstract/Summary:

In recent years, increasingly mature Internet technology has driven the continuous development of the information society. This has brought great convenience to people's work and study, but it has also caused explosive growth in network traffic. Faced with this volume of traffic, traditional network traffic classification techniques that run on a single machine are severely challenged in both storage capacity and computing efficiency. How to classify network traffic accurately and quickly has become a pressing research problem. Spark, a popular big data analysis platform, offers an effective way to address this problem because it enables distributed storage, provides in-memory computing, and runs with high efficiency. Random forest, meanwhile, is a classification algorithm that performs well and is easy to parallelize. The research in this paper is therefore divided into the following two parts.

First, this paper studies the application of the random forest classification algorithm on the Spark platform. When a random forest classifies network traffic, decision trees with different classification abilities are not treated differently. This paper implements a weighted random forest algorithm on the Spark platform so that decision trees with strong classification performance contribute more and the impact of decision trees with poor classification ability is reduced. Experimental results show that the proposed algorithm achieves higher classification accuracy and good scalability.

Second, this paper studies performance optimization techniques for Spark. Shuffle operations triggered during the execution of a Spark job seriously affect Spark performance. To address this problem, this paper uses the Spark Shuffle acceleration plugin crail-spark-io to optimize the Spark Shuffle. The plugin is built on RDMA (remote direct memory access) technology. Because the plugin cannot handle aggregation operators in a multi-partition environment, this paper optimizes the plugin's Shuffle logic to make full use of cluster resources and improve Spark performance.
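
The idea of weighting trees by their classification ability can be illustrated with a small sketch. The abstract does not say how the weights are computed, so the example below simply assumes each tree carries a weight equal to some measured accuracy (for instance on held-out data). The function name `weighted_vote`, the traffic labels, and the weight values are all hypothetical and are not taken from the thesis. The sketch runs on a single machine and shows only the voting step, not the Spark implementation.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-tree predictions into one label by weighted voting.

    predictions: one predicted label per tree
    weights: one non-negative weight per tree (assumed here to be an
             accuracy estimate; the thesis may define weights differently)
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")

    # Sum the weight of every tree that voted for each label.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # The label with the largest accumulated weight wins.
    return max(totals, key=totals.get)


# Three hypothetical trees classify one flow.
tree_predictions = ["video", "web", "web"]
tree_weights = [0.95, 0.40, 0.45]  # illustrative accuracy-based weights

# Weighted voting: one strong tree outweighs two weak ones.
result = weighted_vote(tree_predictions, tree_weights)

# Equal weights reduce to the plain majority vote of a standard random forest.
majority = weighted_vote(tree_predictions, [1.0, 1.0, 1.0])
```

With these illustrative numbers, the single high-weight tree outvotes the two low-weight trees, whereas equal weights fall back to the ordinary majority decision.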

Keywords/Search Tags:

network traffic classification, Spark, random forest algorithm, RDMA, Crail-Spark-IO

Related items

1	Network Traffic Classification Based On Spark Frame
2	Research On Network Traffic Classification Technology Based On Spark
3	Research On Random Forest Classification Algorithm Based On Spark Distributed Platform
4	Research On Parallelization And Optimization Of Random Forest Classification Algorithm Based On Spark
5	Research And Implementation Of Network Traffic Anomaly Detection Based On Spark Platform
6	Research On Parallel Text Classification Algorithm Base On Random Forest And Spark
7	The Design And Implementation Of Anomalous Network Traffic Detection System Based On Spark
8	The Research Of Real-time Network Traffic Anomaly Detection Based On Spark Technology
9	Classification Of Encrypted Traffic Application Service Based On Spark Platform
10	Research On Efficient Parallelization Of Improved Random Forest Algorithm Based On Spark Platform