Spark-based Massive Data Analysis And Performance Optimization

Posted on:2019-03-12

Degree:Master

Type:Thesis

Country:China

Candidate:D Y Qiu

Full Text:PDF

GTID:2348330542998390

Subject:Information and Communication Engineering

Abstract/Summary:


Since the beginning of the twenty-first century, advances in network transmission technology and growth in link bandwidth have driven rapid growth in Internet users and applications, and the most visible change is the explosive, exponential growth in data volume. Massive network traffic data poses storage and computing problems. With its high reliability, efficiency, high scalability, high fault tolerance, and low cost, the Hadoop platform became a platform for analyzing massive network traffic data. However, as data volumes continue to grow rapidly, Hadoop has become increasingly inadequate, and it was at this moment that Spark emerged. Compared with MapReduce, Spark is more concise and more efficient. With network traffic data growing steadily, analyzing network performance from massive data is particularly important. This thesis introduces the Hadoop data analysis platform, briefly describes the MapReduce computation model and the HDFS distributed file system, and focuses on the Spark computation framework, including Spark's overall architecture, core concepts, job execution process, and Shuffle. Then, based on massive data analysis applications, it proposes performance optimization methods such as choosing appropriate operators, improving data locality, using persistence, and selecting an appropriate degree of parallelism, and evaluates them experimentally by comparing performance. Next, for join, a common operation in Spark (for example, in the PageRank algorithm), it optimizes the join and evaluates its performance, which is instructive for applications that require join operations, especially recursive scenarios involving multiple joins.
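The abstract does not say which operators the thesis selects. As one commonly cited illustration of "choosing appropriate operators", the sketch below simulates in plain Python (not the Spark API) why an operator that combines values on the map side, as Spark's reduceByKey does, moves fewer records through the Shuffle than one that does not, as with groupByKey followed by a sum. The function names, the two-partition layout, and the traffic records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def shuffle_without_combine(partitions):
    """groupByKey-style: every (key, value) record crosses the shuffle."""
    shuffled = [record for part in partitions for record in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def shuffle_with_combine(partitions):
    """reduceByKey-style: sum per partition first, then shuffle partial sums."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        # Only one record per distinct key leaves each partition.
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


# Hypothetical traffic records: (source host, bytes), split over two partitions.
partitions = [
    [("10.0.0.1", 100), ("10.0.0.2", 50), ("10.0.0.1", 25)],
    [("10.0.0.1", 10), ("10.0.0.2", 5), ("10.0.0.2", 5)],
]

plain_totals, plain_records = shuffle_without_combine(partitions)
combined_totals, combined_records = shuffle_with_combine(partitions)

# Same per-host totals, but fewer records shuffled when combining first.
print(plain_totals == combined_totals, plain_records, combined_records)
```

Both functions produce the same per-host byte totals. The difference is only in how many records would have to cross the network, which is the cost that this kind of operator choice is meant to reduce.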

Keywords/Search Tags:

traffic analysis, application optimization, Hadoop, Spark, Join


Related items

1	Optimizing Big Data Equi-join In Spark And Its Application In Analysis Of Network Traffic Data
2	Research On Query Analysis And Optimization Based On Spark System
3	Implementation And Optimization For Join Operation In Spark
4	Optimization Scheme And Implementation Of Join Operation In Spark Computing Engine
5	Network Traffic Analysis And Optimization Based On Script Language
6	Hadoop Based Efficient Join Algorithm Research On GPU
7	Research On Cardinalities Estimation Of Two Table For Join Operator Based On Spark SQL Platform
8	Research On Equi-Join Optimization Algorithms On Spark SQL
9	Research On String Similarity Join Method Based On Hadoop Platform
10	Real-time Performance Monitoring And I/O Performance Optimization Research On Hadoop Cluster