The Design And Implementation Of Offline Data Processing Engine Based On DAG Model

Posted on:2017-01-13

Degree:Master

Type:Thesis

Country:China

Candidate:R Yin

Full Text:PDF

GTID:2308330509957572

Subject:Software engineering

Abstract/Summary:

With the rapid development of the economy and of science, industries generate an enormous amount of data every day, much of it without any regular structure. Faced with data this large and complex, how should we use it? How can we extract meaningful information from it in less time? The main purpose of this project is to build a general-purpose, flexible, and efficient engine for processing massive offline data. The engine is proposed because current big data processing engines lack versatility. Its new design uses a DAG (directed acyclic graph) model to define processing scenarios. With the DAG model, users can flexibly change the order in which the steps of a scenario execute according to their own needs. The DAG model also makes it possible for the engine to support user-defined operators, and it helps the engine achieve high scalability, flexibility, and versatility. To improve processing speed, the engine uses the Spark computing framework. Spark keeps intermediate processing results in memory, which greatly reduces I/O cost during iterative data processing. Spark's internal design also makes it highly scalable, which meets the engine's requirements for scalability and flexibility. Finally, Spark is a distributed computing framework that supports DAGs, so it is compatible with the DAG model chosen for the engine. Each operator in the engine represents one data processing function. The engine provides a number of operators and lets users define custom operators for their own processing requirements. The engine is a further encapsulation of Spark: users do not need to use the underlying Spark API when they write custom operators.
The engine can connect to various heterogeneous data sources, pull user-specified data from the users' different data sources into HDFS, and handle different types of files. The engine has been put into use and is currently running well. It addresses the low efficiency and poor generality of existing big data processing systems.
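
The operator-and-DAG idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it runs in plain Python rather than on Spark, and every name in it (`run_dag`, `drop_empty`, `to_upper`, `count_rows`, the `operators` and `dependencies` mappings) is a hypothetical chosen for illustration. It only shows, under those assumptions, how operators written as ordinary functions might be scheduled in dependency order, and how changing the dependency mapping changes the execution order without touching the operators.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(operators, dependencies, source):
    """Run each operator once, after all of its upstream operators.

    operators:    name -> function taking a list of upstream results
    dependencies: name -> list of upstream operator names
    source:       input handed to operators that have no upstream
    """
    # Include operators with no declared upstream so they are still scheduled.
    graph = {name: dependencies.get(name, []) for name in operators}
    results = {}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        upstream = [results[dep] for dep in graph[name]]
        results[name] = operators[name](upstream if upstream else [source])
    return results


# Hypothetical user-defined operators: plain functions, no Spark API involved.
def drop_empty(inputs):
    return [row for row in inputs[0] if row.strip()]


def to_upper(inputs):
    return [row.upper() for row in inputs[0]]


def count_rows(inputs):
    return len(inputs[0])


operators = {"clean": drop_empty, "upper": to_upper, "count": count_rows}
# Changing this mapping reorders the scenario without touching the operators.
dependencies = {"upper": ["clean"], "count": ["clean"]}

results = run_dag(operators, dependencies, ["a", "", "b"])
print(results["upper"], results["count"])
```

In a Spark-backed engine of the kind the abstract describes, each operator would presumably wrap a transformation on distributed data rather than on a Python list; that mapping is an assumption here and is not detailed in the abstract.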

Keywords/Search Tags:

Massive Data, offline data processing, Spark, DAG

Related items

1	Design And Implementation Of Tobacco Big Data Analysis System Based On Spark
2	Design And Implementation Of The Massive Data Computing Platform Based On Spark
3	Design And Implementation Of Telecom 4G Big Data Platform For Network Optimization Based On Spark
4	Platform Design And Massive Data Processing And Implementation Based On Mobile Business Services
5	The Research Of Big Data Manipulating Technology Based On Spark
6	Design And Implementation Of Data Processing And Analysis System Based On Spark
7	Design And Implementation Of A Financial Big Data Processing System Based On Spark
8	A Frequent Serial Episode Mining Algorithm With Time Constraints Based On Spark Platform
9	Design And Implementation Of Big Data Processing Visualization Tool Based On Spark
10	Design And Implementation Of E-Commerce Big Data Process Platform