The Research Of Online Aggregation On MapReduce

Posted on:2014-05-28

Degree:Master

Type:Thesis

Country:China

Candidate:K Dai

Full Text:PDF

GTID:2308330464459950

Subject:Computer technology

Abstract/Summary:


Apache Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Users can develop distributed applications that exploit the high-performance computing and storage capacity of a cluster without understanding the low-level details of Hadoop. It is widely adopted because of its simple programming model and its ability to process massive datasets. However, long response time is one of Hadoop's drawbacks: the final result is available only after all the data has been processed, and no intermediate results are produced along the way. Users, however, often want approximate intermediate results while the job is still running.

Traditional Hadoop cannot satisfy this requirement. Tyson Condie of UC Berkeley implemented the Hadoop Online Prototype (HOP), based on MapReduce Online. Using its snapshot technique, intermediate results can be obtained while a job is running: in HOP, Reduce begins computing as soon as Map sends partial results to it. However, the results sent by Map are all the results the job has produced so far, so each time an intermediate result is requested, Reduce must recompute over all of that data.

This thesis presents an alternative technique for obtaining intermediate results in Hadoop: Online MapReduce Aggregation. Map transfers partial, user-defined data to Reduce, and Reduce produces intermediate results after receiving that data.
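
The difference between recomputing a snapshot and aggregating online can be illustrated with a small sketch. It is not the thesis's implementation and does not use the Hadoop or HOP APIs. It is plain Python, and it assumes the user-defined aggregate is a per-key mean. The names map_batches, IncrementalReducer, and run_online are hypothetical. The point it tries to show is that the reducer keeps running state, so each partial batch from Map is folded in once and an intermediate result can be read at any time without reprocessing earlier data.

```python
from collections import defaultdict


def map_batches(records, batch_size):
    """Stand-in for Map: yield partial output as batches of (key, value) pairs."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


class IncrementalReducer:
    """Hypothetical reducer that keeps a running sum and count per key."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def absorb(self, batch):
        # Fold one partial batch into the running state; earlier batches
        # are never revisited.
        for key, value in batch:
            self.sums[key] += value
            self.counts[key] += 1

    def estimate(self):
        # Intermediate result: mean per key over the data seen so far.
        return {key: self.sums[key] / self.counts[key] for key in self.sums}


def run_online(records, batch_size):
    """Return the intermediate result observed after each partial batch."""
    reducer = IncrementalReducer()
    snapshots = []
    for batch in map_batches(records, batch_size):
        reducer.absorb(batch)
        snapshots.append(reducer.estimate())
    return snapshots
```

Calling run_online(records, batch_size) returns one intermediate result per batch. The last entry equals the final result that a batch-style job would report only after all the data had been processed.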

Keywords/Search Tags:

MapReduce, HOP, Hadoop, Aggregation


Related items

1	The Research Of Online Aggregation On MapReduce
2	MapReduce Performance Research And Optimization Based On Block Aggregation
3	Research On Online Aggregation Query Processing Based On Hadoop
4	Research On The Performance And Optimization Of MapReduce Model In Hadoop Platform
5	The Mapreduce Model In The Hadoop Implementation Of Performance Analysis And Optimization Improvements
6	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
7	Design Of Mapreduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
8	The Performance Optimization And Improvement Of MapReduce In Hadoop
9	Research On Improving The Fault Tolerance Performance In MapReduce
10	Research On Scheduling Algorithm In Hadoop Mapreduce