| Apache Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Users who do not understand Hadoop's low-level details can still develop distributed programs that use the high-performance computing and storage capabilities of clusters. Hadoop is widely adopted because of its simple programming model and its ability to process massive datasets. However, long response time is one of its drawbacks. The final result is available only after all the data have been processed, and no intermediate results are exposed along the way. Yet users often want approximate intermediate results while a job is still running, and traditional Hadoop cannot provide them. Tyson Condie of UC Berkeley implemented the Hadoop Online Prototype (HOP) based on MapReduce Online. HOP uses a snapshot technique to make intermediate results available while a job runs: Reduce begins computing as soon as Map sends it partial results. However, what Map sends is the full output the program has produced so far, so each time an intermediate result is requested, Reduce must recompute over all of that data. This paper proposes an alternative technique for obtaining intermediate results in Hadoop, called Online MapReduce Aggregation. In this technique, Map transfers partial user-defined data to Reduce, and Reduce produces intermediate results from the data it receives. |
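The difference between the two approaches described above can be illustrated with a small, single-process sketch. This is not the paper's or HOP's implementation. It assumes that the "partial user-defined data" is a per-key `(sum, count)` pair used to compute a per-key average, and it reduces HOP's snapshot mechanism to "recompute over every record received so far". All names (`map_partial`, `SnapshotReducer`, `IncrementalReducer`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float]        # (key, value) emitted by Map
Partial = Dict[str, Tuple[float, int]]  # hypothetical user-defined partial aggregate: key -> (sum, count)


def map_partial(chunk: Iterable[Record]) -> Partial:
    """Hypothetical mapper: pre-aggregate one input chunk into per-key (sum, count)."""
    partial: Partial = {}
    for key, value in chunk:
        s, c = partial.get(key, (0.0, 0))
        partial[key] = (s + value, c + 1)
    return partial


class SnapshotReducer:
    """Simplified HOP-style reducer: stores every record and recomputes on each snapshot."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def receive(self, chunk: Iterable[Record]) -> None:
        self.records.extend(chunk)

    def snapshot(self) -> Dict[str, float]:
        sums: Dict[str, float] = defaultdict(float)
        counts: Dict[str, int] = defaultdict(int)
        for key, value in self.records:  # full pass over all data received so far
            sums[key] += value
            counts[key] += 1
        return {k: sums[k] / counts[k] for k in sums}


class IncrementalReducer:
    """Online-aggregation-style reducer: merges small partial aggregates as they arrive."""

    def __init__(self) -> None:
        self.state: Partial = {}

    def receive(self, partial: Partial) -> None:
        for key, (s, c) in partial.items():
            s0, c0 = self.state.get(key, (0.0, 0))
            self.state[key] = (s0 + s, c0 + c)

    def snapshot(self) -> Dict[str, float]:
        # Cost depends on the number of keys, not on how many records were seen.
        return {k: s / c for k, (s, c) in self.state.items()}


if __name__ == "__main__":
    chunks = [
        [("a", 1.0), ("b", 4.0), ("a", 3.0)],
        [("b", 6.0), ("a", 5.0)],
    ]
    snap, inc = SnapshotReducer(), IncrementalReducer()
    for chunk in chunks:
        snap.receive(chunk)
        inc.receive(map_partial(chunk))
        # Both reducers report the same intermediate averages after each chunk.
        print(inc.snapshot(), snap.snapshot() == inc.snapshot())
```

Both reducers produce the same intermediate answers. The design point the sketch is meant to show is where the work happens. The snapshot reducer rescans every stored record whenever an intermediate result is requested. The incremental reducer only merges compact partial aggregates, so each intermediate result is cheap. This only works when the user-defined aggregate can be merged, as a `(sum, count)` pair can.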