
The Design And Implementation Of MapReduce Based On Platform LSF

Posted on: 2015-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: C Yan
Full Text: PDF
GTID: 2308330464468753
Subject: Computer technology
Abstract/Summary:
MapReduce, a powerful programming model, is gradually becoming a popular framework for large-scale data-intensive computing, where it is very efficient. The MapReduce programming model represents a potential new methodology in HPC, and many people are interested in exploring its applicability there. MapReduce workloads may represent only a small fraction of the overall workload, yet they typically require their own independent environment, which makes them difficult to support within traditional HPC clusters. HPC clusters, however, typically already have parallel file systems (such as IBM GPFS or Lustre) that are "good enough" for users to investigate the applicability of MapReduce.

IBM Platform LSF (Load Sharing Facility) provides advanced resource management and advanced resource scheduling for HPC environments. LSF is a leading enterprise-class software that distributes work across existing heterogeneous IT resources, creating a shared, scalable, and fault-tolerant infrastructure and delivering faster, more reliable workload performance while reducing cost; it maximizes the performance of HPC clusters, which run advanced applications efficiently, reliably, and quickly using parallel processing. Because LSF users want to submit and run MapReduce applications in the LSF environment, IBM decided to start this project, and I was fortunate to be involved in it. To run MapReduce applications in LSF HPC environments, this thesis uses two methods.

The first method implements MapReduce in LSF on top of Hadoop: it allows users to submit Hadoop MapReduce workloads as regular LSF parallel jobs and run them in an HPC cluster environment. A wrapper script, lsfHadoop.sh, is provided; users submit it together with their Hadoop workload as an LSF job to request resources. Once the LSF job starts to run, the script automatically provisions an open-source Hadoop cluster within the resources LSF has allocated.
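As a sketch of the first method's workflow, a submission might look like the following. The wrapper script name lsfHadoop.sh comes from the thesis; the flags, slot count, jar, and paths are illustrative assumptions, not the project's actual interface:

```shell
# Hypothetical submission of a Hadoop MapReduce workload as a regular
# LSF parallel job. bsub is the standard LSF job-submission command;
# once the job starts, the wrapper script bootstraps a Hadoop cluster
# inside the slots LSF allocated to the job.
#   -n 8      : request 8 job slots across the cluster
#   -o %J.out : write the job's output to <jobid>.out
bsub -n 8 -o %J.out ./lsfHadoop.sh \
    hadoop jar hadoop-examples.jar wordcount /gpfs/input /gpfs/output
```

Because the whole Hadoop cluster lives inside one LSF job allocation, LSF accounts for and controls it exactly like any other parallel job.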
Users can thus configure a Hadoop cluster in the HPC environment from the available resources, without root privileges. Since each LSF Hadoop job has its own resources and cluster, this provides a multi-tenancy environment in which multiple users share a common pool of HPC cluster resources. The implementation uses the IBM Platform LSF blaunch technology to launch and monitor the open-source Hadoop cluster within the LSF job allocation, so that LSF can collect the resource usage of MapReduce workloads as it does for normal LSF parallel jobs and retains full control of the job life cycle. This thesis tests the design with both HDFS and GPFS and compares their performance. However, Hadoop configures and bootstraps the Hadoop daemons before each Hadoop job executes and performs cleanup afterwards, all of which adds considerable overhead. At the same time, a fault in the Hadoop cluster directly causes the MapReduce job to fail. To address these shortcomings of the first method, the thesis also proposes a second approach.

The second approach adapts the MapReduce application itself for HPC environments. In Hadoop, MapReduce is tightly coupled with HDFS, so the two must be abstracted apart from the MapReduce framework, relying instead on the inherent functions of the cluster's distributed file system. This model identifies the necessary components: first, input management and distribution, and output collection; second, converting the MapReduce application so that it suits scheduling in HPC environments; finally, parallel and synchronization control, together with a fault-tolerance mechanism. The resulting MapReduce model is not only applicable but also performs well. The design uses the IBM GPFS parallel file system together with LSF's advanced resource management, advanced resource scheduling, and robust fault tolerance, which benefits the efficiency of MapReduce. The integration of LSF with MapReduce therefore appears very meaningful.
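The components of the second approach (input distribution, a parallel map phase, a synchronization barrier, and output collection) can be sketched as a tiny word count over a shared directory standing in for GPFS. Everything here is illustrative: on a real cluster each map task would be dispatched to a remote node via LSF's blaunch rather than run as a local background process.

```shell
#!/bin/sh
# Minimal sketch of MapReduce over a shared filesystem instead of HDFS.
# On a real cluster WORKDIR would be a GPFS path visible to all nodes;
# all names and paths here are illustrative, not from the thesis.
set -e
WORKDIR=$(mktemp -d)

# Input management: split the input data into per-task files.
printf 'hello world\n' > "$WORKDIR/input_0"
printf 'hello again\n' > "$WORKDIR/input_1"
mkdir "$WORKDIR/map_out"

# Map phase: one task per input split, run in parallel.
i=0
for split in "$WORKDIR"/input_*; do
    tr -s ' ' '\n' < "$split" | sort > "$WORKDIR/map_out/part_$i" &
    i=$((i + 1))
done
wait   # synchronization barrier: reduce must wait for every map task

# Reduce phase / output collection: merge sorted partitions, count keys.
sort -m "$WORKDIR"/map_out/part_* | uniq -c | awk '{print $2, $1}' \
    > "$WORKDIR/result.txt"
cat "$WORKDIR/result.txt"
```

The shared filesystem plays the role HDFS plays in stock Hadoop: map outputs are simply files that the reduce step can read from any node, and a failed map task can be rerun by resubmitting the command for its split.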
Keywords/Search Tags: MapReduce, LSF, HPC, GPFS, HDFS