Research On Mapreduce Programming Model For Heterogeneous Computing Platforms

Posted on:2017-05-28

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

GTID:2308330485982221

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, research in high performance computing has advanced rapidly, driven by the demands of large-scale data processing and increasingly complex scientific computing. A series of parallel computing architectures has emerged, such as Nvidia’s Compute Unified Device Architecture (CUDA), Intel’s Many Integrated Core Architecture (MIC), and IBM’s CELL architecture. Accelerators built on these architectures offer high peak performance, high energy efficiency, and low power consumption. Heterogeneous computing refers to computation on a platform composed of computing units with different instruction sets and architectures. As the demand for parallelism grows, heterogeneous computing systems are increasingly used in high performance computing. However, fully exploiting a heterogeneous system usually requires a detailed understanding of the architecture of each platform, which places a heavy burden on developers.

MapReduce is a big data processing framework proposed by Google. It exposes customizable Map and Reduce interfaces. Developers need only write the Map and Reduce functions for a particular application to obtain a parallel, distributed program. Since Google proposed the framework, developers and researchers have carried out extensive work on it, and MapReduce is now widely used in data mining, machine learning, bioinformatics, and other fields. Implementing MapReduce on a heterogeneous system can simplify program development on that system while making full use of the computing capability of every device on the platform. Such a framework is broadly applicable and has strong practical value.

In this paper, we present HyMR, a hybrid MapReduce framework.
HyMR is designed to fully utilize the computing power of the different devices on a heterogeneous platform, such as multi-core CPUs, many-core GPUs, and Xeon Phi coprocessors. We implemented it as an extensible framework. The framework handles the operations common to all computing devices, while an extensible runtime system handles device-specific operations. To map applications efficiently onto heterogeneous architectures, we take a two-level approach. At the framework level, a hybrid job scheduler dispatches computing tasks among the different devices. At the runtime level, an extensible HyMR adapter abstracts the general operations of the low-level runtimes. To make full use of both multi-core and many-core hardware, we combine a collaborative computing scheme with hybrid parallelism. To further improve the performance of HyMR, we propose a hybrid job scheduling scheme, key/value optimization, and data transfer optimization. We evaluate HyMR on four commonly used applications with both small-scale and large-scale datasets. Compared with Phoenix++, the state-of-the-art MapReduce implementation for multi-core CPUs, HyMR achieves speedups of up to 18.7× on a heterogeneous platform when processing large-scale datasets.
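The following is a minimal sketch of the two-level idea described above, written under stated assumptions. It is not HyMR's code.

- The user supplies only Map and Reduce functions (word count here).
- A hypothetical `DeviceAdapter` class stands in for the per-device runtime layer.
- A hypothetical `hybrid_map_reduce` function dispatches input chunks through a shared work queue that every device pulls from.

The abstract does not say how HyMR's hybrid job scheduler actually assigns work. Pull-based dynamic scheduling is swapped in here as one common approach. The Python threads only model the dispatch structure, not real CPU, GPU, or Xeon Phi execution.

```python
"""Hypothetical sketch of a hybrid MapReduce runtime (not HyMR's actual code)."""
import threading
from collections import defaultdict
from queue import Queue, Empty


class DeviceAdapter:
    """Hypothetical adapter: one uniform interface over a device runtime.

    A real framework would launch CUDA kernels, offload to a Xeon Phi, or run
    POSIX threads here; this sketch simply calls the Python map function.
    """

    def __init__(self, name):
        self.name = name
        self.chunks_done = 0  # how many chunks this device processed

    def run_map(self, map_fn, chunk):
        pairs = []
        for record in chunk:
            pairs.extend(map_fn(record))
        self.chunks_done += 1
        return pairs


def hybrid_map_reduce(records, map_fn, reduce_fn, adapters, chunk_size=2):
    """Split input into chunks; each device worker pulls chunks from a shared
    queue (dynamic scheduling), so a faster device ends up taking more chunks."""
    work = Queue()
    for i in range(0, len(records), chunk_size):
        work.put(records[i:i + chunk_size])

    intermediate = []        # (key, value) pairs emitted by all devices
    lock = threading.Lock()  # guards `intermediate`

    def worker(adapter):
        while True:
            try:
                chunk = work.get_nowait()
            except Empty:
                return  # queue drained: this device is finished
            pairs = adapter.run_map(map_fn, chunk)
            with lock:
                intermediate.extend(pairs)

    threads = [threading.Thread(target=worker, args=(a,)) for a in adapters]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Shuffle: group values by key, then apply the user's reduce function.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# User-supplied functions: the only application-specific code (word count).
def word_count_map(line):
    return [(word, 1) for word in line.split()]


def word_count_reduce(key, values):
    return sum(values)


if __name__ == "__main__":
    lines = ["a b a", "b c", "a c c", "d"]
    devices = [DeviceAdapter("cpu"), DeviceAdapter("gpu"), DeviceAdapter("phi")]
    counts = hybrid_map_reduce(lines, word_count_map, word_count_reduce, devices)
    print(sorted(counts.items()))
    # prints [('a', 3), ('b', 2), ('c', 3), ('d', 1)]
```

With a pull-based queue, a faster device drains more chunks without needing an up-front performance model. A static split proportional to each device's measured throughput is the usual alternative, trading run-time adaptivity for less scheduling overhead.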

Keywords/Search Tags:

Heterogeneous platform, MapReduce, POSIX threads, GPU, Xeon Phi

Related items

1	An Optimized MapReduce Workflow Scheduling Algorithm For Heterogeneous Computing
2	Design Of Patch Clamp Amplifier System Based On Qt/E
3	Mapreduce Job Scheduling For Heterogeneous Geo-distributed Clusters
4	Research And Application Of Posix IPC In Billing System Model Of Telecommunication
5	Research On Heterogeneous System Oriented Parallel Programming
6	Research On Low Power Scheduling Technology For Heterogeneous Cluster Based On MapReduce
7	Research For The Implementation And Optimization Technology Of Typical Image Processing Algorithms On Xeon Phi
8	Research On Deadlock Detection In Multi-thread Based On Petri Net
9	POSIX-Compatible Cloud Netdisk System Based On SMDFS
10	A Client Of Distributed Filesystem Based On POSIX Semantics