Research on the MapReduce Programming Model for Heterogeneous Computing Platforms

Posted on: 2017-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2308330485982221
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, research on high performance computing has advanced rapidly, driven by the demand for large-scale data processing and increasingly complex scientific computing. A series of parallel computing architectures has emerged, such as Nvidia's Compute Unified Device Architecture (CUDA), Intel's Many Integrated Core Architecture (MIC), and IBM's CELL architecture. Accelerators based on these architectures offer high peak performance, high energy efficiency, and low power consumption. Heterogeneous computing mainly refers to computation on platforms that consist of computing units with different instruction sets and architectures. Due to the increasing demand for parallelization, heterogeneous computing systems are increasingly used in the field of high performance computing. However, fully exploiting their capabilities often requires a good grasp of the architectural details of the heterogeneous platform, which places a heavy burden on developers.

MapReduce is a big data processing framework proposed by Google. It provides users with customizable Map and Reduce interfaces: developers only need to write the Map and Reduce functions for a particular application to obtain a parallel distributed program. Since Google proposed the framework, developers and researchers have carried out a great deal of work on it. As a big data processing framework, MapReduce is widely used in data mining, machine learning, bioinformatics, and other fields. Implementing the MapReduce framework on a heterogeneous system can simplify program development and make full use of the computing capability of all the devices in the platform. It is also broadly applicable and has strong practical value.

In this thesis, we present HyMR, an implementation of a hybrid MapReduce framework.
HyMR is designed to fully utilize the computing power of the different computing devices on a heterogeneous platform, such as multi-core CPUs, many-core GPUs, and Xeon Phis. We have implemented it as an extensible framework that handles the operations common to all computing devices and uses an extensible runtime system to handle device-specific operations. To derive an efficient mapping onto heterogeneous architectures, we introduce a two-level approach. At the framework level, we have implemented a hybrid job scheduler to dispatch computing tasks among different devices. At the runtime level, we have designed and implemented an extensible HyMR adapter as the abstraction for the general operations of the low-level runtimes. To make full use of the compute power of both the multi-core and the many-core hardware, we use a collaborative computing scheme as well as hybrid parallelism. To further improve the performance of HyMR, we propose a hybrid job scheduling scheme, key/value optimization, and data transfer optimization.

The performance of HyMR is tested using both small-scale and large-scale datasets for four commonly used applications. Compared to Phoenix++, the state-of-the-art MapReduce implementation on multi-core CPUs, HyMR achieves speedups of up to 18.7x on a heterogeneous platform when processing large-scale datasets.
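
As a rough illustration of the idea of dispatching Map tasks among several devices, the following Python sketch lets worker threads pull input chunks from a shared queue and then applies a user-supplied Reduce function to the collected key/value pairs. It is not HyMR's scheduler, whose policy the abstract does not describe. The names `run_hybrid_mapreduce`, `word_count_map`, and `word_count_reduce` are hypothetical, and the device labels are plain strings backed by ordinary CPU threads, not real GPU or Xeon Phi runtimes.

```python
import queue
import threading
from collections import defaultdict


def word_count_map(chunk):
    """User-supplied Map: emit a (word, 1) pair for each word in a text chunk."""
    return [(word, 1) for word in chunk.split()]


def word_count_reduce(key, values):
    """User-supplied Reduce: sum all counts emitted for one key."""
    return sum(values)


def run_hybrid_mapreduce(chunks, map_fn, reduce_fn, devices):
    """Hypothetical dynamic scheduler: each device worker pulls chunks
    from a shared queue until it is empty, so a faster worker naturally
    processes more chunks. Devices are simulated with threads."""
    tasks = queue.Queue()
    for chunk in chunks:
        tasks.put(chunk)

    intermediate = defaultdict(list)  # key -> list of emitted values
    tasks_per_device = {name: 0 for name in devices}
    lock = threading.Lock()

    def worker(device_name):
        while True:
            try:
                chunk = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this device
            pairs = map_fn(chunk)
            with lock:  # protect the shared intermediate store
                tasks_per_device[device_name] += 1
                for key, value in pairs:
                    intermediate[key].append(value)

    threads = [threading.Thread(target=worker, args=(name,)) for name in devices]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Reduce phase: one call of the user function per distinct key.
    results = {key: reduce_fn(key, values) for key, values in intermediate.items()}
    return results, tasks_per_device


if __name__ == "__main__":
    chunks = ["map reduce map", "gpu cpu gpu", "reduce cpu map"]
    counts, per_device = run_hybrid_mapreduce(
        chunks, word_count_map, word_count_reduce, ["cpu", "gpu", "xeon_phi"]
    )
    print(counts)
    print(per_device)
```

The user writes only the two small functions at the top; the dispatch loop is generic. How many chunks each simulated device ends up processing depends on thread timing, so the per-device counts can differ between runs while the word counts stay the same.
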
Keywords/Search Tags: Heterogeneous platform, MapReduce, POSIX threads, GPU, Xeon Phi