
Large-scale in-memory data processing

Posted on: 2015-05-28
Degree: Ph.D.
Type: Dissertation
University: Hong Kong University of Science and Technology (Hong Kong)
Candidate: Ma, Zhiqiang
Full Text: PDF
GTID: 1478390017495492
Subject: Computer Science
Abstract/Summary:
As cloud and big data computation grows into an increasingly important paradigm, providing a general abstraction for datacenter-scale programming has become an imperative research agenda. Researchers have proposed, designed, and implemented various computation models and systems at different abstraction levels, such as MapReduce, X10, Dryad, Storm, and Spark. However, many abstractions expose the distributed details of the platform to the application layer, leading to increased programming complexity, decreased performance, and, sometimes, loss of generality. At the data substrate layer, traditional cloud computing technologies such as MapReduce use disk-based file systems as the system-wide substrate for data storage and sharing. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems use DRAM to store data and greatly improve the performance of cloud computing systems. However, both our own experience and related work indicate that simply substituting distributed DRAM for the file system does not provide a solid and viable foundation for data processing and storage in the datacenter environment, and the capacity of such systems is limited by the amount of physical memory in the cluster.

To support general, efficient, flexible, and concurrent application workloads with sophisticated data processing, we present programmers with the illusion of a big virtual machine built on top of one, multiple, or many compute nodes, and unify the physical memory and disks of the nodes to form a globally addressable data substrate. We design a new instruction set architecture, i0, to unify myriad compute nodes into a big virtual machine called MAZE, where thousands of tasks run concurrently in VOLUME, a large, unified, and snapshotted distributed virtual memory. i0, MAZE, and VOLUME form the foundation of the Layer Zero systems, which provide a general substrate for cloud computing. i0 provides a simple yet general and scalable programming model. VOLUME mitigates the scalability bottleneck of traditional distributed shared memory systems and unifies the physical memory and disks on many compute nodes to form a distributed transactional virtual memory. VOLUME provides a general memory-based abstraction, takes advantage of DRAM in the system to accelerate computation, and, transparently to programmers, scales the system to process and store large datasets by swapping data to disks and remote servers. Together with MAZE's efficient execution engine, this allows a MAZE to scale up to support large datasets and large clusters.

We have implemented the Layer Zero systems on several platforms, and have designed and implemented various benchmarks, graph processing and machine learning programs, and application frameworks. Our evaluation shows that Layer Zero has excellent performance and scalability. On one physical host, the system overhead is comparable to that of traditional VMMs. On 16 physical hosts, Layer Zero runs 10 times faster than Hadoop and X10. On 160 physical compute servers, Layer Zero scales linearly on a typical iterative workload.
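For illustration only, the following is a minimal, hypothetical C sketch of how a program might interact with a transactional, globally addressable memory substrate of the kind VOLUME describes. The API names (vol_tx_begin, vol_read, vol_write, vol_commit) and the single-process write-buffering model are assumptions made here for exposition, not the dissertation's actual interface.

    /* Hypothetical sketch of a transactional, globally addressable memory
     * interface in the spirit of VOLUME. All names are illustrative
     * assumptions. Writes are buffered per transaction and applied to the
     * shared substrate only at commit time. */
    #include <stdio.h>
    #include <stdint.h>

    #define VOL_SIZE   1024   /* size of the toy global address space   */
    #define TX_MAX_WR  64     /* max buffered writes per transaction    */

    static uint64_t volume[VOL_SIZE];   /* stand-in for the shared substrate */

    typedef struct { uint64_t addr, value; } write_entry;
    typedef struct {
        write_entry writes[TX_MAX_WR];
        int nwrites;
    } vol_tx;

    static void vol_tx_begin(vol_tx *tx) { tx->nwrites = 0; }

    /* Read-your-own-writes: check the transaction buffer before the substrate. */
    static uint64_t vol_read(vol_tx *tx, uint64_t addr) {
        for (int i = tx->nwrites - 1; i >= 0; i--)
            if (tx->writes[i].addr == addr) return tx->writes[i].value;
        return volume[addr];
    }

    /* Writes are buffered; nothing becomes visible until commit. */
    static int vol_write(vol_tx *tx, uint64_t addr, uint64_t value) {
        if (tx->nwrites == TX_MAX_WR) return -1;   /* buffer full: caller aborts */
        tx->writes[tx->nwrites].addr = addr;
        tx->writes[tx->nwrites].value = value;
        tx->nwrites++;
        return 0;
    }

    /* Commit applies the buffered writes to the substrate in order. */
    static void vol_commit(vol_tx *tx) {
        for (int i = 0; i < tx->nwrites; i++)
            volume[tx->writes[i].addr] = tx->writes[i].value;
        tx->nwrites = 0;
    }

    int main(void) {
        vol_tx tx;
        vol_tx_begin(&tx);
        vol_write(&tx, 42, 7);   /* update an address in the global space */
        printf("inside tx:    %llu\n", (unsigned long long)vol_read(&tx, 42));
        printf("substrate:    %llu\n", (unsigned long long)volume[42]);
        vol_commit(&tx);
        printf("after commit: %llu\n", (unsigned long long)volume[42]);
        return 0;
    }

In the real system, the substrate would span the DRAM and disks of many nodes and transactions would be snapshotted and distributed; the sketch above only conveys the programming-model idea of buffered, transactional updates to a global address space.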
Keywords/Search Tags: Data, Layer Zero, Memory, Physical, Large, General, Processing, VOLUME