Big data system infrastructure at extreme scales

Posted on:2016-10-28

Degree:Ph.D

Type:Dissertation

University:Illinois Institute of Technology

Candidate:Zhao, Dongfang

GTID:1478390017480760

Subject:Computer Science

Abstract/Summary:

Rapid advances in digital sensors, networks, storage, and computation, along with their availability at low cost, are leading to the creation of huge collections of data, dubbed Big Data. This data has the potential to enable new insights that can change the way business, science, and governments deliver services to their consumers, and can impact society as a whole. This has led to the emergence of the Big Data Computing paradigm, which focuses on the sensing, collection, storage, management, and analysis of data from a variety of sources to enable new value and insights. To realize the full potential of Big Data Computing, we need to address several challenges and develop suitable conceptual and technological solutions for dealing with them. Today's and tomorrow's extreme-scale computing systems, such as the world's fastest supercomputers, are generating orders of magnitude more data through a variety of scientific computing applications from all disciplines. This dissertation addresses several big data challenges at extreme scales. First, we used simulations to quantitatively study the predicted performance of existing systems at future scales (for example, exascale, 10^18 flops). Simulation results suggested that current systems would likely fail to deliver the needed performance at exascale. Then, we proposed a new system architecture and implemented a prototype that was evaluated on tens of thousands of nodes, on par with the scale of today's largest supercomputers. Micro-benchmarks and real-world applications demonstrated the effectiveness of the proposed architecture: the prototype achieved up to two orders of magnitude higher data movement rate than existing approaches. Moreover, the prototype incorporated features that are not well supported in conventional systems, such as distributed metadata management, distributed caching, lightweight provenance, transparent compression, acceleration through GPU encoding, and parallel serialization.
To explore the proposed architecture at the scale of millions of nodes, simulations were conducted and evaluated with a variety of workloads, showing near-linear scalability and orders of magnitude better performance than today's state-of-the-art storage systems.

Keywords/Search Tags:

Big data, System, Storage, Scales

Related items

1	Vehicle Dynamic And Static Weighing System Based On The Truck Scales
2	Study On Data Transmission And Storage Of HLJTV Nonlinear Editing HD Network
3	Design And Implementation Volume-Based Hierarchical Storage System
4	Design And Implementation Of Security Data Storage System Based On Cloud Storage
5	Research Of Data Self-destruct Based On Distributed Object Storage System
6	Research On Key Techniques Of Distributed Data Processing And Storage
7	Dynamics Of Multi-Agent Systems On Time Scales
8	Research On Data Replication Technology Based On HDFS Storage System
9	Research On Optimization Of Persistent Key-value Storage System Based On SSD-NVM
10	Research On Massive Data Storage By Blu-ray & HDD Integrated Based On Blu-ray Technology