
Prediction model approaches for data transfer throughput estimation and optimization

Posted on: 2014-07-19
Degree: Ph.D.
Type: Dissertation
University: State University of New York at Buffalo
Candidate: Kim, Jangyoung
GTID: 1458390005487156
Subject: Computer Science

Abstract/Summary:
All areas of science and industry have been generating increasingly complex big data at scales of petabytes and beyond. Despite the trend of moving the application to the data rather than the data to the application, large datasets still need to be moved around for increased availability, performance, and recovery purposes. Sharing, disseminating, and analyzing these large datasets has become a major challenge, despite the deployment of petascale computing systems and optical networking speeds reaching up to 100 Gbps. The majority of users fail to obtain even a fraction of the theoretical speeds promised by these high-bandwidth networks due to issues such as sub-optimal protocol tuning, inefficient end-to-end routing, disk performance bottlenecks on the sending and/or receiving ends, and server processor limitations.

Protocol parameters such as TCP pipelining, parallelism, and concurrency levels play a significant role in the achievable network throughput. However, setting the optimal values for these parameters is a challenging problem, since poorly tuned parameters can either cause underutilization of the network or overburden it and degrade performance through increased packet loss, end-system overhead, and other factors. In this dissertation, we develop application-level models to predict the best combination of protocol parameters for optimal network performance. The tuned parameters include the number of parallel data streams per file (for large-file optimization), the level of control and data channel pipelining (for small-file optimization), and the level of concurrent file transfers to fill long fat network pipes (for all files).

We start by presenting a model to decide the optimal sampling size for data transfer optimization based on the dataset size and the estimated capacity of the network.
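The interplay between parallelism and achievable throughput can be illustrated with a simple, widely used approximation. The sketch below uses the Mathis et al. single-stream TCP model as a stand-in; it is an assumption for illustration only, not the dissertation's actual prediction model:

```python
import math

def per_stream_throughput(mss, rtt, loss_rate):
    """Mathis et al. approximation of single-stream TCP throughput
    (bytes/s): MSS / (RTT * sqrt(p)), where p is the packet loss rate."""
    return mss / (rtt * math.sqrt(loss_rate))

def aggregate_throughput(n, mss, rtt, loss_rate):
    """n parallel streams scale the per-stream rate roughly linearly --
    until added streams drive up the loss rate themselves, so in practice
    the curve peaks and then degrades (the 'overburdened network' case)."""
    return n * per_stream_throughput(mss, rtt, loss_rate)

def best_parallelism(samples):
    """Given sampled {stream_count: measured_throughput} pairs,
    pick the stream count with the highest measured throughput."""
    return max(samples, key=samples.get)
```

Prediction models in this space fit a curve like the one above to a few sampled transfers and solve for the stream count at the peak, rather than exhaustively measuring every setting; the cost of that sampling is exactly what the sampling-size model is meant to minimize.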
This model helps us generate the smallest possible sampling size with the highest accuracy in any given data transfer setting. Using this sampling size model, we develop a parallel stream prediction model, called "Full C-order", for data transfer throughput optimization. Full C-order outperforms all existing parallel stream prediction models by achieving higher accuracy with much lower sampling overhead. Extending these two models, we develop a combined parameter optimization model, called "PCP", which optimizes the pipelining and concurrency parameters in addition to parallelism. Two variations of the combined PCP model are developed: i) PCP-realtime, which assumes no historical data is available and is purely based on real-time sampling; and ii) PCP-historical, which assumes some historical data is available and uses both this data and some real-time sampling.

We test and evaluate our throughput optimization models on a variety of testbeds, including emulated environments (such as Emulab and CRON) and production environments (such as XSEDE, FutureGrid, and LONI), as well as in our local distributed computing system (DIDCLab), using a wide variety of dataset size, round-trip time (RTT), and bandwidth combinations. Our comprehensive experiments confirm the superiority of our models over the existing models in this area.
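As a rough illustration of how the three knobs interact, the heuristic below derives parameters from the bandwidth-delay product. This is a hypothetical sketch with assumed buffer sizes and thresholds; the actual PCP models are sampling- and history-driven, not closed-form:

```python
def suggest_parameters(mean_file_size, bandwidth, rtt, max_concurrency=8):
    """Hypothetical PCP-style heuristic (illustrative only).

    mean_file_size -- average file size in the dataset, bytes
    bandwidth      -- link capacity, bytes/s
    rtt            -- round-trip time, seconds

    Large files benefit from parallel streams; datasets of many small
    files benefit from pipelining (to hide per-file control-channel
    round trips) and from concurrent transfers."""
    bdp = bandwidth * rtt  # bytes in flight needed to fill the pipe

    if mean_file_size >= bdp:
        # Large files: split each file across parallel streams,
        # assuming a 256 KiB per-stream buffer (assumption).
        parallelism = max(1, round(bdp / (256 * 1024)))
        pipelining = 1
    else:
        # Small files: keep enough requests queued to cover the BDP.
        parallelism = 1
        pipelining = max(1, round(bdp / max(mean_file_size, 1)))

    concurrency = min(max_concurrency,
                      max(1, round(bdp / max(mean_file_size, 1))))
    return parallelism, pipelining, concurrency
```

For example, on a 1 Gbps link with 100 ms RTT (BDP of 12.5 MB), a dataset of 10 KB files yields high pipelining and concurrency with a single stream, while 100 MB files yield many parallel streams and no pipelining, mirroring the large-file/small-file split described above.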
Keywords/Search Tags: Data, Model, Optimization, Throughput, Prediction