Reliability models for HPC applications and a Cloud economic model

Posted on: 2013-07-13
Degree: Ph.D.
Type: Dissertation
University: Louisiana Tech University
Candidate: Thanakornworakij, Thanadech
GTID: 1452390008980504
Subject: Statistics
Abstract/Summary:
With the enormous number of computing resources in HPC and Cloud systems, failures become a major concern. Failure characteristics such as reliability, failure rate, and mean time to failure therefore need to be understood in order to manage such large systems efficiently.

This dissertation makes three major contributions to HPC and Cloud studies. First, a reliability model with correlated failures in a k-node system for HPC applications is studied. The model is extended to improve accuracy by accounting for failure correlation. The Marshall-Olkin multivariate Weibull distribution is refined with an excess-life (conditional Weibull) formulation to better estimate system reliability. A univariate method is also proposed for estimating the Marshall-Olkin multivariate Weibull parameters of a system composed of a large number of nodes. The failure rate and mean time to failure are then derived. The model is validated using log data from the Blue Gene/L system at LLNL. Results show that when node failures in the system are correlated, the system becomes less reliable.

Second, a reliability model of Cloud computing is proposed. The reliability, mean time to failure, and failure rate are estimated for a system of k nodes and s virtual machines under four scenarios: 1) independent hardware failures and independent software failures; 2) independent software failures and correlated hardware failures; 3) correlated software failures and independent hardware failures; and 4) dependent software and hardware failures. Results show that if the failures of the nodes and/or software in the system possess a degree of dependency, the system becomes less reliable. An increase in the number of computing components also decreases the reliability of the system.

Finally, an economic model for a Cloud service provider is proposed. This economic model aims to maximize profit through appropriate pricing and rightsizing of the Cloud data center. Total cost is a key element of the model and is analyzed by considering the Total Cost of Ownership (TCO) of the Cloud.
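
As a rough illustration of the quantities named in the abstract (reliability, failure rate, and mean time to failure of a k-node system), the sketch below uses a deliberately simplified setting that is an assumption of this example, not the dissertation's model: k identical nodes in series, each with Weibull survival exp(-lam * t**beta), plus an optional Marshall-Olkin-style common shock with rate lam_common that is added on top of the per-node rates and takes down all nodes at once. The function and parameter names are hypothetical.

```python
import math

def series_reliability(t, k, lam, beta, lam_common=0.0):
    """Probability that all k nodes survive to time t.

    Assumption: each node fails from its own Weibull shock with
    survival exp(-lam * t**beta); an optional common shock with rate
    lam_common (same shape beta) fails every node simultaneously.
    """
    return math.exp(-(k * lam + lam_common) * t ** beta)

def series_failure_rate(t, k, lam, beta, lam_common=0.0):
    """Hazard rate of the series system at time t (t > 0)."""
    return (k * lam + lam_common) * beta * t ** (beta - 1)

def series_mttf(k, lam, beta, lam_common=0.0):
    """Mean time to first failure: integral of the reliability function."""
    total_rate = k * lam + lam_common
    return math.gamma(1 + 1 / beta) / total_rate ** (1 / beta)

if __name__ == "__main__":
    # Illustrative values only; they are not taken from the Blue Gene/L logs.
    for k in (4, 64, 1024):
        print(k, series_reliability(10.0, k, 1e-4, 0.8), series_mttf(k, 1e-4, 0.8))
```

In this simplified setting, raising k or adding a positive lam_common both lower the reliability and the mean time to failure, which matches the qualitative trends the abstract reports. The dissertation's actual models are more general than this sketch.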
Keywords/Search Tags:Cloud, Model, HPC, Failure, System, Reliability, Components fail independently, Economic