Research On Performance Modeling And Application Of Distributed File System

Posted on: 2012-01-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Z Zhao
Full Text: PDF
GTID: 1488303356992729
Subject: Computer application technology
Abstract/Summary:
Distributed file systems can effectively address the mass data storage and I/O bottleneck problems of distributed systems, and have become a research hotspot in both the storage industry and academia. A distributed file system is a key part of any complete, massively distributed computing environment, and its performance directly affects the efficiency of the whole environment. Performance research is therefore both a central concern and a central difficulty of distributed file system research. However, many problems remain open in areas such as performance evaluation, performance modeling, performance prediction, and performance optimization. To address these problems, this dissertation systematically studies several key technical issues in distributed file system performance, including performance factors and their distribution, a framework for performance evaluation, performance prediction models, and performance optimization. The main research results and innovations can be summarized as follows:
(1) After a systematic study of the architectures and performance factors of a large number of distributed file systems, this dissertation presents a typical performance factor distribution framework for distributed file systems. In this framework, the performance factors are divided into four parts: metadata server related factors, data storage server related factors, client/application related factors, and network related factors. The dissertation then analyzes some key performance factors quantitatively and qualitatively, as the basis for the performance research that follows. On this basis, a performance evaluation framework is proposed to systematically study feasible performance evaluation schemes for distributed file systems.
The dissertation then evaluates and analyzes the potential performance characteristics of some key performance factors, which provides a reference for distributed file system researchers.
(2) A performance prediction model based on machine learning is proposed. After studying the architecture and performance factors of classic file systems, this dissertation designs a machine-learning-based performance prediction model for distributed file systems (MLPPModel). We use feature selection algorithms to reduce the number of performance factors that must be tested when validating performance. We also mine the relationship between system performance and performance factors in order to predict the performance of a specific distributed file system. We validate the model and predict the performance of a specific Lustre file system through a series of designed experiment cases. Our evaluation and experimental results indicate that threads/OST, the number of OSSs (Object Storage Servers), the number of disks, and the number and type of RAID are the four most important parameters for tuning the performance of the Lustre file system. The average relative errors of the predictions fall within 23.3%-25.6%, indicating good prediction accuracy.
(3) A performance prediction method based on a relative performance prediction model is proposed. After surveying performance factors, we conduct a series of experimental performance evaluations and propose a performance relational model (PRModel). In the experiments and the PRModel analysis, we find that different performance factors are closely correlated in terms of performance. To mine this relational information, we propose a novel relative performance prediction model (RPPModel). This model can be used to predict the overhead under different performance factors. We validate the model through a series of experiments under a variety of performance factors.
Our experimental results show that the average relative errors of the predictions fall within 17.1%-27.9%. The model is easy to use and achieves good prediction accuracy.
(4) A write optimization scheme for the HDFS file system based on a parallel strategy is proposed, and this dissertation applies MLPPModel and RPPModel to predict and analyze the performance of the improved HDFS file system. After building experiment platforms based on Hadoop over HDFS and Hadoop over Lustre, this dissertation systematically evaluates the performance of HDFS and Lustre in a search engine application scenario. The experiments show that HDFS does not handle write workloads efficiently. We therefore present a parallel-strategy-based write optimization scheme to improve the write performance of HDFS. Our experimental results show that the improved HDFS effectively improves write performance. We also apply MLPPModel and RPPModel to predict and analyze the performance of the improved HDFS file system. The prediction results indicate that the average relative errors fall within 1.45%-18.17% for MLPPModel and within 1.28%-19.05% for RPPModel, indicating good prediction accuracy. The parallel-strategy-based write optimization scheme, the machine-learning-based performance prediction model, and the relative performance prediction model are significant for guiding the design of improved distributed file systems.
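The accuracy figures above are reported as average relative errors. The abstract does not define the metric, so the following is only a minimal sketch under the common definition, the mean of |predicted - measured| / |measured| over paired samples. The function name and the throughput values are hypothetical and are not taken from the dissertation.

```python
def average_relative_error(predicted, measured):
    """Return the mean of |p - m| / |m| over paired samples.

    Assumes the common definition of average relative error; the
    dissertation may define the metric differently.
    """
    if len(predicted) != len(measured) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, m in zip(predicted, measured):
        if m == 0:
            raise ValueError("measured values must be non-zero")
        total += abs(p - m) / abs(m)
    return total / len(measured)


# Hypothetical throughput values (MB/s), for illustration only.
measured_mb_s = [100.0, 100.0]
predicted_mb_s = [110.0, 90.0]
error = average_relative_error(predicted_mb_s, measured_mb_s)
print(f"average relative error: {error:.1%}")
```

Under this definition, a reported range such as 1.45%-18.17% would correspond to the smallest and largest average error across the experiment groups, though the abstract does not state how the ranges were obtained.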
Keywords/Search Tags: Distributed file system, Parallel I/O, Performance evaluation, File system modeling, Performance prediction, I/O performance optimization