Research On Small Files Storage Performance Optimization Based On Time Series Prediction

Posted on: 2021-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: A L Zhang
GTID: 2480306104487984
Subject: Computer system architecture
Abstract/Summary:
Massive numbers of small files severely degrade the read and write performance of distributed storage systems. Such systems mainly rely on aggregation and cache prefetching to optimize small-file reads and writes, but the following problems remain: (1) Existing small-file aggregation mechanisms do not fully consider the temporal characteristics of the small-file load sequence, so the files within an aggregation block are only weakly correlated. (2) A fixed aggregation block size cannot adapt to a varying range of file sizes, which lowers small-file write performance. (3) Existing cache replacement algorithms do not jointly consider file access time, access frequency and cache value, which lowers the cache hit ratio for small-file reads and therefore read performance.

To solve these problems, a small file storage performance optimization scheme based on time series prediction (TSP-SFSPO) is proposed. TSP-SFSPO comprises a load analysis and prediction module, a dynamic queue building module and a storage module. To address the low file correlation within aggregation blocks, the load analysis and prediction module uses an ARIMA-LSTM hybrid model to predict the trend of file sizes in the small-file load sequence, classifies files as big or small according to that trend, and outputs the file size range. To address the write performance degradation caused by static aggregation blocks, the dynamic queue building module uses the analytic hierarchy process (AHP) to set different merging thresholds for different size ranges of small files. The storage module combines the load prediction results with the merging thresholds from the dynamic queue building module to implement a dynamic small-file aggregation mechanism based on time series prediction. Additionally, to address the poor small-file read performance caused by the aggregation operation, the LRU-FW cache replacement algorithm is implemented on top of LRU, which effectively improves small-file read performance.

TSP-SFSPO is built on the Ceph file system. The experimental results show that the ARIMA-LSTM hybrid model predicts the trend of the load sequence more accurately than either the ARIMA model or the LSTM model alone. Compared with the native Ceph system and the SFPS scheme, TSP-SFSPO reduces small-file write time by up to 90.7% and 13.1%, read time by up to 75.2% and 18.6%, and memory usage by up to 80% and 15.7%, respectively. Therefore, when handling massive numbers of small files, TSP-SFSPO can significantly improve the read and write performance of the Ceph system.
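
The abstract says merging thresholds are set with the analytic hierarchy process but does not give the criteria, the comparison matrix, or how the weights map to thresholds. As a rough illustration of the AHP step only, the sketch below derives priority weights from a pairwise comparison matrix with the geometric-mean method and checks Saaty's consistency ratio. The criterion names and matrix values are hypothetical and are not taken from the thesis.

```python
import math

# Saaty's random consistency index for small matrix sizes (standard values).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Priority weights from a pairwise comparison matrix (geometric-mean method)."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI; judgments are usually accepted when CR < 0.1."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical criteria for choosing a merging threshold (assumed, not from the thesis).
criteria = ["write_latency", "space_utilization", "read_locality"]
# comparisons[i][j] states how much more important criterion i is than criterion j.
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights = ahp_weights(comparisons)
cr = consistency_ratio(comparisons, weights)

if __name__ == "__main__":
    for name, w in zip(criteria, weights):
        print(f"{name}: {w:.3f}")
    print(f"consistency ratio: {cr:.4f}")
```

A scheme like the one described could then use such weights to score candidate thresholds per file-size range, but that mapping is an assumption here and would have to come from the full thesis.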
Keywords/Search Tags: Small Files Storage, Time Series Prediction, Dynamic Aggregation, Analytic Hierarchy Process, Ceph File System