
Research And Optimizations Of Distributed Storage Systems Using Machine Learning

Posted on: 2024-07-22  Degree: Doctor  Type: Dissertation
Country: China  Candidate: K Lu  Full Text: PDF
GTID: 1528307319463514  Subject: Computer system architecture
Abstract/Summary:
Distributed storage has become one of the most widely adopted digital infrastructure services, providing cost-effective, highly scalable, and reliable platforms for storing massive data. With the rise of new application scenarios such as cloud computing and the Internet of Things, the storage software and hardware environment has become increasingly complex. Heterogeneous storage architectures, diverse data types, and dynamic workloads impose higher performance requirements on modern distributed storage systems. Meanwhile, machine learning technology, represented by deep learning, is developing rapidly. In recent years, machine learning has been applied to the optimization of various computer systems with notable results, providing a new opportunity for distributed storage performance optimization. This dissertation identifies three major performance challenges in modern distributed storage systems: data placement, the metadata engine, and parameter tuning, and proposes three machine-learning-based performance optimization solutions. The main contributions are as follows.

Traditional data placement strategies do not adequately account for resource differences among heterogeneous data nodes, leading to load imbalance and increased access latency. Existing optimizations for heterogeneous environments fall short in data balancing, algorithm generality, and resource consumption. This dissertation proposes RLDP, a data placement scheme for modern distributed storage systems based on deep reinforcement learning. First, RLDP models data placement as a reinforcement learning problem and makes decisions with a Deep Q-Network (DQN) to achieve fair distribution and adaptive data migration. In addition, RLDP considers multiple system metrics that affect performance and balance in heterogeneous environments, and uses an attentional Long Short-Term Memory (LSTM) network to improve decision quality. Finally, RLDP addresses the challenges of training models in large state and action spaces through stagewise training and model fine-tuning. RLDP is evaluated both in a simulated environment and in the real system Ceph. The results show that RLDP reduces read latency in heterogeneous environments by 50% and improves system read throughput by 30%~40%.

Log-Structured Merge-tree (LSM-tree) based key-value storage has become one of the most common metadata storage engines for distributed storage systems. However, due to the leveled structure of the LSM-tree, the read path involves too many I/Os, resulting in poor read performance, especially for range queries. This dissertation proposes TridentKV, a read-optimized key-value store using learned indexes constructed by machine learning models. First, TridentKV uses learned indexes to design a new LSM-tree file index block and a range filter that improve query performance. Second, TridentKV designs an adaptive training algorithm to accelerate index-block construction, avoiding negative impacts on system performance during learned-index training. In addition, the range filter achieves the lowest false positive rate within a limited memory budget through a dynamic tuning strategy. Finally, TridentKV adopts a partition-based encoding algorithm to process string keys efficiently and improve the training efficiency of learned indexes. TridentKV is implemented on top of RocksDB. The evaluation results show that TridentKV achieves 7~12 times the point-query performance and 4~7 times the range-query performance of RocksDB. When used to store metadata in Ceph, TridentKV improves Ceph's read throughput by 20%~60%.

Modern distributed storage systems are becoming increasingly complex, exposing many configurable and mutually constrained system parameters. Faced with a huge parameter space and unpredictable workloads, it is difficult to find the optimal configuration manually, and existing automatic parameter tuning algorithms for distributed storage may converge to local optima and lack generality and availability across workloads. This dissertation proposes ADSTS, an automatic parameter tuning system for distributed storage based on machine learning. First, ADSTS defines a general preprocessing guideline that resolves parameter constraints, eliminates unconfigurable and redundant parameters, and generates a standardized set of configurable parameters. Second, ADSTS uses recursive stratified sampling and Lasso regression to overcome the non-incremental nature of traditional sampling algorithms and to efficiently identify the important parameters with the greatest impact on performance. Finally, ADSTS uses the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to automatically find the optimal values of the important parameters. Evaluation results on the real system Ceph show that ADSTS recommends near-optimal configurations under different workloads. Compared to the default parameters, ADSTS improves Ceph's write throughput by 1.5 times and its read throughput by 2.5 times.
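The reinforcement-learning placement loop behind RLDP can be illustrated with a heavily simplified sketch. The dissertation's RLDP uses a DQN with an attentional LSTM over multiple system metrics; the toy version below substitutes tabular Q-learning over a coarsened load state, and the node speeds and the latency-proxy reward are invented for illustration only.

```python
import random

# Assumed relative service rates of three heterogeneous storage nodes.
NODE_SPEED = [1.0, 2.0, 4.0]
N_NODES = len(NODE_SPEED)

def coarse_state(loads):
    # Discretize per-node load so the Q-table stays small.
    return tuple(min(l // 50, 3) for l in loads)

def place_objects(n_objects=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {}                       # Q[(state, node)] -> estimated value
    loads = [0] * N_NODES
    for _ in range(n_objects):
        s = coarse_state(loads)
        if rng.random() < eps:   # epsilon-greedy exploration
            a = rng.randrange(N_NODES)
        else:
            a = max(range(N_NODES), key=lambda i: q.get((s, i), 0.0))
        loads[a] += 1
        # Reward: negative expected latency (queue length over node speed).
        reward = -loads[a] / NODE_SPEED[a]
        s2 = coarse_state(loads)
        best_next = max(q.get((s2, i), 0.0) for i in range(N_NODES))
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
    return loads

loads = place_objects()
print(loads)   # the fastest node should end up holding the most objects
```

Even this toy agent learns to skew placement toward faster nodes instead of spreading objects uniformly, which is the intuition behind RLDP's fairness-aware placement in heterogeneous clusters.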
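The learned-index idea underlying TridentKV's file index blocks can be sketched in miniature: a model predicts a key's position in a sorted SSTable, and the maximum training error bounds a short local search. The single least-squares linear model and the synthetic integer keys below are simplifying assumptions; the dissertation's system trains indexes adaptively, handles string keys via partition-based encoding, and adds a learned range filter.

```python
import bisect

class LearnedIndex:
    """Minimal learned index over a sorted key array (hypothetical sketch)."""

    def __init__(self, keys):
        self.keys = keys
        n = len(keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        var = sum((k - mean_k) ** 2 for k in keys)
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Maximum prediction error defines the local search window.
        self.err = max(abs(self.predict(k) - i) for i, k in enumerate(keys))

    def predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        # Search only within [prediction - err, prediction + err].
        p = self.predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

keys = [2 * i for i in range(1000)]   # sorted keys of one synthetic SSTable
idx = LearnedIndex(keys)
print(idx.lookup(500))   # -> 250
print(idx.lookup(501))   # -> None (key absent)
```

Because the model plus error bound replaces a full binary search over the block with a search over a small window, point queries touch fewer index entries, which is the effect TridentKV exploits to cut read I/Os.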
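The parameter-importance step of ADSTS (Lasso regression over sampled configurations) can be sketched as follows. The plain coordinate-descent Lasso and the synthetic six-parameter workload are illustrative assumptions; ADSTS couples Lasso with recursive stratified sampling of the real configuration space and then tunes the surviving parameters with TD3.

```python
import random

def lasso_cd(X, y, lam, n_iter=50):
    """Lasso via cyclic coordinate descent with soft-thresholding."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Residual with feature j's contribution removed.
            r = [y[i] - sum(w[k] * X[i][k] for k in range(d) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # Soft-thresholding: small correlations are driven to exactly zero.
            if rho < -lam:
                w[j] = (rho + lam) / z
            elif rho > lam:
                w[j] = (rho - lam) / z
            else:
                w[j] = 0.0
    return w

def important_params(n=200, d=6, seed=1, lam=10.0):
    rng = random.Random(seed)
    # Six synthetic "configuration parameters"; only x0 and x3 actually
    # influence the measured throughput (an invented performance model).
    X = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    y = [3 * x[0] - 2 * x[3] + rng.gauss(0, 0.1) for x in X]
    return lasso_cd(X, y, lam)

w = important_params()
print([round(v, 2) for v in w])   # nonzero weights mark the important parameters
```

The L1 penalty zeroes out the four irrelevant parameters, so the tuner only has to explore the two that matter; this dimensionality reduction is what makes the subsequent TD3 search over real storage parameters tractable.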
Keywords/Search Tags:Distributed storage, Machine learning, Data placement, Parameter tuning, Key-value store, Log-structured merge-tree