
Research On Archival And Write-update Performance Optimization Technology For Tiered Erasure-coded Storage Clusters

Posted on: 2023-08-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Xu
GTID: 1528307043967949
Subject: Computer system architecture
Abstract/Summary:
Modern large-scale storage systems usually adopt a tiered erasure-coded cluster architecture. A small amount of hot data is stored in the first-level performance storage tier using replication or a high-performance erasure code; a large amount of warm or cold data is periodically archived into the second-level capacity storage tier using a highly space-efficient erasure code. This architecture balances system performance, reliability, and storage cost. However, tiered erasure-coded clusters also introduce data-intensive maintenance operations, including normal write-update, degraded read and write, failure reconstruction, and inter-tier erasure-coded archival, which can seriously degrade the performance seen by front-end applications. Among them, erasure-coded write-update and archival are the two most frequent. The former keeps the data and parity blocks of an erasure-coded stripe at the same encoding version when users issue normal write requests; the latter periodically archives data whose access frequency has dropped from the performance tier to the capacity tier, improving the overall storage space utilization of the cluster. Both operations move a large amount of data between nodes.

This thesis considers the network heterogeneity and the dynamic changes in actual bandwidth among nodes and racks in tiered erasure-coded storage clusters, and makes full use of the hardware characteristics of the nodes and the network to improve the performance of erasure-coded write-update and archival operations, thereby strengthening the overall performance and reliability of large-scale erasure-coded clusters. The main research contents and innovations of this thesis are as follows:

To address the limited network bandwidth between racks in the cluster, a rack-aware erasure-coded archival optimization mechanism is proposed. Cross-rack traffic mainly comes from redistributing the data or parity blocks of a large number of stripes between nodes in different racks during archival. First, an encoding-oriented replica placement strategy named ERP is proposed, which ensures minimal overlap of static data blocks between racks and dynamic rack-level load balancing while maintaining rack-level and node-level fault tolerance. Then, based on ERP, a traffic-efficient erasure-coded archival mechanism named TEA is designed. TEA takes the rack topology into account, so that the encoding node preferentially retrieves data blocks from its own rack. Finally, spatial and temporal locality are combined with the stripe organization algorithm: TEA-SL (TEA-Spatial Locality) constructs stripes according to the order of the nodes where the data is located, to further reduce intra-rack traffic; TEA-TL (TEA-Temporal Locality) constructs stripes according to the data write order, to speed up the reuse of storage space. Experimental results show that TEA effectively reduces cross-rack traffic, improves archival throughput by 70.8%, and improves rack-level load balancing by 1.45 times.

To address the differences and variability in available bandwidth while the cluster network is running, a network-heterogeneity-aware erasure-coded archival pipeline scheduling mechanism is proposed. Existing pipelined archival schemes usually assume that the network bandwidth between nodes is uniform and fixed, so that some links are congested while others sit idle during the operation. A pipelined erasure-coded archival scheduling mechanism named Archpipe is therefore proposed. First, Archpipe assigns different scheduling priorities to network links according to their available bandwidth and considers the locality of data processing, so as to construct an optimal single erasure-coded archival pipeline. Then, Archpipe makes full use of idle node resources to construct multiple pipelines, so that multiple parity blocks are encoded in parallel. Finally, Archpipe is implemented as a general-purpose plug-in, avoiding coupling to specific replica placement strategies and stripe organization algorithms. Experimental results show that Archpipe can be seamlessly integrated into multiple popular distributed storage systems, and that it increases archival throughput by at least 3.6 times in disk scenarios and 1.3 times in memory scenarios.

To address the transactional nature and locality of redundancy synchronization in the erasure-coded write-update operation, a redundancy synchronization mechanism combining batch processing and delayed update is proposed. The overhead of an erasure-coded write-update mainly comes from redundancy synchronization. Even when high-performance network bandwidth is sufficient, the data and parity blocks of a stripe reside in separate locations, so keeping their versions consistent still incurs multiple rounds of network communication and encoding latency. An RDMA (Remote Direct Memory Access) based write-update redundancy synchronization mechanism named F-Write is therefore proposed for in-memory erasure-coded clusters. First, F-Write uses the one-sided verbs provided by RDMA to formulate a version consistency protocol named Fast2PC, which simplifies transaction-log operations and merges multiple pieces of synchronized data for batch submission. Then, F-Write applies a delayed-update strategy to parity blocks, submitting all undo transactions only when the parity blocks need to be used. Finally, when a data block is updated multiple times, F-Write exploits the linearity of erasure coding to merge only the original and latest versions of the data block, and synchronizes the parity blocks in the background. Experimental results show that, under update-intensive workloads, F-Write reduces write latency by 61%, improves system throughput by 2.6 times, and does not affect data recovery performance.
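
The merging of repeated write-updates described for F-Write relies on the linearity of erasure codes: a parity block is a linear combination of data blocks, so it can be patched from the difference between the original and the latest version of a data block, and intermediate versions never need to reach the parity node. The following Python sketch illustrates only that property and is not the thesis's implementation. The helper names (`gf_mul`, `encode_parity`, `apply_merged_update`), the coefficients, the sample blocks, and the choice of GF(2^8) with polynomial 0x11d are all assumptions made for illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d).

    The field and polynomial are illustrative assumptions.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def encode_parity(data_blocks, coeffs):
    """Full encode: parity[i] is the XOR over j of coeffs[j] * data_blocks[j][i]."""
    size = len(data_blocks[0])
    parity = bytearray(size)
    for block, coeff in zip(data_blocks, coeffs):
        for i in range(size):
            parity[i] ^= gf_mul(coeff, block[i])
    return bytes(parity)


def apply_merged_update(parity, coeff, original, latest):
    """Patch a parity block using only the original and latest data versions.

    By linearity, coeff * (original XOR latest) is exactly the change the
    parity needs, no matter how many intermediate versions were written.
    """
    return bytes(
        p ^ gf_mul(coeff, o ^ n) for p, o, n in zip(parity, original, latest)
    )


# Hypothetical stripe: three data blocks and one parity block.
coeffs = [1, 2, 3]
blocks = [b"aaaa", b"bbbb", b"cccc"]
parity = encode_parity(blocks, coeffs)

# Block 1 is overwritten three times; only the first and last versions matter.
v0 = blocks[1]
versions = [b"bxbb", b"bxyb", b"zxyb"]
latest = versions[-1]

patched = apply_merged_update(parity, coeffs[1], v0, latest)
blocks[1] = latest

# The patched parity matches a full re-encode of the updated stripe.
print(patched == encode_parity(blocks, coeffs))
```

In this sketch the parity node receives a single delta per data block regardless of the number of intermediate writes, which is why such a merge can be deferred and applied in the background.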
Keywords/Search Tags: Distributed storage cluster, Erasure coding, Maintenance operation, Erasure-coded archival, Write optimization, Traffic control