Research On Erasure Code For Cross-Data Center Disaster-Tolerant Storage

Posted on:2022-06-03

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H Bao

Full Text:PDF

GTID:1528307169977649

Subject:Computer Science and Technology

Abstract/Summary:


Data loss caused by data center failures has become frequent in recent years, so cross-data center disaster-tolerant storage has received widespread attention. Compared with replication-based cross-data center disaster-tolerant storage, erasure-coded cross-data center disaster-tolerant storage offers higher reliability with less redundancy, which has made it a research hotspot. However, erasure code technology for cross-data center disaster-tolerant storage faces the following challenges.

(1) Existing work cannot construct erasure codes with small cross-data center repair traffic under different deployment environments and encoding parameters. The larger the cross-data center repair traffic, the longer the data transmission time during repair and the lower the repair efficiency, so existing work cannot achieve high repair efficiency while meeting the diverse needs of users.

(2) In cross-data center environments, the existing erasure code writing method forwards data many times, generates large cross-data center writing traffic, and has low bandwidth utilization. Each time data is forwarded, the storage system spends time writing it to and reading it from memory (or disk), so the more often data is forwarded, the lower the writing efficiency. Moreover, the larger the cross-data center writing traffic and the lower the bandwidth utilization, the longer the data transmission time during writes and the lower the writing efficiency. Existing work therefore cannot achieve high writing efficiency.

(3) The existing erasure code update method cannot adjust the size of coded stripes flexibly, so an update touches many coded stripes and the amount of data to be updated in each coded stripe is large. In addition, the existing update method does not optimize the transmission topology for the cross-data center environment, so its cross-data center update traffic is large. As a result, existing work spends a long time on data transmission when updating data, and its update efficiency is low.

To improve the repair efficiency, writing efficiency, and update efficiency of erasure codes for cross-data center disaster-tolerant storage, this dissertation studies the erasure code construction method, the erasure code writing method, and the erasure code update method. Its main contributions are as follows.

The existing erasure code construction method cannot construct erasure codes with small cross-data center repair traffic under different deployment environments and encoding parameters, so it cannot meet the diverse needs of users while ensuring repair efficiency. This dissertation therefore proposes an optimal code construction method based on parallel cascade verification, called OCM-PCV. Under different deployment environments and encoding parameters, OCM-PCV constructs the optimal code, that is, the code that achieves the lower bound of the corresponding average cross-data center repair traffic. Specifically, OCM-PCV first uses a cascaded verification algorithm to verify whether an erasure code repair group distribution scheme satisfies the specified encoding parameters. It then uses a parallel search algorithm to find, among all repair group distribution schemes that satisfy the specified encoding parameters, the one with the smallest average cross-data center repair traffic. Finally, it uses a trial-and-error-based transformation algorithm to transform the selected repair group distribution scheme into the generator matrix of the optimal code. Experiments on UCloud's 6 data centers distributed in 5 cities show that, compared with existing work, OCM-PCV reduces the average cross-data center repair traffic by 43.9% to 56.8% and shortens the average repair time by 37.1% to 50.8%.

OCM-PCV obtains the theoretically optimal code under different encoding parameters and deployment environments, but under some encoding parameters it takes a long time to verify the repair group distribution schemes, which makes the optimal code construction time long. OCM-PCV is therefore mainly intended for scenarios that are not sensitive to erasure code construction time. For scenarios that are sensitive to construction time, this dissertation proposes an approximate optimal code construction method based on an active incremental support vector machine (SVM), called AOCM-SVM, which obtains an approximate optimal code quickly under different deployment environments and encoding parameters. Specifically, AOCM-SVM converts each repair group distribution scheme, together with the specified encoding parameters, into a fixed-length feature vector that can be classified quickly. This turns the problem of verifying whether a repair group distribution scheme satisfies the encoding parameters into a classification problem over fixed-length feature vectors. AOCM-SVM then uses an active incremental SVM to verify repair group distribution schemes quickly by classifying the corresponding feature vectors. Next, it uses a parallel search algorithm to select, from all schemes that pass the SVM's verification, the one with the smallest average cross-data center repair traffic (i.e., the approximately optimal repair group distribution scheme). Finally, it converts this scheme into the generator matrix of an approximate optimal code. Experiments on UCloud's 6 data centers distributed in 5 cities show that AOCM-SVM needs only 12.1% of the time OCM-PCV takes to construct the optimal code, and that the average cross-data center repair traffic of AOCM-SVM equals that of OCM-PCV under most encoding parameters.

The existing erasure code writing method cannot sufficiently reduce the number of data forwarding operations or the cross-data center writing traffic, and its bandwidth utilization is low, so its writing efficiency is low. This dissertation therefore proposes a cross-data center erasure code writing method based on generator matrix transformation, called CREW, which reduces the number of data forwarding operations and the cross-data center writing traffic and improves bandwidth utilization. Specifically, CREW uses a star-structured topology to organize data transmission within each data center, which reduces the number of data forwarding operations. CREW also uses a top-down, bandwidth-decreasing tree-structured topology to organize data transmission between data centers, and transforms the generator matrix according to this topology so that (1) generating the coded blocks located in data centers other than the client's needs as little data as possible, which reduces the cross-data center writing traffic, and (2) generating the coded blocks located in the data centers at the bottom of the tree needs as little data as possible, which minimizes the data transferred over low-bandwidth links and thus achieves high bandwidth utilization. In addition, because CREW's transformation of the generator matrix does not change the linear relationship between the coded blocks of the erasure code, it does not affect the repair performance of the erasure code. Experiments on UCloud's 6 data centers distributed in 5 cities show that, compared with existing work, CREW shortens the average write time by 30.6% to 44.2%.

The existing erasure code update method cannot adjust the size of the coded stripe and does not optimize the transmission topology for the cross-data center environment. Its cross-data center update traffic is therefore large and its update efficiency is low. This dissertation therefore proposes a cross-data center erasure code incremental update method based on elastic stripes, called ESDU, which adjusts the size of the coded stripe flexibly and optimizes the transmission topology to reduce cross-data center update traffic. Specifically, ESDU first locates duplicate data packets between the old data and the new data using checksums. Based on the result, ESDU then uses a non-overlapping tree-based algorithm to obtain a new coded stripe division scheme (i.e., a coded stripe size adjustment scheme) that minimizes the amount of data to be updated, which reduces the cross-data center update traffic. Moreover, by jointly analyzing the linear relationship and the location information of the coded blocks, ESDU constructs a topology with small cross-data center traffic to organize data transmission, which reduces the cross-data center update traffic further. Experiments on UCloud's 3 data centers distributed in 3 cities show that, compared with the existing update method based on fixed stripes, ESDU reduces the average cross-data center update traffic by 81.6% and shortens the average update time by 74.6%.

To verify the above research results further, this dissertation designs and implements an erasure-coded cross-data center disaster-tolerant storage system, called ECCDC, which can efficiently write, read, repair, and update files of different sizes and in different numbers. ECCDC contains an erasure code construction module, an append-style incremental snapshot module, a file read module, a file writing module, a file repair module, and a file update module. During initialization, the erasure code construction module constructs the optimal code or the approximate optimal code for the deployment environment and the encoding parameters specified by the user. After initialization, ECCDC reads, writes, repairs, and updates ordinary files (those that are not mass small files) directly. For mass small files, operating on the files directly would require the system to traverse a large number of files on every write, read, update, and repair, which is very inefficient. ECCDC therefore first uses its append-style incremental snapshot module to generate a full or incremental snapshot of the logical volume where these small files are located and exports the snapshots as ordinary files. It then reads, writes, repairs, and updates these snapshot files through its file read, file writing, file repair, and file update modules. Because these operations no longer need to traverse the small files, their efficiency is high.

EduCoder is a well-known online practice teaching platform in China that has served more than 3,000 universities. We deployed ECCDC in EduCoder's 3 data centers and provided them with comprehensive data backup services. Experiments in this production environment show that, compared with Microsoft WAS, Facebook HDFS-RAID, and Hadoop 3.0's HDFS, ECCDC shortens the average writing time, the average read time, the average single-node repair time, the average single-data center repair time, and the average update time by 38.3% to 39.5%, 12.7% to 13.8%, 50.1% to 66.1%, 19.2% to 20.5%, and 74.9% to 75.7%, respectively.
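The quantity that the two construction methods minimize, the average cross-data center repair traffic of a repair group distribution scheme, can be made concrete with a small sketch. The abstract does not give the exact definition, so the cost model below is an assumption: every block belongs to exactly one repair group, a lost block is rebuilt in its own data center from all other blocks of its group, and each helper block stored in a different data center costs one block of cross-data center traffic. The function name, the block numbering, and the two candidate placements are hypothetical, and the verification of encoding parameters described for OCM-PCV and AOCM-SVM is omitted.

```python
from typing import Dict, List


def avg_cross_dc_repair_traffic(repair_groups: List[List[int]],
                                placement: Dict[int, str]) -> float:
    """Average number of blocks fetched from other data centers per single-block repair.

    Assumed cost model (not taken from the dissertation): a failed block is
    rebuilt in its own data center from every other block of its repair group,
    and each helper stored in a different data center costs one block.
    """
    total_blocks_transferred = 0
    repairs = 0
    for group in repair_groups:
        for failed in group:
            helpers = [b for b in group if b != failed]
            total_blocks_transferred += sum(
                1 for b in helpers if placement[b] != placement[failed]
            )
            repairs += 1
    return total_blocks_transferred / repairs


# Hypothetical code with 6 blocks in two repair groups of 3 blocks each.
REPAIR_GROUPS = [[0, 1, 2], [3, 4, 5]]

# Two hypothetical ways of distributing the blocks over 3 data centers.
CANDIDATES = {
    "spread": {0: "dc1", 1: "dc2", 2: "dc3", 3: "dc1", 4: "dc2", 5: "dc3"},
    "grouped": {0: "dc1", 1: "dc1", 2: "dc2", 3: "dc2", 4: "dc3", 5: "dc3"},
}

# Exhaustive search over the candidates; the abstract describes a parallel
# search over verified schemes, which this serial loop only imitates.
scores = {name: avg_cross_dc_repair_traffic(REPAIR_GROUPS, p)
          for name, p in CANDIDATES.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Under this cost model the "spread" placement needs 2 cross-data center blocks per repair on average, while "grouped" needs about 1.33, so the search selects "grouped". A real construction method would first have to discard placements that violate the required fault tolerance, which is the role the abstract assigns to the verification step.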

Keywords/Search Tags:

Cross-Data Center, Disaster-Tolerant Storage, Erasure Code, Data Repair Efficiency, Data Writing Efficiency, Data Update Efficiency


Related items

1	Research On Distributed Fault-Tolerant Storage Technology Based On Erasure Code
2	Research On Erasure Code-Based Data Fault-Tolerant Technology For Cloud Storage
3	Research On Data Repair Techniques In Erasure-Coded Storage Systems
4	Research On Data Writing Performance Optimization In Erasure Code Fault-Tolerant Storage System
5	Optimizing Data Repair And Update For Erasure-Coded Systems With XOR-Based In-Network Computation
6	Research On High Availability And Energy Efficiency Construction Of Bank Data Center
7	Research On Erasure Code-based Data Fault-tolerant Technology For Cloud Storage
8	Research On Backup And Repair Technologies Based On Erasure Codes In Distributed Storage Systems
9	Research Of The Efficiency Improvement Method Of DC Power Supply System In Data Center
10	Research On Multi Stripe Repair Of Erasure Code In Distributed Storage System