| To improve reliability, many deduplication backup systems apply erasure coding after eliminating duplicate data, and the recently proposed inner-object coding offers higher degraded read performance and lower storage overhead. However, existing inner-object coding in deduplication systems scales poorly and cannot meet the demand for frequent cluster scaling as data volume and business complexity grow in cloud storage. To address this problem, an optimized storage scaling scheme for deduplication backup systems, called OEC-dedup, is proposed. OEC-dedup pre-divides containers into groups according to the locality between containers; during scaling, it reorganizes data blocks into new coding stripes according to their locality and updates the parity blocks. A scalable deduplication backup prototype system, OECDedup, is also implemented based on inner-object coding. The prototype implements the basic functions of a backup system, such as deduplication, encoding, and degraded reads, as well as the optimized inner-object scaling process. In addition, the deduplication fragment recovery and prefetch strategies applied during scaling effectively improve read and write performance and throughput. Experimental results show that, when the cluster expands and the coding stripes after scaling are short, the scaling efficiency of OEC-dedup improves greatly over traditional inner-object coding expansion, reaching 71.1%. When the coding stripes are long, the improvement is smaller, reaching 35.3%. Compared with inter-object coding expansion, the expansion efficiency decreases slightly because of the added container correlation matching, with a reduction between 13.3% and 29.0%. Performance tests on degraded reads and node recovery show that OEC-dedup improves cluster expansion performance while preserving the efficiency of degraded reads and node recovery, and retains lower storage overhead than inter-object coding. |