With the explosive growth of global data volume, enterprises back up their data at regular intervals to protect important data assets; once a system crashes, they can restore the original data from these backups. However, periodic backups contain a large amount of redundant data, which consumes considerable storage space. Data deduplication and delta compression, two key redundancy-elimination technologies, have therefore been widely adopted in backup systems. Applying them, however, scatters the chunks of a backup version across many storage devices, causing a large number of random reads during restores and severely degrading restore performance. Observation shows that restore performance suffers in three respects: (1) inefficient rewriting algorithms for package-dominated workloads during backups; (2) the low performance of cache algorithms with single-grained replacement during restores; (3) the weakened locality of the backup stream caused by current reliability mechanisms, which apply replication or erasure codes according to different chunk characteristics. To improve restore performance, this dissertation accordingly investigates three aspects: rewriting algorithms, cache algorithms, and data-storage redundancy policies.

In package-dominated backup workloads, many of the identified fragmented chunks remain fragmented in subsequent backups and are therefore rewritten repeatedly. To balance the deduplication ratio against restore performance, existing rewriting algorithms rewrite only a limited amount of fragmentation, which severely limits restore performance. To solve this problem, this dissertation proposes a persistent-fragmented-chunk grouping method called PFCG. Unlike traditional rewriting algorithms, which store identified fragmented chunks and other chunks in the same type of container, PFCG distinguishes two container types (fragmented containers and regular containers). During backups, PFCG collects fragmentation information to identify repeatedly rewritten chunks (persistent fragmented chunks), and then stores persistent fragmented chunks and all other chunks in the two container types respectively. This increases the fraction of useful chunks in each container accessed during restores, thereby improving restore performance. Experimental results show that PFCG improves restore performance by 21%-47% over state-of-the-art rewriting algorithms without sacrificing deduplication efficiency.
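To make the container-grouping idea concrete, the following minimal Python sketch routes a rewritten chunk either to a fragmented container or to a regular one. The names (Container, FRAG_THRESHOLD, frag_history), the 4 MiB container size, and the two-backup persistence threshold are illustrative assumptions, not the dissertation's actual implementation.

    # A minimal sketch of PFCG-style chunk grouping, under assumed parameters.
    CONTAINER_SIZE = 4 * 1024 * 1024   # assumed 4 MiB container size
    FRAG_THRESHOLD = 2                 # assumed: fragmented in >= 2 consecutive backups

    class Container:
        def __init__(self, kind):
            self.kind = kind           # "fragmented" or "regular"
            self.chunks = []
            self.size = 0

    def store(chunk_id, data, frag_history, frag_container, reg_container):
        """Route one rewritten chunk to one of the two container types.

        frag_history[chunk_id] counts how many consecutive backups have
        identified this chunk as fragmented; chunks crossing the threshold
        are treated as persistent fragmented chunks and grouped together,
        so a restore that reads a fragmented container finds mostly
        useful chunks inside it.
        """
        if frag_history.get(chunk_id, 0) >= FRAG_THRESHOLD:
            target = frag_container    # persistent fragmented chunks grouped here
        else:
            target = reg_container     # everything else stays in regular containers
        target.chunks.append((chunk_id, data))
        target.size += len(data)
        return target.kind

    # Usage: chunk "a" has been fragmented in 3 consecutive backups, "b" in none.
    frag_c, reg_c = Container("fragmented"), Container("regular")
    history = {"a": 3, "b": 0}
    print(store("a", b"data", history, frag_c, reg_c))   # -> fragmented
    print(store("b", b"data", history, frag_c, reg_c))   # -> regular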
In erasure-coded deduplicated systems, when a node failure puts the system into degraded mode, accessing an unavailable object triggers extra accesses to other objects in the same stripe, which interferes with the normal operation of existing cache algorithms. Moreover, existing restore cache algorithms do not distinguish object types, so they struggle to handle the different types of failed objects efficiently under degraded reads, further degrading restore performance. To solve this problem, this dissertation proposes an object-type-aware restore cache scheme called OAD. OAD identifies the type of the object that holds each restored chunk: objects containing duplicate chunks are managed with a fine-grained chunk-level caching method, which improves cache-space efficiency, while objects containing non-duplicate chunks are managed with a coarse-grained object-level caching method, which preserves locality. By applying the appropriate caching method to each object type under degraded reads, OAD improves overall restore performance. Experimental results show that, under node failures, OAD improves restore performance by 20%-65% compared with state-of-the-art cache algorithms.

In deduplicated and delta-compressed backup systems, existing reliability methods apply replication or erasure codes according to different chunk characteristics, which causes a large number of logically inconsecutive chunks to be stored in the same container, destroying data locality, weakening cache locality, and thus degrading restore performance. To solve this problem, this dissertation proposes a data-storage redundancy scheme for delta compression called RepEC+. Based on the characteristics of data chunks in deduplicated and delta-compressed backup systems, a delta-utilization-aware hybrid redundancy scheme (RepEC-Core) is designed: RepEC-Core uses replication to protect containers with a high proportion of delta-compressed chunks and erasure codes to protect the remaining containers, improving data locality. In addition, once delta compression is introduced, the base chunks of delta-compressed chunks become fragmented and remain fragmented in subsequent backups. Based on this observation, a base-fragmentation elimination method that exploits information across backup versions (RepEC-HDS) is designed: RepEC-HDS uses cross-version information to accurately identify the base fragmentation caused by the base chunk of a similar chunk and cancels the delta compression between them to strengthen data locality. Experimental results show that, compared with existing solutions, RepEC+ improves restore performance by 26%-59%.
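The following minimal Python sketch illustrates the per-container selection at the heart of a RepEC-Core-style hybrid policy; the choose_redundancy helper and the 0.5 ratio cutoff for a "high proportion" of delta-compressed chunks are illustrative assumptions, not the dissertation's actual parameters.

    # A minimal sketch of delta-utilization-aware redundancy selection,
    # assuming per-container metadata that flags delta-compressed chunks.
    DELTA_RATIO_THRESHOLD = 0.5   # assumed cutoff for "high proportion"

    def choose_redundancy(container_chunks):
        """Pick a redundancy scheme for one container.

        container_chunks is a list of (chunk_id, is_delta) pairs.
        Containers dominated by delta-compressed chunks are replicated,
        keeping their chunks locally readable and preserving locality;
        the remaining containers are erasure-coded to keep the storage
        overhead of redundancy low.
        """
        if not container_chunks:
            return "erasure_code"
        delta = sum(1 for _, is_delta in container_chunks if is_delta)
        ratio = delta / len(container_chunks)
        return "replicate" if ratio >= DELTA_RATIO_THRESHOLD else "erasure_code"

    # Example: 3 of 4 chunks are delta-compressed, so the container is replicated.
    print(choose_redundancy([("c1", True), ("c2", True), ("c3", True), ("c4", False)]))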