
Performance Optimization Of Data Deduplication In Backup Systems

Posted on: 2019-06-14    Degree: Master    Type: Thesis
Country: China    Candidate: C Tan    Full Text: PDF
GTID: 2428330590492297    Subject: Computer technology
Abstract/Summary:
In the digital information era, we must face the problem of data loss while enjoying the benefits of digital information. Data backup technology plays a crucial role in data protection: it can recover data in a short time after a loss occurs. Data deduplication eliminates duplicate data to reduce storage costs and is therefore widely used in backup systems. However, the existing duplicate-detection algorithms in chunk-level deduplication systems suffer from insufficient performance. The Fixed Sized Partition (FSP) algorithm does not consider the characteristics of the data stream when selecting the chunk size, so it cannot balance the deduplication ratio against the volume of metadata, which affects the system's read and write performance. The Content Defined Chunking (CDC) algorithm can exhibit a pathological case in which no chunk boundary is produced because the boundary condition is never matched, which lowers the system's deduplication ratio. The cache algorithms used during data recovery are also a performance bottleneck in deduplication systems. When the fragmentation rate of the backup data stream is high, the Least Recently Used (LRU) algorithm performs poorly. The Forward Assembly Area (ASM) algorithm handles data fragmentation effectively based on the container principle, but it does not consider the locality of data chunks when buffering containers, which increases the overhead of reading from disk.

To address the problems in the chunking algorithms, we propose an extended FSP algorithm and an extended CDC algorithm. The extended FSP algorithm extends FSP by introducing the characteristics of CRC codes to identify the data stream, while balancing the deduplication ratio against the volume of metadata. Experimental results show that the extended FSP algorithm can obtain a better chunk size for different backup data streams. The extended CDC algorithm extends CDC by limiting the upper and lower bounds of the chunk size and by using a Magic Number table to improve the deduplication ratio. Experimental results show that, compared with the traditional CDC algorithm, the extended CDC algorithm improves the deduplication ratio by 12% on average.

In this paper, we also present a Dynamic Adaptive Forward Assembly Area Method, called DASM, to accelerate restore speed for deduplication-based backup systems. DASM exploits the fragmentation information within the restored backup streams and dynamically trades off between a chunk-level cache and a container-level cache. Experimental results show that DASM improves the restore speed of the traditional LRU and ASM methods by up to 58.9% and 57.1%, respectively.
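To make the chunking discussion above concrete, the following is a minimal sketch of content-defined chunking with bounded chunk sizes, the basic idea behind the extended CDC algorithm. It is not the thesis's implementation: the byte-wise rolling fingerprint (a stand-in for a Rabin fingerprint), the MIN_SIZE/MAX_SIZE/MASK values, and the single MAGIC value (the thesis uses a table of magic numbers) are illustrative assumptions.

```python
import hashlib

# Assumed, illustrative parameters for the sketch.
MIN_SIZE = 2 * 1024   # lower bound on chunk size
MAX_SIZE = 16 * 1024  # upper bound on chunk size
MASK = 0x1FFF         # expected average chunk size of roughly 8 KiB
MAGIC = 0x0F3A        # single boundary pattern (a table is used in the thesis)

def chunk_stream(data: bytes):
    """Split `data` into content-defined chunks; yield (offset, chunk)."""
    start = 0
    fp = 0
    for i in range(len(data)):
        # Simple multiplicative rolling fingerprint, a stand-in for Rabin.
        fp = ((fp << 1) + data[i]) & 0xFFFFFFFF
        size = i - start + 1
        at_boundary = (fp & MASK) == MAGIC and size >= MIN_SIZE
        if at_boundary or size >= MAX_SIZE:
            yield start, data[start:i + 1]
            start = i + 1
            fp = 0
    if start < len(data):
        yield start, data[start:]

def dedup_index(data: bytes):
    """Toy chunk-level index: map chunk fingerprint -> first offset seen."""
    index = {}
    duplicates = 0
    for offset, chunk in chunk_stream(data):
        key = hashlib.sha1(chunk).hexdigest()
        if key in index:
            duplicates += 1
        else:
            index[key] = offset
    return index, duplicates
```

The min/max bounds prevent the pathological case in which the boundary condition never matches: a chunk is forced to close at MAX_SIZE, and boundaries found before MIN_SIZE are ignored.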
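The restore-side trade-off that DASM exploits can likewise be illustrated with a simplified sketch. This is not the thesis's DASM algorithm: the recipe layout (a list of (container_id, chunk_id) references in stream order), the hypothetical read_container function returning a dict of chunk_id to bytes, and the per-window fragmentation threshold are all assumptions made for illustration.

```python
from collections import OrderedDict

def restore(recipe, read_container, cache_capacity, window=1024):
    """Rebuild a backup stream from chunk references, switching between
    container-level and chunk-level caching based on fragmentation."""
    cache = OrderedDict()   # LRU cache holding containers or single chunks
    container_reads = 0
    output = []

    for pos in range(0, len(recipe), window):
        window_refs = recipe[pos:pos + window]
        # Fragmentation estimate: distinct containers touched by this window.
        distinct = len({c for c, _ in window_refs})
        # Sequential windows favor caching whole containers; highly
        # fragmented windows favor caching only the chunks actually needed.
        cache_whole_containers = distinct <= cache_capacity

        for container_id, chunk_id in window_refs:
            key = container_id if cache_whole_containers else (container_id, chunk_id)
            if key not in cache:
                container = read_container(container_id)  # one disk read
                container_reads += 1
                cache[key] = container if cache_whole_containers else container[chunk_id]
                if len(cache) > cache_capacity:
                    cache.popitem(last=False)             # evict LRU entry
            cache.move_to_end(key)
            entry = cache[key]
            output.append(entry[chunk_id] if cache_whole_containers else entry)

    return b"".join(output), container_reads
```

The count of container reads is the quantity a restore cache tries to minimize; in the sketch, the per-window decision is what stands in for DASM's dynamic adaptation between the two cache granularities.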
Keywords/Search Tags:Data Backup, Data Deduplication, Chunking Algorithm, Cache Algorithm, Performance Evaluation