
Design And Analysis Of Intelligent Prefetching Algorithm For Data Deduplication

Posted on: 2018-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Mao
GTID: 2348330536457352
Subject: Computer Science and Technology
Abstract/Summary:
In the field of storage, persistent storage has always been a challenging research topic. The explosive growth of data has led to large-scale, highly redundant data storage in data centers. Redundant data not only consumes more storage space and energy, but also dramatically increases the complexity of data management and the risk to stored data. Data deduplication has therefore become a focus of storage research as a way to eliminate redundant data efficiently, reduce the storage burden, and improve storage efficiency. Data deduplication faces two main problems: (1) the disk bottleneck caused by the fingerprint index; and (2) chunk fragmentation, which significantly hurts restore performance. This thesis uses reinforcement learning and a pattern matching algorithm to address these two problems. The main research contents are as follows.

1) A fingerprint index detection and prefetching technique based on reinforcement learning is proposed. First, we extract the features of each segment using the context information of the data stream. Next, we establish the mapping between features and segments, and build an efficient index structure by selecting an appropriate feedback mechanism. Then, using reinforcement learning, we learn the similarity between these segments and a feature, represented as scores. For an incoming segment, we use a multi-armed bandit model to trade off between the segment with the best feedback and unknown segments, and adaptively prefetch a segment into the cache. Furthermore, we study the optimization of the caching mechanism and design a caching algorithm. Finally, we report on simulation experiments with four datasets to verify the effectiveness of our approach. The experimental results show that our method significantly reduces memory overhead and achieves effective deduplication.

2) An algorithm based on pattern matching is proposed to optimize data restore. First, we study the distribution characteristics of fragmentation after data deduplication and analyze the read performance of the restore process. Second, we use pattern matching to identify locally associated data blocks and compute the longest common subsequence (LCS), forming a continuous pattern of disk read operations and reducing the number of random disk I/Os. Then, a double circular buffer queue is used to design a maximal pattern matching algorithm that optimizes the scheduling and combination of read operations and thereby improves restore performance. Furthermore, we study the optimization of cache prefetching for data restore and analyze restore performance under different cache granularities. Finally, we compare restore performance under rewriting. Large-scale experimental results show that the proposed pattern matching algorithm can further improve restore performance.
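The two core ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the epsilon-greedy policy is one common multi-armed bandit strategy swapped in for illustration, and the class name `EpsilonGreedyPrefetcher`, the reward values, and the choice to compare a restore recipe against an on-disk chunk order are all assumptions made here.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence


class EpsilonGreedyPrefetcher:
    """Hypothetical bandit: per feature, each candidate segment is one arm."""

    def __init__(self, epsilon: float = 0.1, seed: Optional[int] = None) -> None:
        self.epsilon = epsilon            # probability of trying an unknown segment
        self.rng = random.Random(seed)
        self.scores: Dict[Hashable, Dict[Hashable, float]] = {}
        self.pulls: Dict[Hashable, Dict[Hashable, int]] = {}

    def choose(self, feature: Hashable, candidates: Sequence[Hashable]) -> Hashable:
        """Return the segment to prefetch for this feature."""
        if not candidates:
            raise ValueError("no candidate segments")
        arms = self.scores.setdefault(feature, {})
        known = [c for c in candidates if c in arms]
        untried = [c for c in candidates if c not in arms]
        # Explore an unknown segment when nothing is known yet,
        # or with probability epsilon when unknown segments exist.
        if not known or (untried and self.rng.random() < self.epsilon):
            return self.rng.choice(untried)
        # Otherwise exploit the segment with the best score so far.
        return max(known, key=arms.__getitem__)

    def update(self, feature: Hashable, segment: Hashable, reward: float) -> None:
        """Fold one feedback value (assumed here: a hit ratio) into a running mean."""
        arms = self.scores.setdefault(feature, {})
        counts = self.pulls.setdefault(feature, {})
        n = counts.get(segment, 0) + 1
        counts[segment] = n
        old = arms.get(segment, 0.0)
        arms[segment] = old + (reward - old) / n


def longest_common_subsequence(a: Sequence, b: Sequence) -> List:
    """Classic O(len(a) * len(b)) dynamic-programming LCS."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    prefetcher = EpsilonGreedyPrefetcher(epsilon=0.1, seed=0)
    prefetcher.update("feature-1", "segment-A", 0.9)
    prefetcher.update("feature-1", "segment-B", 0.2)
    print(prefetcher.choose("feature-1", ["segment-A", "segment-B", "segment-C"]))

    recipe = ["a", "b", "c", "d", "e"]      # chunk order needed by the restore
    on_disk = ["b", "x", "c", "e", "a"]     # chunk order as laid out on disk
    print(longest_common_subsequence(recipe, on_disk))
```

In this sketch, the chunks in the common subsequence appear in the same relative order in the recipe and on disk, so they could be served by one forward pass over the disk instead of separate random reads. How the thesis actually schedules and combines reads with its buffer queue is not specified in the abstract.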
Keywords/Search Tags: data deduplication, reinforcement learning, pattern matching, LCS, rewriting