Design And Analysis Of Intelligent Prefetching Algorithm For Data Deduplication

Posted on:2018-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:Q F Mao

Full Text:PDF

GTID:2348330536457352

Subject:Computer Science and Technology

Abstract/Summary:

In the storage field, persistent storage has long been a challenging research topic. The explosive growth of data has made data-center storage both large in scale and highly redundant. Redundant data not only consumes additional storage space and energy but also dramatically increases the complexity of data management and the risk to stored data. Data deduplication has therefore become a focus of storage research as a way to eliminate redundant data efficiently, reduce the storage burden, and improve storage efficiency. Data deduplication faces two main problems: (1) the disk bottleneck caused by the fingerprint index, and (2) chunk fragmentation, which significantly hurts restore performance. This thesis applies reinforcement learning and a pattern matching algorithm to these two problems. The main research contents are as follows:

1) A fingerprint index detection and prefetching technique based on reinforcement learning is proposed. First, we extract segment features from the context information of the data stream. Next, we establish the mapping between features and segments, and build an efficient index structure by selecting an appropriate feedback mechanism. Then, using reinforcement learning, we learn the similarity between each feature and its segments, represented as scores. For an incoming segment, a multi-armed bandit model balances the segment with the best feedback against unexplored segments and adaptively prefetches a segment into the cache. We also study optimization of the caching mechanism and design a caching algorithm. Finally, simulation experiments on four datasets verify the effectiveness of our approach: the results show that the method significantly reduces memory overhead while achieving effective deduplication.

2) An algorithm based on pattern matching is proposed to optimize data restore. First, we study the distribution characteristics of fragmentation after deduplication and analyze the read performance of the restore process. Second, we use pattern matching to identify locally associated data blocks and compute the longest common subsequence (LCS) to form a continuous pattern of disk reads, reducing the number of random disk I/Os. Then, a double circular buffer queue is used to design a maximum pattern matching algorithm that optimizes the scheduling and merging of read operations and thereby improves restore performance. We further study how to optimize cache prefetching during restore and analyze restore performance under different cache granularities. Finally, we compare restore performance under rewriting. Large-scale experiments show that the proposed pattern-matching-based algorithm further improves restore performance.
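
The two building blocks named in the abstract, a multi-armed bandit choice for prefetching and the longest common subsequence (LCS) for ordering reads, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract does not say which bandit rule is used, so UCB1 is assumed here as one standard option, and the function names `ucb1_choose` and `lcs`, the reward scale, and the example chunk IDs are all hypothetical.

```python
import math


def ucb1_choose(counts, rewards, total_pulls):
    """Pick the index of the candidate segment to prefetch (UCB1 rule, assumed).

    counts[i]  -- how many times candidate i has been prefetched so far
    rewards[i] -- summed feedback for candidate i (assumed to lie in [0, 1] per pull)
    """
    # Explore: any candidate that has never been tried is chosen first.
    for i, n in enumerate(counts):
        if n == 0:
            return i

    # Otherwise use mean feedback plus a bonus that shrinks with more trials.
    def score(i):
        mean = rewards[i] / counts[i]
        return mean + math.sqrt(2.0 * math.log(total_pulls) / counts[i])

    return max(range(len(counts)), key=score)


def lcs(a, b):
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Walk back through the table to recover the subsequence itself.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


# Hypothetical example: chunk IDs in restore order vs. their order on disk.
restore_order = [1, 2, 3, 4, 5]
disk_order = [2, 4, 5, 1, 3]
sequential_run = lcs(restore_order, disk_order)
```

In this toy example the chunks in `sequential_run` appear in the same relative order in both sequences, so under the assumptions above they could be fetched in one forward pass over the disk instead of as separate random reads.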

Keywords/Search Tags:

data deduplication, reinforcement learning, pattern matching, LCS, rewriting

Related items

1	Research On Building Efficient Data Deduplication Storage Systems For Data Backup
2	Research On Performance Optimization Based On Container Characteristics In Deduplication-based Backup Systems
3	The Research Of Rewriting XML Queries Based On Materialized Views
4	Research Of Data Deduplication In Data Disaster Tolerance Systems
5	Research On Duplicate Data Detection In Data Deduplication
6	Research On Key Technologies Of Data Deduplication For Backup System
7	Study On Data Deduplication Technique For Data Backup Systems
8	HTDRDedu: The Design And Implementation Of A Distributed Backup Data Deduplication System
9	Research On Privacy Preserving Deduplication And Computation For Outsourced Data
10	Study On Deduplication Supporting Fuzzy Matching For Encrypted Data In Cloud Storage