
Research And Implementation Of Duplicate Data Management Technology Based On FastDFS

Posted on: 2015-08-19    Degree: Master    Type: Thesis
Country: China    Candidate: J Zhang    Full Text: PDF
GTID: 2308330473451791    Subject: Information security
Abstract/Summary:
With the rapid development of computer technology, digital information is growing explosively; in cloud storage systems in particular, data volumes now reach the petabyte scale. Faced with such enormous amounts of data, finding and eliminating duplicate data in the system efficiently has become a critical research problem.

Data chunking algorithms can quickly and efficiently detect duplicate data across files and are the core technology of identical-data detection. Existing chunking algorithms suffer from uncertain chunk boundaries, which can make blocks too large and produce data fragmentation. To reduce hard (fixed-boundary) blocks in the system while balancing the conflicting goals of raising the deduplication rate and lowering the time cost of chunking, this thesis proposes SWCDC, a sliding-window chunking method based on pre-chunking. SWCDC uses a larger expected block size for regions of a file whose content has not changed, and a smaller expected block size for the remaining regions. By distinguishing data in changed regions from data in unchanged regions, SWCDC is especially well suited to deduplication management systems that contain a large amount of duplicate data. In addition, to reduce the per-block metadata overhead, this thesis builds on SWCDC and proposes ISWFDC, a sliding-window chunking method based on block merging. Experimental results show that SWCDC and ISWFDC achieve higher deduplication performance than conventional chunking algorithms.

To address the problems that existing Bloom filters are too slow when checking large block-fingerprint sets and cannot adapt well to the dynamic growth of the fingerprint set in a cloud storage environment, this thesis proposes DBFMS, a dynamic Bloom filter matrix set.
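The content-defined chunking idea underlying sliding-window methods such as SWCDC can be illustrated with a minimal sketch. This is a generic baseline, not the thesis's SWCDC or ISWFDC: the byte-sum rolling hash, window size, boundary mask, and block-size limits below are all illustrative placeholders (production deduplication systems typically use a Rabin fingerprint for the window hash).

```python
# Minimal content-defined chunking sketch. A window of the last WINDOW
# bytes is hashed with a rolling byte sum; a chunk boundary is declared
# wherever the hash matches the mask. All parameters are illustrative.
WINDOW = 48           # sliding-window width in bytes
MASK = 0x1FFF         # boundary when (hash & MASK) == 0, ~8 KiB expected size
MIN_BLOCK = 2048      # suppress boundaries that would create tiny blocks
MAX_BLOCK = 65536     # hard upper bound caps block size and fragmentation

def chunk(data: bytes):
    """Return a list of (start, end) byte ranges covering `data`."""
    blocks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h += b
        if i - start >= WINDOW:
            h -= data[i - WINDOW]      # slide the window: drop the oldest byte
        length = i - start + 1
        if (length >= MIN_BLOCK and (h & MASK) == 0) or length >= MAX_BLOCK:
            blocks.append((start, i + 1))
            start, h = i + 1, 0
    if start < len(data):
        blocks.append((start, len(data)))
    return blocks
```

Because boundaries are chosen by content rather than by fixed offsets, an insertion in one region of a file shifts boundaries only locally; SWCDC's refinement of applying a larger expected block size to unchanged regions would correspond here to switching MASK per region.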
DBFMS represents the data-block fingerprint set as a set of bit matrices rather than as individual Bloom filter bit strings, which significantly improves the efficiency of retrieving duplicate block fingerprints. Theoretical analysis, simulations, and experiments show that, compared with traditional static and dynamic Bloom filters, DBFMS improves scalability, query efficiency, and the false-positive probability.

Finally, combining deduplication management theory, the system architecture model, and the improved algorithms, the thesis implements a duplicate-data management platform on top of the open-source distributed file system FastDFS, deployed as a configured FastDFS cluster. The system provides file upload, download, deletion, renaming, and duplicate-data management. Comparative experiments between the system running the improved algorithms and the original system show that the former performs better and is more efficient, and is therefore better suited to cloud storage environments.
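As a point of reference for the DBFMS comparison above, a conventional in-memory Bloom filter over block fingerprints can be sketched as follows. This is the standard baseline the thesis improves on, not DBFMS itself; the bit-array size, the number of hash functions, and the SHA-1-based double hashing are illustrative choices.

```python
import hashlib

class BloomFilter:
    """Standard Bloom filter for block-fingerprint membership checks
    (baseline structure, not the thesis's DBFMS matrix variant)."""

    def __init__(self, m_bits: int = 1 << 20, k: int = 4):
        self.m = m_bits                    # size of the bit array
        self.k = k                         # number of hash functions
        self.bits = bytearray(m_bits // 8)

    def _positions(self, fingerprint: bytes):
        # Derive k bit positions from one digest via double hashing:
        # position_i = (h1 + i * h2) mod m
        d = hashlib.sha1(fingerprint).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, fingerprint: bytes):
        for p in self._positions(fingerprint):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        # False is definitive (block is new); True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(fingerprint))
```

A `False` answer means the block is definitely new and must be stored; a `True` answer may be a false positive, so the system must confirm against the authoritative fingerprint index. The cost of these lookups as the fingerprint set grows is exactly what DBFMS's matrix organization targets.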
Keywords/Search Tags: duplicate data management, identical-data detection, Bloom filter, data chunking algorithm