Research And Implementation Of On-line De-duplication Technology

Posted on: 2012-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: P P Hu
GTID: 2218330362956453
Subject: Computer system architecture
Abstract/Summary:
As information technology spreads, the volume of enterprise data is growing exponentially. This raises two problems. First, much of the data is duplicated, which wastes substantial storage space and raises enterprise storage costs. Second, heavy access traffic makes the disk storage devices of the data center a performance bottleneck. In response, this thesis proposes a storage system model built on the iSCSI platform that combines de-duplication with hierarchical storage.

First, data de-duplication technology is studied in depth, and a hash-based de-duplication method is used to implement the basic functions: fingerprint calculation, fingerprint search, and fingerprint index table management. A "DRAM-SSD-DISK" hierarchical storage scheme is then proposed, in which the solid-state disk serves as the system's second-level cache; its good performance, large capacity, and non-volatility are used to improve overall system performance. A virtual space mapping function virtualizes the physical disk into a larger virtual disk, and the virtual disk is partitioned and mapped to multiple clients, giving a single-server, multiple-client configuration.

Second, the fingerprint search algorithm, one of the performance bottlenecks of de-duplication, is optimized. A Bloom filter-based search filtering algorithm is proposed that filters out a large number of unnecessary fingerprint search requests. A "memory - solid-state disk" tiered storage strategy is then implemented for the fingerprint index table; it takes full advantage of the better read performance of the SSD to avoid the disk access bottleneck.

Finally, a series of system tests is carried out. A performance comparison shows that de-duplication, being computationally intensive, causes some performance loss, but that overall performance improves somewhat once hierarchical storage is added. Tests of the de-duplication compression rate show that the technique compresses better in applications with highly duplicated data, such as document applications. Tests of the search filtering algorithm show that its filtration rate and false positive rate reach the desired levels.
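
The interaction between the Bloom filter and the fingerprint index table can be illustrated with a minimal Python sketch. It is not the thesis implementation: the choice of SHA-1 as the fingerprint function, the filter size, the number of hash functions, and the names `BloomFilter` and `DedupStore` are all assumptions made for illustration, and an in-memory dictionary stands in for the tiered "memory - solid-state disk" index table.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; sizes here are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(key))


class DedupStore:
    """Hash-based de-duplication with a Bloom filter in front of the index."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.index = {}         # fingerprint -> block id (stand-in index table)
        self.blocks = []        # stand-in for physical block storage
        self.index_lookups = 0  # searches that actually reached the index

    def write(self, block: bytes) -> int:
        # Fingerprint calculation (SHA-1 is an assumption, not from the thesis).
        fingerprint = hashlib.sha1(block).digest()
        # Fingerprint search: consult the index only if the filter says "maybe".
        if self.bloom.might_contain(fingerprint):
            self.index_lookups += 1
            existing = self.index.get(fingerprint)
            if existing is not None:
                return existing  # duplicate: reuse the stored block
        # New block (or a false positive): store it and update index and filter.
        self.blocks.append(block)
        block_id = len(self.blocks) - 1
        self.index[fingerprint] = block_id
        self.bloom.add(fingerprint)
        return block_id


store = DedupStore()
ids = [store.write(b"alpha"), store.write(b"beta"), store.write(b"alpha")]
print(ids, len(store.blocks), store.index_lookups)
```

In this sketch a write of previously unseen data is normally rejected by the filter without touching the index, which is the kind of search request the filtering algorithm is meant to eliminate. A false positive costs one wasted index lookup but never causes a wrong de-duplication decision, because the index itself remains authoritative.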
Keywords: De-duplication, Hierarchical Storage, Fingerprint search optimization