Data deduplication technology effectively eliminates duplicate data and has been widely used in backup storage, archive storage, primary storage, and other systems. Deduplication systems rely on data sharing: although sharing allows the system to eliminate redundancy, it causes a "fragmentation problem" during data restore, which degrades the restore performance of deduplication systems. Existing research improves restore performance to a certain extent, but this paper finds that duplicate container read operations caused by fragmentation still occur during the restore process. In addition, compared with a conventional storage system, a deduplication system performs extra computation and I/O, resulting in insufficient write performance; when a large volume of stored data is waiting to be written, this underperformance becomes unacceptable. Existing research has proposed optimizations for chunking and indexing, but most of it considers incremental data, and few studies optimize performance for data that is already stored.

To address the duplicate container reading problem in the restore process of deduplication systems, this paper analyzes the characteristics of the restore process and the causes of duplicate container reads, and proposes the multi-level farther forward assembly (MLFASM) restore method. MLFASM divides data assembly into two levels; the basic unit of assembly at each level is the "trigger chunk mapping set", and the size of a mapping set serves as its cache priority in the secondary assembly area (a simplified sketch of this policy appears below). With a minimal increase in memory usage and computation, MLFASM significantly reduces the number of duplicate container reads by assembling data chunks farther forward and using memory more efficiently, thereby increasing the system's speed factor and restore throughput. Experiments on real data sets show that, compared with traditional restore methods, MLFASM improves the speed factor by 34.9% to 58.0% and restore throughput by 21.1% to 31.8%.

To address the insufficient write performance of deduplication, this paper exploits the multiple data nodes of a distributed storage architecture and proposes Fast Dedup, a fast deduplication scheme for stored data in distributed storage environments. Fast Dedup improves deduplication speed through a deduplication task distribution model and multi-container pool technology: the task distribution model preserves correctness when multiple deduplication nodes work simultaneously (also sketched below), and the multi-container pool saves time in the data merging stage. Evaluation on three real backup datasets shows that, compared with the unimproved technique, Fast Dedup increases deduplication throughput by 3.2% to 69.1% while sacrificing a small amount of the deduplication ratio.
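The abstract does not give implementation details, but the caching rule it describes (assemble a whole per-container mapping set at once and rank cached sets by size) can be illustrated with a minimal sketch. The recipe layout, the read_container callback, and SECONDARY_CAPACITY below are assumptions made for illustration, not the paper's actual interfaces.

```python
# Illustrative sketch only; the paper's real data structures may differ.
from collections import defaultdict

SECONDARY_CAPACITY = 4  # mapping sets kept in the secondary assembly area


def restore(recipe, read_container):
    """Yield chunk data for a restore recipe of (chunk_id, container_id) pairs."""
    # Level 1: group the recipe into per-container "trigger chunk mapping sets".
    mapping_sets = defaultdict(list)
    for chunk_id, container_id in recipe:
        mapping_sets[container_id].append(chunk_id)

    # Level 2: secondary assembly area, container_id -> {chunk_id: data}.
    secondary = {}
    for chunk_id, container_id in recipe:
        if container_id not in secondary:
            container = read_container(container_id)  # one disk read
            # Assemble the whole mapping set at once ("farther forward"),
            # so later chunks from this container need no second read.
            secondary[container_id] = {
                c: container[c] for c in mapping_sets[container_id]
            }
            if len(secondary) > SECONDARY_CAPACITY:
                # Cache priority = mapping-set size: evict the smallest
                # set, since re-reading a large set would cost the most.
                victim = min((k for k in secondary if k != container_id),
                             key=lambda k: len(secondary[k]))
                del secondary[victim]
        yield secondary[container_id][chunk_id]


# Toy usage: two containers, three recipe entries, container c1 read once.
containers = {"c1": {"a": b"A", "b": b"B"}, "c2": {"x": b"X"}}
recipe = [("a", "c1"), ("x", "c2"), ("b", "c1")]
assert b"".join(restore(recipe, containers.__getitem__)) == b"AXB"
```

Because a container is read only when its mapping set is absent from the secondary area, and large sets are the last to be evicted, the containers whose re-reads would be most expensive are exactly the ones the cache retains.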
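Similarly, one plausible reading of the task distribution model is fingerprint-range routing: every fingerprint is always judged by the same node, so two nodes can never both admit the same chunk as new, and each node fills its own container pool, avoiding a serialized merge stage. Everything named below (DedupNode, node_of, NUM_NODES) is a hypothetical illustration, not the scheme's real interface.

```python
# Illustrative sketch only; Fast Dedup's actual protocol may differ.
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4


def node_of(fp: bytes) -> int:
    # Route a fingerprint to a fixed node so the same chunk is always
    # judged by the same index; no two nodes can both admit it as new.
    return fp[0] % NUM_NODES


class DedupNode:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.index = set()  # fingerprints this node has seen
        self.pool = []      # node-private container pool: unique chunks
                            # land here directly, skipping a global merge

    def process(self, fp: bytes, chunk: bytes):
        if fp not in self.index:  # unique chunk: keep it
            self.index.add(fp)
            self.pool.append(chunk)


def dedup(chunks):
    nodes = [DedupNode(i) for i in range(NUM_NODES)]
    # Pre-route chunks so each node owns a disjoint fingerprint range.
    buckets = [[] for _ in range(NUM_NODES)]
    for chunk in chunks:
        fp = hashlib.sha1(chunk).digest()
        buckets[node_of(fp)].append((fp, chunk))
    # Nodes deduplicate their buckets in parallel; each touches only
    # its own index and pool, so no cross-node locking is required.
    with ThreadPoolExecutor(NUM_NODES) as ex:
        for node, bucket in zip(nodes, buckets):
            ex.submit(lambda n=node, b=bucket: [n.process(f, c) for f, c in b])
    return nodes
```

Partitioning the fingerprint space is what keeps simultaneous deduplication correct in this sketch: duplicate detection for a given chunk is confined to exactly one node's index, so parallelism never produces conflicting verdicts.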