
Research On The Data Deduplication Strategy Of File Blocks For Cloud Storage

Posted on: 2023-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Zhu
GTID: 2558306902951119
Subject: Computer technology
Abstract/Summary:
In the information age, data is growing explosively, and storing it efficiently has become an important topic in modern information technology. Stored data often contains a large amount of duplication, which consumes a great deal of storage space and also lowers the storage efficiency and network bandwidth utilization of cloud storage systems. Data deduplication addresses this problem: it optimizes the storage system, keeps the volume of data held on cloud servers manageable, and improves the efficiency of data transmission over the network. This thesis studies deduplication technology for cloud storage. Its contributions and innovations are as follows.

Firstly, to address the low deduplication speed and low deduplication rate of content-based data chunking algorithms, a content-based fast balanced chunking algorithm is proposed. The algorithm uses the faster Gear hash as its rolling hash and reassigns the values in the Gear table to obtain the Gear-V algorithm, which effectively improves running speed. Because Gear hash yields a sliding window that is too small, the window value is expanded during the hash judgment to improve running speed, and special symbols are added as new judgment conditions to obtain a higher deduplication rate. Finally, the minimum block threshold is re-selected to raise the deduplication rate further. Experiments show that the content-based fast balanced chunking algorithm achieves a higher deduplication rate than traditional chunking algorithms while maintaining operational efficiency.

Then, to address the problem that data deduplication cannot fully utilize the storage servers and computing performance of a cloud storage system, a distributed deduplication scheme based on file block similarity is proposed. The scheme consists of two key parts: the file block strategy and the file block routing strategy. The former divides a file and reorganizes it into file blocks; the latter distributes the organized file blocks to the target server selected by the routing strategy, so that multiple servers deduplicate in parallel. Multi-node simulation experiments show that the distributed deduplication scheme based on file block similarity solves the problem of low resource utilization and achieves better relative deduplication rate, throughput, and deduplication performance.

Finally, building on the two preceding pieces of work, file sharing security is added, and a secure deduplication system for cloud storage files is designed and implemented. The system includes five functional modules: user registration, user login, file upload with approximate duplication rate detection, file retrieval, and file sharing. The system has passed functional testing, implements the intended functions, and meets expectations.
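The chunking idea behind the first contribution can be illustrated with a minimal sketch of Gear-hash content-defined chunking. This is not the thesis's Gear-V algorithm: the reassigned value table, the expanded window judgment, and the special-symbol condition are not specified in the abstract, so none of them is reproduced here. The table seed, the choice to test the high-order hash bits, the size parameters, and the names `make_gear_table`, `chunk`, and `dedup_ratio` are all assumptions made for illustration. The deduplication ratio is taken, also as an assumption, to be the fraction of bytes that need not be stored.

```python
import hashlib
import random

MASK64 = (1 << 64) - 1


def make_gear_table(seed=1):
    """Build a 256-entry table of pseudo-random 64-bit values.

    Assumption: a plain seeded table, not the thesis's Gear-V table.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(256)]


GEAR = make_gear_table()


def chunk(data, min_size=256, avg_bits=10, max_size=4096):
    """Split data into content-defined chunks using a Gear rolling hash.

    A cut is made where the top avg_bits bits of the hash are zero, but
    never before min_size bytes and always by max_size bytes. The high
    bits are tested because they depend on up to the last 64 bytes,
    whereas the low bits depend only on the last few bytes.
    """
    mask = ((1 << avg_bits) - 1) << (64 - avg_bits)
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Gear update: shift left by one bit, add the table value for this byte.
        h = ((h << 1) + GEAR[b]) & MASK64
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_ratio(streams):
    """Return the fraction of bytes saved by storing each distinct chunk once."""
    seen = set()
    total = 0
    stored = 0
    for data in streams:
        for c in chunk(data):
            total += len(c)
            fingerprint = hashlib.sha256(c).digest()
            if fingerprint not in seen:
                seen.add(fingerprint)
                stored += len(c)
    return 1 - stored / total if total else 0.0
```

Because cut points depend on content rather than on byte offsets, inserting a few bytes at the start of a file shifts only the nearby chunk boundaries, and most later chunks still match their earlier copies. For example, `dedup_ratio([data, b"new header" + data])` would be expected to stay close to the value obtained for two identical copies, whereas fixed-size chunking would find almost no matches after the insertion. Raising `min_size` reduces the number of very small chunks, which is the trade-off the re-selected minimum block threshold targets.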
Keywords/Search Tags: Cloud storage, Rolling hash algorithm, Data block algorithm, Data deduplication technology, Secure data sharing