Data deduplication is widely used in cloud storage because it mitigates the storage pressure on cloud servers. Among the many deduplication schemes, client-side cross-user deduplication outperforms the others due to its low space and bandwidth consumption. In this scheme, before uploading a file the client sends the file hash to the cloud server to determine whether the file has already been outsourced by another user, and the server returns a Yes or No response indicating the file's existence. However, this response can serve as a side channel and be exploited by adversaries to compromise data privacy. In particular, when an attacker already knows most of a file's content, he can recover the rest through a brute-force learn-the-remaining-information (LRI) attack. In practice, the popularity distribution of data in the cloud is highly skewed: popular data is the main source of redundancy, while sensitive data is concentrated among unpopular data. Existing deduplication schemes and LRI mitigation schemes are not well suited to this distribution, so how to deduplicate effectively and protect data security at low cost under such a distribution becomes a new problem. To solve this problem, this paper makes the following contributions.

(1) To achieve efficient deduplication of unevenly distributed data, this paper proposes a novel Bloom filter variant named the popularity dynamic bloom filter (PDBF), which incorporates data popularity into the Bloom filter. A PDBF-based deduplication scheme is then constructed to perform different degrees of deduplication depending on how popular a datum is: high-accuracy deduplication for popular data with high redundancy, and low-accuracy deduplication for unpopular data with low redundancy. The experiments demonstrate that the scheme makes an excellent tradeoff among computational time, memory consumption, and deduplication efficiency.

(2) To mitigate the LRI attack at lower cost, this paper proposes a variable randomized redundant chunk scheme (VRCS). The main idea behind VRCS is to provide more fine-grained protection based on data popularity. It focuses on protecting the sensitive chunks of unpopular files by calculating file popularity from chunk popularity and variably adding random redundant chunks to obscure the real deduplication status of files. In addition, a VRCS prototype was evaluated on a real-world dataset; the experiments demonstrate that VRCS achieves better bandwidth efficiency than existing works with no change in security.
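The popularity-aware idea behind PDBF can be illustrated with a minimal sketch. The class names, filter sizes, and the upload-count proxy for popularity below are illustrative assumptions, not the paper's exact construction; the sketch only shows the core tradeoff of tracking popular fingerprints in a large, accurate filter and unpopular ones in a small, coarse filter.

```python
import hashlib

class SimpleBloomFilter:
    """Basic Bloom filter using double hashing derived from SHA-256."""
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(item))

class PopularityBloomFilter:
    """Hypothetical sketch of the PDBF idea: popular fingerprints go into
    a large filter with a low false-positive rate (high-accuracy dedup);
    unpopular ones into a small, coarse filter (low-accuracy dedup)."""
    def __init__(self, popularity_threshold=3):
        self.accurate = SimpleBloomFilter(num_bits=1 << 16, num_hashes=7)
        self.coarse = SimpleBloomFilter(num_bits=1 << 10, num_hashes=2)
        self.counts = {}  # upload counts serve as a popularity proxy here
        self.threshold = popularity_threshold

    def add(self, fingerprint):
        self.counts[fingerprint] = self.counts.get(fingerprint, 0) + 1
        if self.counts[fingerprint] >= self.threshold:
            self.accurate.add(fingerprint)   # popular: high-accuracy dedup
        else:
            self.coarse.add(fingerprint)     # unpopular: low-accuracy dedup

    def probably_duplicate(self, fingerprint):
        return fingerprint in self.accurate or fingerprint in self.coarse
```

Because unpopular data carries little redundancy, a higher false-positive rate on the coarse filter costs few missed deduplication opportunities while saving most of the memory, which is the tradeoff the abstract describes.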
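The VRCS idea of variably padding unpopular files with redundant chunks can also be sketched. The popularity threshold, the min-based derivation of file popularity from chunk popularity, and the redundancy bound below are illustrative assumptions, not the paper's exact parameters.

```python
import random

def file_popularity(chunk_popularities):
    """Derive file popularity from its chunks; here the minimum is used,
    so a single rare chunk makes the whole file count as unpopular."""
    return min(chunk_popularities)

def chunks_to_upload(duplicate_flags, chunk_popularities,
                     popularity_threshold=5, max_redundant=4, rng=random):
    """Return indices of chunks the client actually uploads.

    Non-duplicate chunks are always uploaded.  For an unpopular file, a
    random number of chunks the server already holds are re-uploaded as
    redundant traffic, so an observer cannot infer the file's true
    deduplication status from the upload volume."""
    upload = [i for i, dup in enumerate(duplicate_flags) if not dup]
    if file_popularity(chunk_popularities) < popularity_threshold:
        duplicates = [i for i, dup in enumerate(duplicate_flags) if dup]
        k = rng.randint(0, min(max_redundant, len(duplicates)))
        upload.extend(rng.sample(duplicates, k))
    return sorted(set(upload))
```

Popular files skip the padding entirely, which is how the scheme keeps bandwidth overhead lower than approaches that randomize the response for every file.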