
Research On Publishing Multi-Dimensional Data Under Local Differential Privacy

Posted on: 2024-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Liu
Full Text: PDF
GTID: 2568306920480274
Subject: Cyberspace security
Abstract/Summary:
With the popularization of smart devices and the advent of the big data era, massive amounts of data covering various fields are generated every day. By collecting and analyzing this data, data collectors can better understand the behavior, preferences, and needs of data owners, make better decisions, and formulate more effective strategies, thereby improving service quality and service satisfaction. However, the big data era also brings a series of problems and challenges. Users' data often contains a large amount of sensitive personal information, and collecting or releasing it directly, whether at the collection stage or the release stage, leads to the leakage of personal privacy. As data security and privacy protection attract growing attention, privacy protection technology is also being continuously strengthened. Local differential privacy, as one of the most advanced privacy protection technologies, has received increasing attention. However, existing work mainly focuses on one-dimensional data collection and publication tasks under local differential privacy, and research on multi-dimensional data publication under local differential privacy has only just begun. Therefore, this thesis conducts an in-depth study of multi-dimensional data publication under local differential privacy for two typical types of multi-dimensional data: multi-dimensional data and set-valued data streams. To summarize, this thesis makes the following contributions:

(1) This thesis proposes a new multi-dimensional data publication method that satisfies local differential privacy, named PrivIncr. First, based on the sparseness of the constructed probabilistic graphical model and the divisibility of local differential privacy, an incremental learning-based method for constructing the probabilistic graphical model is proposed. The main idea is to gradually prune edges (i.e., attribute pairs) with weak correlations and allocate more data and privacy budget to the useful edges, thus improving the accuracy of the constructed probabilistic graphical model. In particular, a high-precision data accumulation technique and a low-error edge pruning technique are introduced to improve the accuracy and efficiency of model construction. Second, based on joint distribution decomposition and redundancy elimination, a novel method for calculating the joint distribution of large cliques under local differential privacy is proposed, which effectively solves the problem of calculating the joint distribution of large cliques in the junction tree. Extensive experiments demonstrate that PrivIncr achieves good data utility and effectively reduces communication overhead.

(2) This thesis presents, for the first time, a solution to the problem of publishing set-valued data streams under local differential privacy. First, based on an adaptive budget division strategy, an efficient baseline method called PrivSVS is proposed. PrivSVS quantifies the data fluctuation at each timestamp by defining a dissimilarity error and a publication error. It then adaptively selects the appropriate strategy, either the approximation strategy or the publication strategy, for releasing statistical information. Furthermore, an optimized method called Optimized-PrivSVS is proposed to further reduce utility loss and communication costs. This method is based on the independence of each item's distribution estimation in set-valued data and on the fluctuation characteristics of set-valued data streams. The main idea is to publish the distribution of items observed to fluctuate less directly with the approximation strategy. In this way, more data and privacy budget can be allocated to estimating the distribution of the other items, thereby improving the accuracy of their estimates. Extensive experiments confirm that both PrivSVS and Optimized-PrivSVS achieve effective data accuracy, with Optimized-PrivSVS showing a significant performance improvement.
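
The adaptive choice between the approximation and publication strategies described in (2) can be illustrated with a minimal Python sketch. The abstract does not give the thesis's definitions of dissimilarity error or publication error, so everything here is an assumption made for illustration: the function names `choose_strategy`, `mean_abs_diff`, and `release_stream` are hypothetical, the dissimilarity error is taken to be the mean absolute difference between the current estimate and the last release, and the publication error is passed in as a fixed number. The sketch also assumes the per-timestamp estimates have already been perturbed to satisfy local differential privacy and does not model budget division.

```python
from typing import List, Sequence, Tuple


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed dissimilarity measure: mean absolute difference per item."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def choose_strategy(dissimilarity_error: float, publication_error: float) -> str:
    """Pick 'approximate' (reuse the last release) when the data has changed
    less than the error a fresh release would introduce, else 'publish'."""
    if dissimilarity_error <= publication_error:
        return "approximate"
    return "publish"


def release_stream(
    estimates: Sequence[Sequence[float]],
    publication_error: float,
) -> Tuple[List[List[float]], List[str]]:
    """Walk over per-timestamp item-frequency estimates (assumed to be
    already perturbed) and decide, at each timestamp, whether to reuse
    the previous release or publish the new estimate."""
    releases = [list(estimates[0])]  # the first timestamp is always published
    decisions = ["publish"]
    for current in estimates[1:]:
        dissimilarity = mean_abs_diff(current, releases[-1])
        decision = choose_strategy(dissimilarity, publication_error)
        decisions.append(decision)
        if decision == "approximate":
            releases.append(list(releases[-1]))
        else:
            releases.append(list(current))
    return releases, decisions


# Toy stream of three timestamps over three items (made-up numbers).
toy_estimates = [
    [0.50, 0.30, 0.20],
    [0.50, 0.31, 0.19],  # small fluctuation
    [0.20, 0.30, 0.50],  # large fluctuation
]
toy_releases, toy_decisions = release_stream(toy_estimates, publication_error=0.05)
```

In this toy run the second timestamp reuses the first release because the fluctuation is small, while the third timestamp triggers a fresh publication. In the approach the abstract describes, skipping a publication is what frees data and privacy budget for later timestamps or for other items; that bookkeeping is deliberately left out of the sketch.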
Keywords/Search Tags: local differential privacy, data publication, multi-dimensional data, set-valued data streams