Research On Parameter Estimation Of Streaming Data With Differential Privacy

Posted on: 2024-04-04  Degree: Master  Type: Thesis
Country: China  Candidate: H Li
GTID: 2568307088955179  Subject: Applied statistics
Abstract/Summary:
Machine learning plays an important role in the age of big data, with remarkable success in areas such as image recognition, recommendation systems, and natural language processing, thanks in part to the data sets used to train models. However, these data sets are likely to contain sensitive personal information, and publishing them directly or using them for training risks privacy disclosure. How to mine and use data more effectively while protecting users' privacy is therefore an urgent problem. Differential privacy is an effective technique for addressing privacy leakage: by adding noise to query results, it prevents an attacker from judging whether a given user is in the data set, even if the attacker has strong background knowledge.

This thesis focuses on parameter estimation for streaming data. Because streaming data is massive, fast, arrives in real time, and is no longer stored once processed, parameters cannot be estimated from the full data set. We therefore use online updating, in which each update uses only the current batch and summary statistics of the previous data, and combine it with differential privacy to protect users' privacy. The main research contents and results are as follows.

Firstly, we compare the variance of the noise needed to achieve the same privacy effect under the Gaussian mechanism of (ε,δ)-differential privacy and under the Gaussian mechanism of Gaussian differential privacy. The results show that the two mechanisms add essentially the same noise when ε is large, but the Gaussian mechanism calibrated with Gaussian differential privacy adds significantly less noise when ε is small.

Secondly, we propose a parameter estimation algorithm for streaming data with differential privacy protection. When a new batch of data arrives, only that batch is used, and the parameters are updated once. A gradient clipping parameter clips the gradients of sample points whose gradients are large, which controls the sensitivity. Based on the comparison above, the Gaussian mechanism of Gaussian differential privacy is chosen to add normally distributed noise to the gradient, so that the whole update step satisfies differential privacy.

Thirdly, we give the privacy guarantee of the whole algorithm. By the parallel composition theorem of Gaussian differential privacy, the whole algorithm is equivalent to a combination of multiple privacy mechanisms applied to different data sets, and the privacy budget of the combination is determined by the single mechanism with the worst privacy effect. Compared with other algorithms, this algorithm achieves a good privacy-preserving effect with a smaller privacy budget.

Finally, we carry out simulations and experiments on real data sets. They show that, with an appropriately chosen clipping parameter, the algorithm achieves almost the same accuracy as its counterpart without added noise, which indicates that the algorithm balances privacy protection and data utility.
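The update step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's actual algorithm. The linear model with squared-error loss, the learning rate, the replace-one-sample sensitivity bound of 2c/n, and all function and parameter names (`clip`, `private_update`, `run_stream`, `lr`, `c`, `mu`) are assumptions made for the example. The only calibration fact it relies on is that Gaussian noise with standard deviation sensitivity/μ gives μ-Gaussian differential privacy for a single release.

```python
import math
import random


def clip(vec, c):
    """Scale vec so that its L2 norm is at most c."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= c:
        return list(vec)
    return [v * c / norm for v in vec]


def private_update(theta, batch, lr, c, mu, rng):
    """One noisy gradient step using a single batch of (x, y) pairs.

    Squared-error loss for a linear model is an illustrative assumption.
    """
    n, d = len(batch), len(theta)
    grad_sum = [0.0] * d
    for x, y in batch:
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        g = clip([residual * xi for xi in x], c)  # per-sample clipping
        grad_sum = [a + b for a, b in zip(grad_sum, g)]
    # Assumption: neighbouring batches differ by replacing one sample,
    # so the mean clipped gradient changes by at most 2c/n in L2 norm.
    sensitivity = 2.0 * c / n
    sigma = sensitivity / mu  # Gaussian mechanism calibrated for mu-GDP
    noisy = [s / n + rng.gauss(0.0, sigma) for s in grad_sum]
    return [t - lr * g for t, g in zip(theta, noisy)]


def run_stream(batches, d, lr=0.1, c=1.0, mu=1.0, seed=0):
    """Process batches in arrival order; each batch is used exactly once."""
    rng = random.Random(seed)
    theta = [0.0] * d
    for batch in batches:
        theta = private_update(theta, batch, lr, c, mu, rng)
    return theta
```

Because each batch is touched by exactly one update and then discarded, the per-batch mechanisms act on disjoint data, which is the setting in which a parallel composition argument like the one in the abstract applies. A smaller `c` lowers the noise scale but biases large gradients more strongly, so the clipping parameter trades accuracy against noise.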
Keywords/Search Tags: privacy protection, differential privacy, streaming data, gradient descent