With the rapid development of information technology, computer technology, and Internet technology, and the popularization of mobile terminal devices, a wide range of Internet applications and big-data-based services have emerged, bringing tremendous changes to people's daily lives. However, data is the core of Internet services and contains a large amount of sensitive information about users, enterprises, and even government agencies. Internet and big-data applications often require frequent data exchange between users and service providers, which leads to increasingly severe data security and privacy issues. To protect the security and privacy of user data, the European Union and the United States have introduced the General Data Protection Regulation and the California Consumer Privacy Act, respectively. China has also promulgated the Cybersecurity Law, the Data Security Law, and the Personal Information Protection Law of the People's Republic of China. These laws and regulations treat data security and user privacy as a top priority and codify users' right to privacy in Internet applications. At the same time, the academic community has proposed technologies such as homomorphic encryption, k-anonymity, and differential privacy to protect user privacy while preserving data utility. Building on these existing techniques, privacy computing theory and its key technology systems have been proposed from the perspective of whole-life-cycle protection of private information, in order to meet the demand for systematic privacy preservation in complex application scenarios. Privacy computing abstracts the protection process into five steps: 1) extracting private information, 2) abstracting the scenario, 3) selecting the privacy operation, 4) selecting or designing the privacy-preserving scheme, and 5) evaluating the privacy-preserving effectiveness. Among these techniques, differential privacy is the de facto privacy
preserving technique, since it provides a rigorous mathematical definition of privacy leakage and does not depend on the adversary's background knowledge. Differential privacy has thus become one of the most effective techniques in the privacy computing framework. However, although differential privacy has greatly advanced privacy protection technology, it still faces many challenges in providing whole-life-cycle protection of private information under the privacy computing framework. Existing differential privacy schemes often cannot provide satisfactory protection guarantees or high-utility data analysis results. On the other hand, the privacy and security of differential privacy mechanisms themselves still lack sufficient analysis. This dissertation therefore focuses on two problems under the privacy computing framework: 1) selecting or designing the differential privacy scheme and 2) evaluating the effectiveness of the differential privacy mechanism. The main research contents include the following parts.

1. For the challenge of selecting or designing the differential privacy mechanism, this dissertation studies protecting regression models with personalized local differential privacy. To protect machine learning models, the dissertation designs, based on personalized local differential privacy, a defense mechanism that protects linear regression and logistic regression coefficients against equation-solving model extraction attacks. Specifically, the dissertation first sets a safe region for each model coefficient according to the importance of the corresponding data feature and the model owner's privacy preference. Based on the safe region, it derives a sufficient condition under which the noise satisfies personalized local differential privacy. Putting these together, it proposes a mechanism that uses dynamic noise to provide differentially private protection for the model. The
proposed mechanism can adapt the noise to the attack and significantly reduces the attack's efficacy compared with other mitigations under the same attack setting. The dissertation further optimizes the mechanism so that it provides effective protection for the model coefficients even under a misconfigured, overly large privacy budget. Comprehensive experiments verify the availability and efficiency of the proposed mechanism in protecting regression model coefficients.

2. For the challenge of selecting or designing the differential privacy mechanism, this dissertation studies key-value data collection with high-utility distribution estimation under local differential privacy. In terms of data collection mechanism design under local differential privacy, the dissertation focuses on a typical kind of heterogeneous data, key-value data, and proposes a collection mechanism with high-utility distribution estimation. Specifically, the mechanism first samples one key-value pair from each user, and then perturbs it based on the correlation between key and value and the ordinal nature of the value domain. The perturbation not only protects data privacy but also retains useful information about the data in the perturbed results. Moreover, the dissertation incorporates consistency requirements and proposes an aggregation algorithm that estimates the key frequencies and the value distributions with high utility.

3. For the challenge of evaluating the effectiveness of the differential privacy mechanism, this dissertation studies the security upper bound on the number of linear queries under the differential privacy Laplace mechanism. In terms of the security evaluation of linear queries under the Laplace mechanism, the dissertation studies, via information theory, the upper bound on the number of linear queries over continuous datasets (e.g., salary datasets) and discrete datasets (e.g., age
datasets). Specifically, the dissertation first approximates the perturbed query results under differential privacy with a normal-Laplace distribution, and then measures the privacy leakage by analyzing the mutual information between the perturbed results and the true results. On this basis, it finds the most aggressive linear query, i.e., the one that leaks the most privacy, by solving an optimization problem. By studying the number of queries, the dissertation determines the upper bound on the number of queries such that no individual's information can be recovered even under the most aggressive linear query.

4. For the challenge of evaluating the effectiveness of the differential privacy mechanism, this dissertation studies poisoning attacks and defenses for mean/variance estimation under local differential privacy. The dissertation focuses on the security evaluation of mean/variance estimation under local differential privacy, and proposes a poisoning attack that manipulates the estimated mean/variance toward attacker-desired target values, together with a corresponding countermeasure. Specifically, the attacker first infers the necessary information about the genuine users and crafts fake values accordingly. The attacker then injects fake users who send the fake values to the server, so that the estimated mean and variance are tampered into the target values. The dissertation analyzes the attack performance in theory and evaluates the security and privacy guarantees of local differential privacy mechanisms. On this basis, it also proposes a clustering-based mitigation: the server first subsamples all users into multiple groups, then clusters the groups and detects abnormal results; finally, the server discards the abnormal statistics, thereby defending against the poisoning attack.
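The poisoning threat described in part 4 can be illustrated with a minimal sketch. The snippet below assumes a plain Laplace-based local perturbation for mean estimation, which is not the dissertation's exact mechanism; all function names and parameters are illustrative.

```python
import math
import random


def laplace_noise(scale):
    # Sample from Laplace(0, scale) via the inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


def perturb(value, epsilon, lo=0.0, hi=1.0):
    # Local Laplace mechanism: each user adds noise calibrated to the
    # domain width (sensitivity = hi - lo) before reporting.
    return value + laplace_noise((hi - lo) / epsilon)


def estimate_mean(reports):
    # The server simply averages the noisy reports; the Laplace noise
    # has zero mean, so the estimate is unbiased for genuine users.
    return sum(reports) / len(reports)


random.seed(0)
genuine = [random.random() for _ in range(10000)]        # true values in [0, 1]
reports = [perturb(v, epsilon=1.0) for v in genuine]     # what the server sees
print(round(estimate_mean(reports), 2))                  # close to the true mean (~0.5)

# Poisoning: injected fake users all report the domain maximum,
# dragging the server's mean estimate toward the attacker's target.
fake = [perturb(1.0, epsilon=1.0) for _ in range(2000)]  # 20% fake users
print(round(estimate_mean(reports + fake), 2))           # noticeably larger
```

A subsample-and-cluster defense in the spirit of part 4 would split the reports into random groups, estimate the mean per group, and discard groups whose estimates are outliers relative to the cluster of agreeing groups.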