The unceasing development of network technology produces abundant data resources containing a great deal of private information, such as medical records, personal locations, and biometric features, so machine learning (ML) applications built on such data are exposed to privacy threats. Related studies have demonstrated that ML suffers from privacy leakage throughout the whole process of data collection, training, and prediction. Privacy threats break the confidentiality of ML training data and hold back the large-scale application and rapid development of ML. Hence, ML privacy protection has attracted significant attention from both academia and industry.

Although research on ML privacy protection has made significant progress, several drawbacks remain. First, the lack of systematic studies of ML privacy threats provides insufficient support for ML privacy protection techniques; in addition, existing studies differ widely in their assumed attack conditions and do not explore the minimal boundary conditions under which attacks succeed. Second, existing privacy protection methods can hardly cover the whole ML lifecycle: they protect against only one kind of privacy threat, or cannot balance protection capability against model performance once privacy protection is introduced. Third, there is a lack of data-feature-oriented assessment of privacy threats, leaving assessment mechanisms incomplete, so existing techniques cannot give data owners user-level control over privacy risks. These three drawbacks hinder the development of ML privacy protection; as a result, existing privacy protection techniques cannot meet the requirements of large-scale ML application and rapid growth. This dissertation investigates the risk of ML privacy leakage, privacy protection methods, and privacy threat assessment in order to resolve these three drawbacks. The detailed research contents are presented below.

1. The study of ML privacy threats oriented toward practical scenarios. Existing research assumes that attackers can obtain considerable information but does not explore the minimal assumption boundary of privacy threats. This dissertation proposes GANMIA, an ML threat method based on GANs (Generative Adversarial Networks). Using the CIFAR10 dataset, this work then carries out attack experiments in which no more than 2% of the target model's training data is known to the attacker. Experiments show that the attack accuracy of GANMIA reaches 82.1%, which is 23% higher than black-box attacks that use only the original data (a minimal sketch of this attack style appears after item 2(1) below).

2. The research on ML privacy protection. (1) Existing ML privacy protection methods either cannot cover the whole ML lifecycle or protect against only one kind of privacy threat. This work proposes a privacy protection mechanism that covers the whole ML lifecycle and addresses multiple privacy threats. The mechanism adopts multiple adversarial perturbation algorithms together with feedback mechanisms. This work then investigates privacy protection for NN models based on adversarial perturbation generators that use multiple algorithms, including AdvGAN and FGSM (an FGSM sketch also appears below). Experiments demonstrate that the proposed mechanism can effectively prevent direct privacy threats during data upload as well as membership inference attacks during training and application.
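For the GAN-based membership inference attack of item 1, the following is a minimal sketch of the attack-classifier stage only, assuming the GAN has already been trained on the small known subset; `probs_member` and `probs_synthetic` stand for the target model's softmax outputs on known member records and on GAN-generated non-member stand-ins. All names are illustrative assumptions, not the dissertation's code.

```python
# Sketch: confidence-based membership inference, GAN-assisted variant.
# Assumes the generator was trained beforehand on the <=2% known subset
# and that softmax outputs of the target model are already collected.
import numpy as np
from sklearn.linear_model import LogisticRegression

def attack_features(probs):
    """Turn softmax vectors into attack features: confidences sorted in
    descending order, plus the prediction entropy."""
    sorted_conf = np.sort(probs, axis=1)[:, ::-1]
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1, keepdims=True)
    return np.hstack([sorted_conf, entropy])

def train_attack_model(probs_member, probs_synthetic):
    """Fit a binary classifier separating member from non-member
    confidence patterns; GAN samples act as the non-member class."""
    X = np.vstack([attack_features(probs_member),
                   attack_features(probs_synthetic)])
    y = np.concatenate([np.ones(len(probs_member)),
                        np.zeros(len(probs_synthetic))])
    return LogisticRegression(max_iter=1000).fit(X, y)

# Membership of a fresh record x is then decided by
# attack.predict(attack_features(target_model.predict_proba(x))).
```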
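For the perturbation-based protection of item 2(1), the following sketches a single FGSM step in PyTorch; `model`, `x`, `y`, and `epsilon` are illustrative assumptions, and the dissertation's full mechanism additionally uses AdvGAN-style generators and a feedback loop, which are not shown.

```python
# Sketch: one FGSM perturbation step. In the protection setting, the
# perturbed sample (rather than the raw one) is uploaded or released,
# distorting the features an inference attack relies on.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Return x shifted one epsilon-step along the sign of the loss
    gradient; inputs are assumed to be images scaled to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()
```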
(2) The balance between privacy protection capability and model performance is hard to achieve. This work proposes a privacy-preserving method based on homomorphic encryption, together with an updated-parameter selection scheme, to achieve this balance. First, the dissertation carries out a theoretical analysis of the semantic security of the homomorphic encryption and the accuracy of the target model, and proves that the proposed mechanism can effectively prevent privacy leakage. It then employs the Paillier algorithm and the CKKS algorithm to conduct extensive experiments on the MNIST dataset; the results show that the NN target model still achieves an accuracy of about 96% after the protection mechanism is introduced (a minimal Paillier sketch appears after item 3 below).

3. The research on data-owner-oriented privacy risk assessment. Existing studies mainly focus on quantifying the privacy risk of the model but lack analysis and assessment methods for data-feature-oriented privacy risk, leaving privacy risk assessment incomplete; as a result, current research cannot provide user-level control of the privacy risk. This dissertation therefore proposes a data-feature-oriented privacy risk assessment method. With ResNet-based and NN-based target models and the MNIST, CIFAR10, and CIFAR100 datasets, experiments show that the proposed method can accurately reflect the degree of data privacy risk and support the control of data privacy risk.
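For the homomorphic-encryption scheme of item 2(2), the sketch below shows additively homomorphic aggregation of model updates with the Paillier algorithm via the python-paillier (`phe`) library. The three-client setup and the gradient values are illustrative assumptions, and the CKKS variant is not shown.

```python
# Sketch: Paillier-encrypted aggregation of model updates. The server
# sums ciphertexts without ever seeing plaintext gradients, since
# Enc(a) + Enc(b) = Enc(a + b) under Paillier.
from functools import reduce
from operator import add
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its local gradient before upload (toy values).
client_grads = [[0.12, -0.40], [0.08, -0.35], [0.10, -0.42]]
encrypted = [[public_key.encrypt(g) for g in grad] for grad in client_grads]

# The aggregator adds ciphertexts coordinate-wise.
agg = [reduce(add, col) for col in zip(*encrypted)]

# Only the private-key holder recovers the averaged update.
avg = [private_key.decrypt(c) / len(client_grads) for c in agg]
print(avg)  # approximately [0.10, -0.39]
```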
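For the assessment of item 3, the sketch below scores per-record privacy risk from the target model's output confidence, a common proxy because confidently memorized records are exactly the ones membership inference recovers most reliably. The scoring rule itself is an assumption for illustration, not the dissertation's metric.

```python
# Sketch: a per-record privacy-risk score a data owner could act on.
import numpy as np

def record_risk(probs, labels):
    """probs: target-model softmax outputs for a data owner's records;
    labels: the true classes. Returns a score in [0, 1] per record,
    higher meaning more exposed to membership inference."""
    conf_true = probs[np.arange(len(labels)), labels]   # p(true class | x)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # Confident, low-entropy predictions suggest memorization.
    return conf_true * (1.0 - entropy / np.log(probs.shape[1]))

# An owner could then withhold or perturb records whose score exceeds a
# chosen threshold, realizing user-level control of the privacy risk.
```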