| The rapid growth of big data has led to the deployment of machine learning models in many domains, providing a variety of efficient services. However, this growth also poses a serious challenge to data privacy, and the emergence of membership inference attacks exposes the privacy risks of models trained on big data. Simply by querying such a model, an adversary can infer whether a given record was part of the model's training data, which leads to data privacy leakage. A great deal of academic work has been conducted on membership inference attacks, and various attack and defense schemes have been proposed and implemented. However, the existing work still has shortcomings. For attack schemes, on the one hand, the attack scenarios are not realistic enough (for example, they require access to confidence scores), which limits how the attacks can be carried out; on the other hand, the attack success rate is not high, the attack cost is large, and performance needs further improvement. For defense schemes, it is difficult to balance model utility against privacy, and model performance often has to be sacrificed to achieve an effective defense. Through an in-depth study of membership inference attacks and defenses in machine learning, this paper presents a practical and efficient membership inference attack scheme and, on this basis, designs an effective framework for defending against such attacks. The contributions of this paper are twofold: (1) A frequency-domain-based membership inference attack scheme (FD-MIA) is proposed to address the insufficiently realistic attack scenarios and high attack costs of existing membership inference attack schemes. Starting from the data itself, the scheme first separates high- and low-frequency information through the discrete cosine transform to obtain additional data samples, then queries the model for predictions on these samples, and finally performs membership inference based on a threshold. Compared with existing attack schemes, this scheme requires only predicted labels and therefore better matches real-world attack scenarios. In addition, it requires only a few queries, which reduces time cost and computational overhead and makes it more feasible. Extensive experimental evaluation demonstrates that the proposed attack scheme is effective in more realistic scenarios. (2) A membership inference attack defense framework (KD-GAN) based on generative adversarial networks and knowledge distillation is designed to address the poor trade-off between model utility and privacy in existing defense schemes. The framework first generates a batch of labeled dummy data with a generative adversarial network, then trains a teacher model on the private data in the usual way so that it can guide the student model, and finally uses the generated dummy data as transfer data in knowledge distillation to train a protected student model. Extensive experimental evaluation demonstrates that the proposed framework not only effectively protects the privacy of the private training set but also preserves model performance, achieving a good trade-off between model utility and member privacy. |
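
The label-only attack pipeline described in contribution (1) can be illustrated with a minimal sketch. The abstract does not state the filtering rule, the number of variants, or the decision statistic, so everything below is an assumption made for illustration: the triangular low-pass mask, the `cutoffs` values, the agreement-rate score, and the names `split_frequencies`, `membership_score`, `infer_membership`, and `predict_label` (a stand-in for a black-box model that returns only a label) are all hypothetical and are not taken from FD-MIA itself.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale
    return c

def split_frequencies(image, cutoff):
    """Split a 2-D image into low- and high-frequency parts via the 2-D DCT.

    Assumption: coefficients with u + v < cutoff count as "low frequency".
    """
    rows, cols = image.shape
    c_r, c_c = dct_matrix(rows), dct_matrix(cols)
    coeffs = c_r @ image @ c_c.T                 # forward 2-D DCT
    u = np.arange(rows).reshape(-1, 1)
    v = np.arange(cols).reshape(1, -1)
    low_mask = (u + v) < cutoff
    low = c_r.T @ (coeffs * low_mask) @ c_c      # inverse DCT of low band
    high = c_r.T @ (coeffs * ~low_mask) @ c_c    # inverse DCT of high band
    return low, high

def membership_score(image, label, predict_label, cutoffs=(4, 8, 16)):
    """Fraction of frequency-filtered variants still predicted as `label`.

    Uses 2 * len(cutoffs) label-only queries to the target model.
    """
    variants = []
    for cutoff in cutoffs:
        variants.extend(split_frequencies(image, cutoff))
    hits = [predict_label(v) == label for v in variants]
    return float(np.mean(hits))

def infer_membership(image, label, predict_label, threshold=0.5,
                     cutoffs=(4, 8, 16)):
    """Guess "member" when the agreement rate reaches the threshold."""
    return membership_score(image, label, predict_label, cutoffs) >= threshold
```

The intuition behind this sketch is that a model tends to be more robust on samples it was trained on, so its predicted label is more likely to survive frequency filtering for members than for non-members. In practice the threshold would have to be calibrated, for example on data known to be outside the training set.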
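
The knowledge distillation step in contribution (2) can be sketched in the same spirit. The abstract does not specify the loss, so this example assumes the commonly used formulation: a KL divergence between temperature-softened teacher and student outputs, scaled by the squared temperature. The function names and the `temperature` default are hypothetical, and the GAN that produces the transfer data is not shown; the logits here stand for model outputs on that generated data.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Mean KL(teacher || student) on temperature-softened outputs.

    Assumption: the T^2 factor follows the conventional distillation recipe;
    the abstract does not say which loss KD-GAN uses.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
        axis=1,
    )
    return float(np.mean(kl)) * temperature ** 2
```

Under this setup the student sees only generated samples and the teacher's soft outputs on them, never the private records, which is the property the framework relies on to limit membership leakage.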