
A Study Of Semi-Supervised Label Inference Attacks For Vertical Federated Learning And The Defense Methods

Posted on: 2023-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Deng
Full Text: PDF
GTID: 2568306911981829
Subject: Cyberspace security
Abstract/Summary:
Federated learning, as a distributed machine learning paradigm that balances efficiency and privacy, allows all training participants to train collaboratively without sharing private data: each participant contributes its own computing resources to jointly train a machine learning model with strong performance, and the participants pursue a common learning goal under the coordination of a parameter server. As a frontier of federated learning, vertical federated learning focuses on scenarios in which the data samples of different participants overlap heavily while their sample features overlap little, such as collaborative training between a bank and an e-commerce platform. Protecting the privacy of the labels held by each party is one of the basic guarantees of vertical federated learning. However, as security research on federated learning matures, more and more studies have found privacy-leakage vulnerabilities in the original federated learning architecture. Although an attacker cannot access a victim's raw data, the parties in a federated learning system exchange model gradients, and these gradients, as intermediate results of training, are correlated with the local data; an attacker can therefore use the gradient information it can access to link gradients and intermediate outputs to private attributes of the victim's data. Both the network model structure and the gradient-update mechanism of a vertical federated learning system thus open the possibility of privacy leakage: a malicious attacker can disguise itself as a normal participant and use the global model gradients downloaded from the server to infer the private labels held by other parties in the collaborative training.

1. In view of these privacy vulnerabilities, this thesis proposes a semi-supervised label inference attack model for vertical federated learning in which the attacker hides among the participants. Considering that most of the data on the clients may be unlabeled, the semi-supervised learning algorithm FedCon is applied to improve the classification performance of the attack model. Exploiting the split-model structure of vertical federated learning, the first step of the attack adds a prediction layer at the end of the local model to capture, during training, the latent correlation between the model's predicted outputs and its input data. The second step uncovers the relationship between the global gradient information and the data labels, as sketched below. We then formalize the attack model and theoretically prove the feasibility of an inference attack that deduces the server's data labels from the global gradients. The attack is evaluated experimentally on the MNIST, CIFAR-10, and Yahoo Answers datasets. The results show that a malicious participant can infer more than half of the server's data labels during the training phase without degrading the global classification model enough to be detected by the server. Compared with purely semi-supervised learning for label inference, our attack model achieves higher accuracy. In addition, we evaluate the impact of the number of labeled samples on the attack effect, and the experiments show that the proposed attack needs only a small number of auxiliary labeled samples to reach high attack accuracy.
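To make the gradient-label relationship exploited in the second step concrete, here is a minimal sketch, not the thesis's actual attack implementation: when the label-holding server uses a cross-entropy loss, the gradient of the loss with respect to the logits is softmax(z) minus the one-hot label vector, so the only negative entry in each row marks the true class. The split setup and the variable names (z for the attacker-side logits, y for the server's labels) are illustrative assumptions.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 10
# Hypothetical split setup: the attacker's local model produces logits z,
# the server holds the private labels y and returns dL/dz during backprop.
z = torch.randn(4, num_classes, requires_grad=True)
y = torch.tensor([3, 7, 0, 3])

loss = F.cross_entropy(z, y)
loss.backward()

# For cross-entropy, dL/dz = (softmax(z) - one_hot(y)) / batch_size: only
# the true-class entry of each row is negative, so the sign of the returned
# gradient leaks the label directly.
inferred = z.grad.argmin(dim=1)
print(inferred.tolist())  # -> [3, 7, 0, 3], matching the private labels

The full attack additionally trains the attacker's added prediction layer with FedCon-style semi-supervision on mostly unlabeled local data; this snippet only shows why the exchanged gradients correlate with the private labels in the first place.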
2. Against the proposed semi-supervised inference attack, we discuss the disadvantages of the current mainstream defense schemes and propose a joint defense scheme combining gradient compression with differential privacy (sketched below). The scheme first reduces the number of gradients a malicious attacker can access, so that even if the attacker deploys an attack model locally to mimic the classification behavior of the global model, it cannot quickly reach high-precision inference from the few gradients available within a limited number of training epochs. In addition, output-perturbing differential privacy is applied at the gradient communication step: the added noise hides the signs of part of the gradient entries, so an attacker who tries to infer labels from the signs of the gradient partial derivatives computes incorrect label information. The experiments evaluate, on the same three datasets, the defense effect of a pure differential privacy scheme and of the proposed joint scheme. The results show that, under the same privacy budget, the proposed scheme has less impact on the classification performance of the global model and defends more strongly against the attack model.
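As a rough illustration of the joint defense, the following sketch combines top-k gradient sparsification (one common form of gradient compression) with clipped Gaussian noise in the style of the Gaussian mechanism of differential privacy. The function name and the parameters k_ratio, clip, and sigma are assumptions for illustration, not values from the thesis.

import torch

def compress_and_privatize(grad, k_ratio=0.1, clip=1.0, sigma=0.5):
    # 1) Gradient compression: keep only the k largest-magnitude entries,
    #    shrinking the amount of gradient information an attacker can observe.
    flat = grad.flatten()
    k = max(1, int(k_ratio * flat.numel()))
    _, idx = torch.topk(flat.abs(), k)
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    # 2) Differential privacy: clip the l2 norm to bound sensitivity, then
    #    add Gaussian noise; the noise can flip the signs of small entries,
    #    which is exactly the signal sign-based label inference relies on.
    scale = torch.clamp(clip / (sparse.norm() + 1e-12), max=1.0)
    noisy = sparse * scale + torch.normal(0.0, sigma * clip, size=flat.shape)
    return noisy.view_as(grad)

g = torch.randn(32, 10)                 # a simulated local gradient
print(compress_and_privatize(g).shape)  # torch.Size([32, 10])

The two steps are complementary: sparsification limits how many gradient entries the attacker ever sees, while the noise corrupts the sign information in the entries that remain.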
Keywords/Search Tags:Vertical Federated Learning, Inference Attack, Semi-Supervised Learning, Differential Privacy