Research On Multi-party Collaborative Learning Based On Data Labeling For Classification

Posted on: 2024-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: L L Li
GTID: 2568306932962119
Subject: Computer Science and Technology
Abstract/Summary:
Machine learning has achieved major breakthroughs in classification, but building an effective machine learning model usually requires large amounts of labeled data. With the introduction of relevant laws and regulations and growing public awareness of privacy, sharing data directly has become increasingly difficult. Multi-party collaborative learning is one of the key technologies for addressing this problem. In particular, multi-party collaborative learning based on data labeling can transfer local knowledge without sharing raw data, which reduces the risk of disclosing users' private information. However, in collaborative data labeling, the participants who perform the labeling vary in reliability, which inevitably introduces noise and degrades label quality. Currently, to make full and effective use of labeled samples, noise is usually filtered by modeling that relies on prior knowledge, which is difficult to obtain in practice. At the same time, in collaborative labeling for classification, the annotation vectors, which implicitly carry the participants' private information, are vulnerable to threats including differential attacks, so privacy is difficult to guarantee. To address these challenges, this thesis proposes a learning algorithm based on labeled samples and a privacy-preserving collaborative learning algorithm for classification, aiming to protect participants' privacy while maximizing model accuracy in classification tasks. Specifically, the main contributions of this thesis are as follows:

1. To improve the effectiveness of learning from labeled samples, this thesis designs a new classification-oriented objective function, CL-MRT, and on this basis proposes a learning algorithm based on labeled samples for classification (LLSC). LLSC introduces calculation factors based on information entropy to measure the confidence of the aggregated labels, focusing the model's attention on samples whose label quality is more stable and reducing the adverse impact of noise on model training. In addition, a mixed regularization term is designed to minimize both the probability of wrong categories and the structural risk of the model, thereby improving its stability and generalization. Experiments on multiple public datasets show that LLSC can effectively learn from labeled data and improve model accuracy without prior knowledge. Moreover, in poisoning-attack scenarios, the method reduces the influence of poisoned samples on model training and enhances the robustness of the model.

2. To better protect participants' privacy in collaborative learning, this thesis proposes a privacy-preserving collaborative learning algorithm for classification (PCTC) based on a hybrid protection strategy that combines homomorphic encryption and differential privacy. PCTC uses a secure aggregation mechanism that avoids the computational overhead and precision errors of floating-point numbers. A packing strategy is also introduced that exploits the characteristics of label vectors in classification to make full use of the plaintext space, further improving the efficiency of encryption and decryption. Finally, LLSC is used to learn from the labeled samples to reduce the impact of noise on model performance. The thesis provides a detailed security proof and correctness analysis, and a series of experiments compares the proposed algorithm with related schemes to verify its advantages. The results show that PCTC achieves the lowest encryption and decryption cost in most scenarios while ensuring participants' privacy and maintaining model performance.
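The label-aggregation pipeline described above can be illustrated with a minimal Python sketch. It is not the thesis's PCTC or LLSC implementation. The toy Paillier cryptosystem with tiny primes (insecure, for illustration only), the 8-bit slot width, the hypothetical helpers `pack`, `unpack`, `aggregate_labels` and `entropy_confidence`, and the confidence score 1 - H/log K are all assumptions; the differential-privacy noise and the CL-MRT objective are omitted. Each participant packs its one-hot label into a single integer and encrypts it, the server multiplies ciphertexts to add the plaintexts, and only the decrypted per-class counts are used to derive an aggregated label and an entropy-based confidence weight.

```python
import math
import random
from typing import List, Tuple

# Toy Paillier cryptosystem (additively homomorphic). The primes are tiny and
# chosen only so the example runs instantly: this is NOT secure.
P, Q = 65521, 65537
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid shortcut because the generator is g = N + 1

SLOT_BITS = 8  # assumed slot width: each per-class count must stay below 256


def encrypt(m: int, rng: random.Random) -> int:
    """Paillier encryption c = g^m * r^N mod N^2 with g = N + 1."""
    assert 0 <= m < N
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """m = L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def add_cipher(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2


def pack(vec: List[int]) -> int:
    """Place class k's count in bits [k*SLOT_BITS, (k+1)*SLOT_BITS)."""
    return sum(v << (k * SLOT_BITS) for k, v in enumerate(vec))


def unpack(x: int, num_classes: int) -> List[int]:
    """Split a packed integer back into per-class counts."""
    mask = (1 << SLOT_BITS) - 1
    return [(x >> (k * SLOT_BITS)) & mask for k in range(num_classes)]


def aggregate_labels(votes: List[int], num_classes: int, seed: int = 0) -> List[int]:
    """Sum one-hot votes under encryption; only the total is ever decrypted."""
    assert len(votes) < 2 ** SLOT_BITS, "counts would overflow a slot"
    assert num_classes * SLOT_BITS < N.bit_length() - 1, "packed vector exceeds plaintext space"
    rng = random.Random(seed)
    acc = encrypt(0, rng)
    for v in votes:
        one_hot = [1 if k == v else 0 for k in range(num_classes)]
        acc = add_cipher(acc, encrypt(pack(one_hot), rng))  # one ciphertext per vote
    return unpack(decrypt(acc), num_classes)


def entropy_confidence(counts: List[int]) -> Tuple[int, float]:
    """Majority label plus an assumed confidence score 1 - H(p) / log(K), K >= 2."""
    total = sum(counts)
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    label = max(range(len(counts)), key=lambda k: counts[k])
    return label, 1.0 - h / math.log(len(counts))


if __name__ == "__main__":
    counts = aggregate_labels([0, 0, 1, 0, 2], num_classes=3)
    label, conf = entropy_confidence(counts)
    print(counts, label, round(conf, 3))  # [3, 1, 1] 0 0.135
```

Because the votes are integer counts, nothing needs a floating-point encoding inside the ciphertext, and packing K class slots into one plaintext lets each participant send one ciphertext instead of K. The slot width has to be large enough that no count overflows into the neighbouring slot, which is why the sketch caps the number of votes at 255. A unanimous vote gets confidence 1, and a uniform split gets confidence 0.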
Keywords/Search Tags: Multi-party Collaborative Learning, Classification, Data Labeling, Privacy-preserving