
Research On Privacy Protection For Machine Learning And Application

Posted on: 2024-08-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Q Deng
Full Text: PDF
GTID: 1528307157979719
Subject: Cyberspace security
Abstract/Summary:
As the Internet, big data, cloud computing and artificial intelligence develop at an astounding rate, machine learning (ML) has been widely applied in various fields such as finance, insurance, education and medical services. Machine learning is a data-driven technique: ML-based solutions rely on ingenious algorithms, but even more so on large training datasets. In practical scenarios, data are inherently distributed across terminal devices belonging to different users and organizations, so individuals and small organizations cannot build high-precision models on their own. The traditional remedy is to aggregate the data on a central server for centralized training, but centralized solutions cause serious privacy leakage because the data involve personal privacy and organizational confidentiality. Fast and reliable modeling over sensitive distributed datasets has therefore become an urgent issue in the field of privacy-preserving machine learning (PPML).

Although researchers have designed a multitude of training and prediction schemes for secure machine learning using cryptographic primitives, PPML remains a major challenge due to the complexity and diversity of ML models, the limitations of cryptography, and hardware restrictions:

1) Schemes based on federated learning are susceptible to inference attacks, increasing the risk of privacy disclosure.

2) Training schemes that require interaction between users and servers incur substantial computation and communication overhead.

3) Existing approaches to privacy-preserving computation of nonlinear activation functions rely on approximate polynomial substitutes, sacrificing the accuracy that complex models (e.g., neural networks) can achieve in the encrypted domain.

To address these limitations, we aim at constructing high-precision
and efficient PPML models. More specifically, we construct three types of learning models: a support vector machine (SVM) based on federated learning, non-interactive privacy-preserving linear classification models, and highly accurate models for neural networks. Building on these theoretical results, we design an online primary-diagnosis system for private medical data and a classifier for private image data. The main contributions of this thesis are as follows:

1. To defend against data and model leakage in federated learning, we build PPSVMT, a support vector machine model in the federated learning framework that uses Paillier cryptography and Shamir secret sharing. Our security analysis demonstrates that the model protects three kinds of private information, namely user data, local gradient updates and global model parameters, against inference attacks. Based on PPSVMT, we design POMP, an online primary-diagnosis system for private medical data. A medical classifier can be trained securely and then provide pre-diagnosis services to users. On medical datasets, we empirically demonstrate that POMP greatly reduces the computation and communication cost for both users and the server, while its classification accuracy is comparable to centralized training.

2. This thesis then addresses several limitations of interactive PPML schemes: multiple rounds of peer-to-peer communication between users and servers lead to lengthy training times and a lack of flexibility. To mitigate these problems, we construct non-interactive privacy-preserving linear classification models. First, we define a novel well-separable structure that decouples the model parameters from the user data. By constructing a gradient-update formula for the support vector machine that satisfies the well-separable structure, the scheme allows data owners to go offline after preprocessing their local data and uploading it to the server. To reduce data traffic, we design a
data compression and parsing algorithm based on modular arithmetic and Horner's rule, which yields an efficient scheme for non-interactive privacy-preserving SVM training (abbreviated NPSVMT). We then generalize the well-separable SVM model to other linear models, including logistic regression, linear regression and ridge regression. Experimental results show that non-interactive training schemes based on the well-separable structure outperform interactive methods in both computation and communication overhead.

3. Finally, this thesis tackles a significant source of accuracy loss in privacy-preserving neural network learning: activation functions are replaced by polynomial approximations so as to accommodate primitives such as homomorphic encryption and secret sharing. We develop NPANN, a neural network training scheme that uses functional encryption for inner products together with mask matrices. The model enjoys several desirable properties: it preserves the privacy of user data and model parameters, resists inference attacks, and is non-interactive. Based on NPANN, we design PPIC, an image classifier that never shares user images yet completes classification tasks with high precision. On the MNIST dataset, our performance evaluation shows that NPANN and PPIC build highly accurate models efficiently, making them feasible in practical scenarios.
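To make the first contribution more concrete, the following is a minimal sketch of Shamir secret sharing, one of the two primitives the abstract names for protecting local gradient updates in PPSVMT. The field modulus P, the function names share/reconstruct, and the threshold parameters are illustrative assumptions, not the thesis's actual implementation.

```python
import random

P = 2_147_483_647  # a Mersenne prime used as the field modulus (toy choice)

def share(secret: int, n: int, t: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any t of them can reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of the random polynomial
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return secret

shares = share(123456, n=5, t=3)
assert reconstruct(shares[:3]) == 123456  # any 3 of the 5 shares suffice
```

In a federated setting, each user would share a (quantized) gradient among the participants this way, so that no single party sees it while any qualified subset can recover the aggregate.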
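The compression-and-parsing idea in the second contribution can be sketched as packing several bounded integers into one big integer with Horner's rule, then parsing them back out by repeated mod/div. The base B and the names pack/unpack are illustrative assumptions in the spirit of the NPSVMT description, not the thesis's actual algorithm.

```python
B = 10**6  # every value to pack must satisfy 0 <= v < B (toy bound)

def pack(values: list[int]) -> int:
    """Encode values as sum(v_i * B^i), evaluated Horner-style."""
    acc = 0
    for v in reversed(values):
        assert 0 <= v < B
        acc = acc * B + v
    return acc

def unpack(code: int, n: int) -> list[int]:
    """Recover n packed values by repeated modular reduction and division."""
    out = []
    for _ in range(n):
        out.append(code % B)
        code //= B
    return out

msgs = [42, 7, 999_999, 0]
assert unpack(pack(msgs), len(msgs)) == msgs  # round-trip is lossless
```

Packing many small per-feature values into one large integer before encryption is a standard way to cut both ciphertext count and data traffic, which matches the abstract's stated goal for this step.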
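For the third contribution, the following is a toy sketch of a DDH-style functional encryption scheme for inner products (in the spirit of the Abdalla et al. construction), the primitive the abstract names for NPANN: the key holder learns only the inner product of the plaintext with a chosen vector, never the plaintext itself. The tiny group parameters and all function names are illustrative assumptions; a real deployment needs cryptographically large groups.

```python
import random

# Toy safe-prime group: G = 4 generates the subgroup of prime order Q mod P.
P, Q, G = 1019, 509, 4

def keygen(n: int):
    msk = [random.randrange(Q) for _ in range(n)]   # master secret s
    mpk = [pow(G, s, P) for s in msk]               # h_i = G^{s_i}
    return msk, mpk

def encrypt(mpk, x):
    r = random.randrange(1, Q)
    ct0 = pow(G, r, P)
    cts = [pow(h, r, P) * pow(G, xi, P) % P for h, xi in zip(mpk, x)]
    return ct0, cts

def derive_key(msk, y) -> int:
    return sum(s * yi for s, yi in zip(msk, y)) % Q  # sk_y = <s, y> mod Q

def decrypt(ct, y, sk_y) -> int:
    ct0, cts = ct
    num = 1
    for c, yi in zip(cts, y):
        num = num * pow(c, yi, P) % P                # prod ct_i^{y_i} = G^{r<s,y> + <x,y>}
    val = num * pow(pow(ct0, sk_y, P), P - 2, P) % P # divide out G^{r<s,y>}
    for m in range(Q):                               # brute-force small discrete log
        if pow(G, m, P) == val:
            return m
    raise ValueError("inner product out of range")

msk, mpk = keygen(3)
x, y = [3, 1, 4], [2, 0, 5]
ct = encrypt(mpk, x)
assert decrypt(ct, y, derive_key(msk, y)) == 26  # <x, y> = 3*2 + 1*0 + 4*5
```

Inner products are exactly the linear operations a neural network layer needs, which is why such a primitive, combined with masking, can avoid the polynomial approximation of activations that the abstract identifies as the source of accuracy loss.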
Keywords/Search Tags: Privacy-preserving, machine learning, non-interactive, well-separable structure, functional encryption