With the continuous advancement of the national digital economy strategy, recent years have witnessed the unprecedented success of machine learning in the big data field. However, as a data-driven technique, machine learning has not only contributed to technological innovation but also raised public concern about data security. Looking toward future development, exploring a secure, win-win data sharing strategy has become the most promising path forward for machine learning applications. Following this trend, this thesis focuses on developing a feasible solution to break data isolation in current machine learning applications. Since the solution combines privacy-enhancing technologies with machine learning, it is conventionally called privacy-preserving machine learning. In more detail, my research proposes a new scheme to achieve full life-cycle privacy-preserving machine learning, covering system architecture design, data preparation, model training, and model security protection. The main innovations and contributions are summarized as follows.

First, since training data may be shared from different data sources, data join is a vital step in realizing privacy-preserving machine learning in practice. To this end, this thesis develops a new oblivious-shuffle-based method, called iPrivJoin, to achieve ID-private data join. Compared with existing private data join methods, iPrivJoin avoids introducing redundant data while additionally protecting the IDs themselves. Experiments show that, compared with existing methods, iPrivJoin reduces the overhead of the subsequent model training process by up to 300%. Moreover, iPrivJoin also includes an optimization of the oblivious shuffle: when processing high-dimensional data such as images and texts, the optimized oblivious shuffle reduces running time by about 32.83% and communication overhead by about 51.84% compared with existing methods.

Second, for privacy-preserving machine learning model training and prediction, this thesis first proposes new algorithms for basic operators such as division, logarithm, and exponentiation, and then develops a series of novel algorithms for privacy-preserving model training and prediction. Experiments validate that the new basic operators achieve more than 30% efficiency improvement on average, and the improved operators in turn accelerate model training. Furthermore, by utilizing non-interactive privacy-preserving computation operators, privacy-preserving model prediction can be accomplished with almost no communication overhead.

Third, a privacy-preserving machine learning system brings not only stronger guarantees for data privacy but also more room for an adversary to launch attacks with crafted data. To defend against data poisoning attacks that inject backdoors, this thesis proposes a machine-unlearning-based backdoor erasing method, with which the defender can force an infected model to unlearn its memorization of backdoor trigger patterns. Experiments validate that the defense lowers the attack success rates of most state-of-the-art backdoor attacks by about 90%, a 10% improvement over existing methods.

Finally, from the perspective of system architecture, this thesis proposes a novel system architecture for privacy-preserving machine learning that is suited to parallel task execution. Different from the past single-task-driven design style, the new architecture is driven by basic privacy-preserving computation operators and contains two essential system modules that optimize program circuit depth. Based on this architecture, the system can better handle the fragmented communication of privacy-preserving computation and achieve concurrency across different tasks. It is also shown that, compared with the traditional task-driven design, the new design improves system throughput by about 200%.
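As an illustration of the oblivious-shuffle primitive that iPrivJoin builds on, the following is a minimal single-process sketch: two additive secret shares of a column are permuted and then re-randomized with fresh masks, so that neither share alone links the shuffled positions to the original ones. The ring size, function names, and the in-process simulation of both parties are assumptions for illustration only, not the thesis's actual protocol.

```python
import secrets

MOD = 2**32  # shares live in the ring Z_{2^32}

def share(x):
    """Split x into two additive shares modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD

def reconstruct(a, b):
    """Recombine two additive shares."""
    return (a + b) % MOD

def oblivious_shuffle(shares0, shares1, perm):
    """Apply a permutation to both share vectors, then re-randomize
    with fresh masks so the permuted shares are unlinkable to the
    originals. In a real protocol, perm is known to neither output
    reader."""
    permuted0 = [shares0[i] for i in perm]
    permuted1 = [shares1[i] for i in perm]
    masks = [secrets.randbelow(MOD) for _ in perm]
    new0 = [(s + m) % MOD for s, m in zip(permuted0, masks)]
    new1 = [(s - m) % MOD for s, m in zip(permuted1, masks)]
    return new0, new1

data = [10, 20, 30, 40]
s0, s1 = zip(*(share(x) for x in data))
perm = [2, 0, 3, 1]
n0, n1 = oblivious_shuffle(list(s0), list(s1), perm)
print([reconstruct(a, b) for a, b in zip(n0, n1)])  # [30, 10, 40, 20]
```

The masks cancel on reconstruction, so correctness is preserved while each party's post-shuffle view is freshly uniform.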
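The division operator mentioned above is typically reduced to additions and multiplications, the operations that secret-sharing schemes evaluate natively. The sketch below shows the standard Newton–Raphson reciprocal iteration computed in the clear on plain floats; it is a hypothetical stand-in for the thesis's actual fixed-point protocol, where each multiplication would be a secure multiplication on shares.

```python
def reciprocal_newton(d, iters=10):
    """Approximate 1/d for d scaled into [0.5, 1) using only additions
    and multiplications. The initial guess 2.9142 - 2d is a standard
    linear approximation of 1/d on that interval; each Newton step
    roughly doubles the number of correct bits."""
    x = 2.9142 - 2.0 * d
    for _ in range(iters):
        x = x * (2.0 - d * x)  # Newton step: x <- x * (2 - d*x)
    return x

# division a/d then costs one extra multiplication by the reciprocal
print(7 * reciprocal_newton(0.8))  # ~ 8.75
```

Because the iteration count is fixed and data-independent, the same circuit serves every input, which is exactly what an operator-driven privacy-preserving system needs.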
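The unlearning-based defense can be illustrated at toy scale: train a model on data containing a backdoor trigger, then fine-tune it on relabeled trigger samples so the trigger-to-target association is forgotten while clean behavior is preserved. The logistic-regression model, the three-feature data, and the simple corrective fine-tuning rule below are deliberately minimal assumptions for illustration; the thesis's method targets deep networks and a more principled unlearning procedure.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def sgd_step(w, x, y, lr=0.5):
    """One SGD step on the logistic loss for sample (x, y)."""
    p = predict(w, x)
    return [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]

# toy task: the label follows the first feature; feature 3 is a
# backdoor trigger that the poisoned sample binds to target label 1
clean = [([1, 0, 0], 1), ([0, 1, 0], 0)]
poison = [([0, 1, 1], 1)]  # true label 0, trigger flips it to 1

w = [0.0, 0.0, 0.0]
for _ in range(3000):
    for x, y in clean + poison:
        w = sgd_step(w, x, y)
w_poisoned = w  # infected model: the trigger now controls the output

# unlearning: fine-tune on the trigger sample with its corrected label,
# pushing the model away from its memorized trigger association
for _ in range(300):
    w = sgd_step(w, [0, 1, 1], 0)

print(predict(w_poisoned, [0, 1, 1]) > 0.5)  # attack succeeds before
print(predict(w, [0, 1, 1]) > 0.5)           # trigger erased after
print(predict(w, [1, 0, 0]) > 0.5)           # clean behavior preserved
```

The key property the defense relies on is visible even here: the trigger weight can be driven down without disturbing the weights that clean predictions depend on.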