| Machine learning, a dynamic and promising frontier technology, is widely used in industrial production and social activities and brings great economic and social benefits. The quality of machine learning models is closely related to the amount of training data, but training data usually contains sensitive information about organizational interests or personal privacy, such as electronic medical records held by hospitals and human data collected by smart wearable devices, so it is crucial to ensure data privacy in the machine learning process. Countries restrict the misuse and leakage of data by enacting relevant laws and regulations. For example, China enacted the "Personal Information Protection Law" in November 2021, and the EU implemented the "General Data Protection Regulation". These laws protect data privacy but, to a certain extent, negatively affect the quality of machine learning models. The challenge is how to mine and use data effectively while protecting personal privacy. Making full use of the value of data without compromising data privacy is an important way to promote the sustainable development of machine learning technology. Based on blockchain and distributed machine learning technology, this thesis develops a decentralized privacy-preserving collaborative learning platform. The main contributions of this thesis are as follows: (1) Combining edge computing, blockchain, distributed machine learning, and secure multi-party computation primitives, this thesis proposes a decentralized privacy-preserving collaborative machine learning scheme. The scheme uses smart contracts to replace the central server of traditional distributed learning, avoiding server sabotage and single points of failure. With the help of an edge computing architecture, the scheme expands data sources and addresses the insufficient computing power of end devices. In addition, the scheme uses secret sharing, commitment, and homomorphic encryption from secure multi-party computation to ensure the privacy of the original data and the security of the model in the machine learning process.
(2) This thesis combines the distributed selective gradient descent algorithm with the scheme in (1) to optimize it, which speeds up training and reduces the amount of communication while preserving the usability of the trained model, and uses blinding and encryption to protect the privacy of the model and the original data. A balance is struck between training efficiency and model effectiveness. (3) The feasibility of the above scheme is verified experimentally. The model accuracy and other metrics are comparable to those of centralized machine learning. In a scenario where three clients train collaboratively, the time overhead and communication overhead increase with the number of nodes in the centralized network, and are only 91% and 33% higher, respectively, than those of traditional federated learning. A blockchain-based privacy-preserving collaborative learning platform was subsequently designed and developed; after practical deployment, the platform allows multiple parties to securely train models jointly while providing privacy protection. |
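
The abstract names secret sharing as one of the primitives that keep individual model updates private during collaborative training. The following is a minimal sketch of that idea using additive secret sharing among three clients. It is not the thesis's actual protocol: the field modulus `PRIME`, the fixed-point factor `SCALE`, and the function names `encode`, `decode`, `share`, and `secure_sum` are all assumptions introduced for illustration, and the commitment, homomorphic encryption, and smart-contract steps are omitted.

```python
import secrets

PRIME = 2**61 - 1   # assumed field modulus (a Mersenne prime), illustrative only
SCALE = 10**6       # assumed fixed-point scale for encoding floats as integers


def encode(x: float) -> int:
    """Map a float to a field element using fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a float; the upper half encodes negatives."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value: int, n: int) -> list[int]:
    """Split a field element into n additive shares that sum to it mod PRIME."""
    parts = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    last = (value - sum(parts)) % PRIME
    return parts + [last]


def secure_sum(updates: list[list[float]]) -> list[float]:
    """Sum the clients' update vectors without revealing any single update.

    Each client splits every coordinate into one share per party. Party j
    only ever sees shares, which are individually uniform random values,
    and publishes the sum of the shares it received.
    """
    n = len(updates)
    dim = len(updates[0])
    received = [[0] * dim for _ in range(n)]  # received[j][k]: party j, coordinate k
    for update in updates:
        for k, x in enumerate(update):
            for j, s in enumerate(share(encode(x), n)):
                received[j][k] = (received[j][k] + s) % PRIME
    totals = [sum(received[j][k] for j in range(n)) % PRIME for k in range(dim)]
    return [decode(t) for t in totals]


if __name__ == "__main__":
    # Three clients, each holding a two-dimensional gradient update.
    client_updates = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
    print(secure_sum(client_updates))
```

In this sketch only the aggregate is reconstructed, so a party that sees the shares addressed to it learns nothing about another client's individual update. The fixed-point encoding limits precision to `1 / SCALE`.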