
Research And Implementation Of Privacy Preserving Algorithm For Government Big Data Based On Network Representation Learning

Posted on: 2021-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: G D Wang
GTID: 2506306050964819
Subject: Computer application technology
Abstract/Summary:
The risk of exposing private information in data restricts the development and application of the open sharing platform for governmental big data. Privacy-preserving algorithms are one of the efficient ways to solve this problem: they protect the sensitive information in data while guaranteeing that the data remain useful. Since privacy-preserving algorithms were first proposed, a large body of work has accumulated. However, the existing algorithms cannot directly satisfy the demands of the open sharing platform for governmental big data. For example, the multi-level sharing requirements of the platform cannot be satisfied by one algorithm alone; the existing algorithms cannot efficiently protect the private information of each participant in multi-party learning; and general anonymity algorithms lead to high information loss on data with many attributes. Recently, network representation learning algorithms have shown excellent performance in extracting data feature vectors, and the federated learning framework has become one of the most effective methods for breaking through the "Isolated Data Island" problem. Both have been employed in many practical applications. Therefore, this paper explores a privacy-preserving framework designed for the open sharing platform for governmental big data, based on network representation learning, differential privacy, and federated learning. The main contents are as follows:

For the data that should be shared and remain undistorted in the open sharing platform, this paper designs a multi-level key sharing system according to the structure and the privacy-preserving requirements of the platform. This system assigns two sets of keys, to departments and to platforms respectively, based on the RSA algorithm, and proposes a set of sharing agreements. Specifically, in order to share data at multiple levels, the data are encrypted in pieces according to the privacy level of each piece. Data with a higher privacy level are encrypted more times. Moreover, to improve the performance of this model, the AES algorithm, a symmetric encryption algorithm, is introduced to encrypt the data directly, and the RSA algorithm is used to encrypt and share the AES keys.

For the data that are shared among multiple participants to jointly train a machine learning model in the open sharing platform, this paper proposes a model that integrates network representation learning, differential privacy, and federated learning. This model consists of two phases: the first phase learns low-dimensional representations of the source data, and the second phase protects sensitive information in multi-participant learning. In the learning process, the structured data are first modeled as an attribute network. Then the meta-structure of the network is extracted according to the attributes. Next, the high-dimensional representations and the matrix of proximities among nodes are generated from node sequences, which are produced by random walks. Finally, a deep auto-encoder model is employed to transform the high-dimensional representations into low-dimensional representations. Moreover, for two different ways of multi-participant learning (i.e., MLaaS and the collaborative training method), differential privacy and federated learning are introduced, respectively, to protect the sensitive information. In particular, a partial noise mechanism is presented for adding noise in the differential privacy model, and a weight-based gradient update scheme is presented for integrating the gradients from all local models in the federated learning model.

For the data that need to be open to the public in the open sharing platform, this paper improves the (α,k)-anonymity algorithm based on clustering to perform privacy preserving. Some useless attributes are removed before clustering, and the principle of maximum distance from existing cluster centers is adopted when selecting a new cluster center in this model. Both measures are intended to improve the performance of clustering. The clustering process follows the α principle, and once clustering is completed, the attributes are generalized according to the k principle. This model uses clustering to divide the data into equivalence classes, which efficiently reduces the amount of information lost when performing anonymization.

The three models introduced above provide three different privacy-preserving policies according to the different requirements of the three types of data in the open sharing platform. These three models are integrated to form the overall framework for privacy preserving.
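
The weight-based gradient update scheme mentioned in the abstract is not specified in detail here. As a minimal sketch only, the following assumes that each participant's gradient is weighted by a non-negative number (for instance, its local sample count, as in common federated averaging) and that the server takes the normalized weighted sum. The function name `aggregate_gradients` and the choice of weights are illustrative assumptions, not the thesis's actual scheme.

```python
def aggregate_gradients(local_gradients, weights):
    """Combine per-participant gradients into one weighted average.

    local_gradients: list of equal-length lists of floats, one per participant.
    weights: one non-negative number per participant (assumption: e.g. the
             local sample count; the thesis may define the weights differently).
    """
    if not local_gradients or len(local_gradients) != len(weights):
        raise ValueError("need one weight per local gradient")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    dim = len(local_gradients[0])
    aggregated = [0.0] * dim
    for grad, w in zip(local_gradients, weights):
        if len(grad) != dim:
            raise ValueError("all gradients must have the same length")
        share = w / total  # normalized contribution of this participant
        for i, g in enumerate(grad):
            aggregated[i] += share * g
    return aggregated


if __name__ == "__main__":
    # Two participants; the second holds three times as much data.
    print(aggregate_gradients([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Normalizing by the total weight keeps the aggregated gradient on the same scale as the local ones, regardless of how many participants contribute.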
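
The rule of choosing each new cluster center at maximum distance from the existing centers can be sketched as follows. This is a hedged illustration under stated assumptions: records are numeric tuples, distance is Euclidean, "distance from existing centers" is read as the distance to the nearest existing center, and the first record seeds the process. The names `next_center` and `pick_centers` are hypothetical, and the thesis may use a different distance or seeding rule.

```python
import math


def next_center(points, centers):
    """Return the point whose nearest existing center is farthest away."""
    def distance_to_nearest_center(p):
        return min(math.dist(p, c) for c in centers)
    return max(points, key=distance_to_nearest_center)


def pick_centers(points, k):
    """Select k initial cluster centers by the maximum-distance rule."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[0]]  # assumption: the first record is the seed center
    while len(centers) < k:
        centers.append(next_center(points, centers))
    return centers


if __name__ == "__main__":
    records = [(0, 0), (1, 0), (10, 0), (10, 10)]
    print(pick_centers(records, 3))
```

Spreading the initial centers apart in this way tends to avoid starting two clusters in the same dense region, which is one plausible reason the rule helps clustering performance.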
Keywords/Search Tags: Governmental Big Data, Privacy Preserving, Network Representation Learning, Differential Privacy, Federated Learning