| Protecting personal privacy has attracted widespread attention in the era of big data, but the privacy-preserving release of high-dimensional datasets remains a major challenge. Existing approaches to the release problems caused by the curse of dimensionality fall mainly into two groups: methods based on dimensionality reduction and methods based on anonymity. However, the datasets released by both types of algorithm suffer from large information loss. This dissertation proposes two algorithms for publishing high-dimensional synthetic datasets under differential privacy, each improving the Bayesian-network-based differentially private publishing algorithm for high-dimensional datasets from a different perspective. (1) The DPSM-Bayes algorithm, based on a differential privacy sampling mechanism and a Bayesian network, is proposed. The algorithm uses the privacy amplification property of the differential privacy sampling mechanism, together with dataset reduction, to adjust the sensitivity of mutual information and thereby improve the accuracy of the constructed Bayesian network structure. It also proposes an IMLaplace mechanism that is better suited to adding noise to high-dimensional probability distributions. This effectively reduces the systematic bias caused by adding positive Laplace noise to low-probability entries of a distribution, and further improves how closely the noisy marginal distributions fit those of the original dataset. (2) The DPGB-Bayes algorithm, based on a Bayesian network and Gibbs sampling, is a further improvement of DPSM-Bayes. DPGB-Bayes introduces the InDif function, which is simple to compute and has low sensitivity, as the scoring function of the exponential mechanism. This fundamentally resolves the problems caused by the mutual information function when constructing a Bayesian network structure that satisfies differential privacy. A weighted average technique is introduced to perform consistency post-processing on the marginal distributions after the IMLaplace mechanism has added noise, so that the noisy marginal distributions agree more closely with the true distributions. A Gibbs sampling algorithm is adopted to sample from the marginal distributions, so that sampling accuracy is no longer limited by the size of the synthetic dataset. Extensive experiments show that each improved sub-algorithm achieves clearly better performance on its own, so each can serve as a basic module in other scenarios. Under the same differential privacy guarantee, the two algorithms proposed in this dissertation handle the publication of high-dimensional datasets effectively and offer higher utility than existing methods. Figure [15] Table [6] Reference [72]... |
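
The bias that the abstract attributes to adding Laplace noise to low-probability entries can be illustrated with the standard Laplace mechanism. The sketch below is not the IMLaplace mechanism, whose definition the abstract does not give. It is only the usual baseline: add Laplace noise to each cell of a marginal distribution, clamp negative values to zero, and renormalise. The function names, the L1 sensitivity of `2/n` for a normalised histogram, and the uniform fallback are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(counts, epsilon, seed=None):
    """Baseline Laplace mechanism on one marginal (hypothetical helper).

    Assumes the normalised histogram has L1 sensitivity 2/n, where n is
    the number of records; this is an assumption, not taken from the source.
    """
    rng = random.Random(seed)
    n = sum(counts)
    probs = [c / n for c in counts]
    scale = 2.0 / (n * epsilon)

    noisy = [p + laplace_noise(scale, rng) for p in probs]

    # Clamping removes negative mass. For cells whose true probability is
    # close to zero, only the upward noise survives, so these cells tend
    # to be overestimated on average.
    clamped = [max(x, 0.0) for x in noisy]

    total = sum(clamped)
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(counts)] * len(counts)
    return [x / total for x in clamped]


if __name__ == "__main__":
    released = noisy_marginal([50, 30, 15, 5, 0], epsilon=0.5, seed=1)
    print(released)
```

With many cells, as in a high-dimensional marginal, most true probabilities are small, so the clamping step affects a large share of the cells; this is the setting in which the abstract says a modified noise mechanism helps.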