| With the rapid development of computer information technology, database systems have become increasingly powerful, and data sharing has become more convenient and frequent. The rapid growth of data has multiplied the channels through which people obtain data and information, and it inevitably leads to more serious data privacy leakage. Strong privacy protection has therefore become an urgent need. This paper takes differential privacy as its entry point to privacy protection and focuses on two directions: data mining and data publishing. The optimization goal for the differential privacy algorithms is a more reasonable privacy budget allocation strategy and better data availability while maintaining the strength of privacy protection. First, previous differential-privacy-oriented clustering algorithms offer unsatisfactory clustering availability and are strongly affected by the data distribution in the data set. To address this, a differential privacy clustering algorithm based on spatial dynamic partition (DPQTk-means algorithm) is proposed. It uses a quad-tree structure to capture the data distribution in detail, with smaller buckets in densely populated regions and larger buckets in sparse regions. The data space is partitioned dynamically so that as few storage buckets as possible fully represent the data distribution, which reduces the total amount of Laplace noise injected and improves the selection of the initial center points. The conventional differential privacy clustering algorithm is then run on the processed data. Experiments on a real data set show that the algorithm effectively improves the availability and accuracy of clustering and is more efficient than the conventional algorithm. Second, most previous differential privacy histogram publishing algorithms are designed for static data sets. To address this, a data stream-based adaptive threshold differential privacy histogram publishing algorithm (Adaptive DPHP algorithm) is proposed. The algorithm samples the event sequence at each timestamp according to a distance threshold: events whose distance falls below the threshold reuse the previously published histogram, while events above the threshold trigger publication of a new histogram. Because the choice of distance threshold depends on prior knowledge and initial testing, this paper uses a PID (Proportional-Integral-Derivative) control loop feedback mechanism to adjust the threshold dynamically over time, called the Adaptive-Threshold mechanism, and the differential privacy budget is set by an Adaptive-ε mechanism. Experiments show that the algorithm effectively reduces the release error and improves data utility. Finally, combining the two new algorithms, a prototype system for differential privacy data mining and publishing is designed at the application level, and it shows good application value. |
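
The threshold-based republishing idea described for the Adaptive DPHP algorithm can be illustrated with a minimal Python sketch. It is not the paper's implementation. The distance measure (mean absolute difference against the last released histogram), the PID gains, the target publication rate, and the names `PIDThreshold` and `publish_stream` are all assumptions made for illustration. The sketch also uses a fixed ε per release rather than the Adaptive-ε mechanism, and it does not account for the privacy cost of the distance test or for budget composition over the stream.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

class PIDThreshold:
    """Hypothetical PID controller that adjusts the distance threshold."""

    def __init__(self, threshold, kp=0.5, ki=0.1, kd=0.1):
        self.threshold = threshold
        self.kp, self.ki, self.kd = kp, ki, kd  # assumed gains
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        adjustment = (self.kp * error
                      + self.ki * self.integral
                      + self.kd * derivative)
        # Keep the threshold strictly positive.
        self.threshold = max(1e-6, self.threshold + adjustment)
        return self.threshold

def publish_stream(histograms, epsilon, threshold, target_rate=0.3, seed=0):
    """Release one histogram per timestamp, republishing only on large change."""
    rng = random.Random(seed)
    pid = PIDThreshold(threshold)
    released, last, published = [], None, 0
    for t, hist in enumerate(histograms, start=1):
        if last is None:
            distance = float("inf")  # nothing released yet: always publish
        else:
            distance = sum(abs(a - b) for a, b in zip(hist, last)) / len(hist)
        if distance > pid.threshold:
            # Laplace mechanism with sensitivity 1 (assumed) and budget epsilon.
            last = [c + laplace_noise(1.0 / epsilon, rng) for c in hist]
            published += 1
        released.append(list(last))
        # Positive error: publishing more often than targeted, so the
        # threshold is raised; negative error lowers it.
        pid.update(published / t - target_rate)
    return released, published
```

With a stream such as `[[0, 0]] * 3 + [[100, 100]] * 3`, the sketch publishes at the first timestamp and again when the counts jump, and it reuses the earlier release elsewhere.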