| In the era of big data and cloud computing, personal data is collected and used by enterprises and individuals, often without the knowledge of the people concerned. This leads to frequent privacy leakage incidents, such as fraudulent telephone calls, and causes significant losses to individuals and countries. Personal privacy protection faces severe challenges, and privacy protection in data publishing has become a research hotspot in network and information security. Unlike traditional sharing among authorized users, many data publishing scenarios require privacy processing before disclosure, so that the data can be opened to the public and used as fully as possible without revealing the personal information it contains. However, privacy protection technology must trade off privacy disclosure against utility: the stronger the privacy protection, the more information is lost and the lower the availability of the data, and vice versa. How to apply reasonable privacy protection and balance it against utility has therefore become an urgent problem. Based on information theory and neural networks, this thesis designs a privacy protection model suitable for data publishing and studies the measurement of, and trade-off between, privacy leakage and data utility. The research covers a perturbation privacy protection model based on partitioning and Gaussian noise, and a privacy-utility measurement and trade-off scheme based on a mutual information neural estimator. The specific work is as follows: (1) To address privacy leakage in data publishing, a privacy protection model based on partitioning and Gaussian noise is designed. The model achieves initial privacy protection by partitioning the original dataset and strengthens that protection by adding Gaussian noise to the data. Because the partitioning perturbation reduces the variance of the sensitive attributes, noise of higher variance can be tolerated without significant performance degradation. Risk assessment and simulation analysis show that the model protects the privacy of the data better than a single perturbation does. (2) To address the measurement and trade-off of privacy leakage and data utility, a privacy-utility measurement method based on mutual information and a privacy-utility trade-off scheme based on a mutual information neural estimator are designed. First, the mutual information neural estimator is introduced from the formula for mutual information. Second, by training the estimator, two quantities are obtained: the mutual information between the sensitive data set and the published data set, and the mutual information between the data set related to the sensitive information and the published data set. Finally, the privacy-utility trade-off scheme, based on mutual information and the privacy funnel, is applied to data publishing, and a real data set is used to evaluate the privacy protection and data utility of the proposed scheme. |
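
The two-stage perturbation in item (1) can be illustrated with a minimal sketch. The abstract does not say how the dataset is partitioned, so the rule used here is an assumption: values are sorted, split into fixed-size groups, and replaced by their group mean, in the manner of microaggregation. The function names `partition_means`, `add_gaussian_noise`, and `perturb`, the parameters `group_size` and `sigma`, and the synthetic salary data are all hypothetical and not taken from the thesis.

```python
import random
import statistics


def partition_means(values, group_size):
    """Replace each value by the mean of its partition.

    Assumed partitioning rule (not from the thesis): sort the values and
    cut them into consecutive groups of `group_size` records.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # Indices of the records, ordered by value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
    return out


def add_gaussian_noise(values, sigma, rng):
    """Add independent zero-mean Gaussian noise with standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def perturb(values, group_size, sigma, seed=0):
    """Partition first, then add Gaussian noise to the partitioned values."""
    rng = random.Random(seed)
    return add_gaussian_noise(partition_means(values, group_size), sigma, rng)


if __name__ == "__main__":
    # Synthetic sensitive attribute, for illustration only.
    data_rng = random.Random(1)
    salaries = [data_rng.uniform(30_000, 120_000) for _ in range(1000)]

    grouped = partition_means(salaries, group_size=10)
    published = perturb(salaries, group_size=10, sigma=2_000.0)

    print("variance before partitioning:", statistics.pvariance(salaries))
    print("variance after partitioning: ", statistics.pvariance(grouped))
    print("variance of published data:  ", statistics.pvariance(published))
```

Under this assumed rule, replacing values by group means removes the within-group variance, so the partitioned attribute never has a larger variance than the original. That matches the abstract's remark that partitioning reduces the variance of the sensitive attributes, though the thesis may reach the effect by a different partitioning method.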