In the current era, artificial intelligence technology promotes the sustainable development of wise information technology of medicine (WITMED). While big data brings value and convenience to the medical industry, it also carries risks and challenges of privacy disclosure. In recent years, graph neural network models, which use deep learning algorithms to analyze unstructured graph data, have attracted much attention. These models have powerful expressive ability and show excellent performance in some medical tasks. However, when medical data are used to train graph models, the local graph nodes inevitably cause privacy leakage because they carry patients' personal sensitive information. Therefore, it is essential to study privacy protection methods for graph models on medical data. As a key technique of privacy computing, differential privacy assigns similar output probabilities to two different input values, thereby confusing privacy inference attacks that exploit an adversary's background knowledge. Moreover, differential privacy mathematically guarantees that personal privacy is not revealed, while valuable statistical results can still be obtained from the noisy training data. Focusing on the privacy leakage problem of medical diagnosis in the data publishing and data mining stages, this thesis proposes two solutions based on differential privacy. The main research contents and contributions are summarized as follows:

(1) Extending the application scenarios of differential privacy and, for the first time, realizing differential privacy protection for unstructured prescription data in medical diagnosis. First, the thesis investigates, analyzes, and summarizes the application status and implementation mechanisms of differential privacy across the privacy protection life cycle; this part of the work is based on an accepted journal paper. Then, for the data publishing and data mining stages where differential privacy is required, it successively proposes a differentially private histogram publishing scheme and a medical diagnosis scheme based on locally differentially private graph neural networks, which reduce the probability that graph nodes expose patients' sensitive information when biomedical data are analyzed or shared.

(2) Aiming at the leakage of personal diagnostic records that traditional histogram publishing algorithms easily cause when releasing medical diagnostic statistics, the thesis takes an intuitive differentially private histogram publishing algorithm as an example and compares the distribution and total error of the published histogram statistics under different privacy budgets. The experimental results intuitively show the relationship between the privacy budget and data utility; a minimal sketch of such a Laplace-based histogram release follows.
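This summary does not fix the exact noise mechanism used by the thesis, so the sketch below only illustrates the standard Laplace mechanism for histogram release, assuming each patient contributes to exactly one bin (so each counting query has L1 sensitivity 1 and satisfies the usual epsilon-DP guarantee Pr[M(D) in S] <= e^epsilon * Pr[M(D') in S]). The function name publish_histogram and the example counts are illustrative, not taken from the thesis.

```python
import numpy as np

def publish_histogram(counts, epsilon):
    """Release a histogram of diagnosis counts under epsilon-differential privacy.

    Assumes each patient contributes to exactly one bin, so the L1 sensitivity
    of the counting query is 1 and Laplace noise with scale 1/epsilon suffices.
    """
    counts = np.asarray(counts, dtype=float)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    noisy = counts + noise
    # Clamp negative counts to zero for readability; post-processing does not
    # weaken the differential privacy guarantee.
    return np.clip(noisy, 0.0, None)

if __name__ == "__main__":
    true_counts = [120, 43, 85, 12, 60]        # hypothetical per-disease record counts
    for eps in (0.1, 0.5, 1.0):                # smaller budget -> more noise
        released = publish_histogram(true_counts, eps)
        total_error = np.abs(released - np.asarray(true_counts)).sum()
        print(f"epsilon={eps}: total absolute error = {total_error:.1f}")
```

Smaller privacy budgets inject proportionally more noise, which is exactly the budget–utility relationship compared in the histogram experiments.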
(3) Aiming at the node-level privacy problem caused by collecting personal sensitive information through graph nodes while a server-side model analyzes medical diagnostic prescription data and performs pathological classification of patients, the thesis, on the one hand, studies how to construct homogeneous graphs from medical diagnostic prescriptions, taking patients' history, pathology, and first-diagnosis prescription information into account so as to meet the practical need of avoiding misdiagnosis. On the other hand, the thesis uses the locally private graph neural network (LPGNN) model to conduct graph node classification experiments on five real homogeneous datasets, focusing on a graph neural network medical diagnosis scheme with node-level local differential privacy protection; a minimal sketch of this kind of node-level perturbation follows the results below.

Experimental results show that LPGNN is robust to both feature perturbation and label perturbation. For the first-diagnosis prescription dataset used in medical diagnosis, even when the privacy budget for features or labels is appropriately reduced, the accuracy loss can still be kept within 10%. In addition, after training and validation with noisy features and labels, the model still performs better on the test set than cross-entropy or forward-correction methods. Therefore, for the medical diagnosis pathological classification tasks in this study, LPGNN achieves a trade-off between the degree of privacy protection of the sensitive data and the accuracy of the model.
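The LPGNN perturbation mechanisms themselves are not reproduced in this summary; the sketch below only illustrates two standard local-privacy primitives that node-level protection of this kind relies on: a one-bit report for a bounded node feature (a simplified, single-dimension variant of a multi-bit feature mechanism) and generalized randomized response for labels. Function names such as perturb_feature, debias_feature, and perturb_label are assumptions for illustration, not the thesis's or LPGNN's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_feature(x, epsilon):
    """One-bit local perturbation of a single feature value scaled to [0, 1].

    The node reports a single bit; the probability of reporting 1 grows
    linearly with x, and the ratio of report probabilities for any two
    inputs is bounded by exp(epsilon), so the report is epsilon-LDP.
    """
    e = np.exp(epsilon)
    p_one = 1.0 / (e + 1.0) + x * (e - 1.0) / (e + 1.0)
    return 1 if rng.random() < p_one else 0

def debias_feature(bit, epsilon):
    """Server-side unbiased estimate of the original feature from the reported bit."""
    e = np.exp(epsilon)
    return (bit * (e + 1.0) - 1.0) / (e - 1.0)

def perturb_label(y, num_classes, epsilon):
    """Generalized randomized response: keep the true label with probability
    exp(epsilon) / (exp(epsilon) + k - 1), otherwise report one of the other
    labels uniformly at random."""
    keep = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    if rng.random() < keep:
        return y
    others = [c for c in range(num_classes) if c != y]
    return int(rng.choice(others))

if __name__ == "__main__":
    eps_x, eps_y, k = 1.0, 1.0, 4
    x, y = 0.7, 2                      # a node's scaled feature and class label
    bit = perturb_feature(x, eps_x)
    print("reported bit:", bit, "debiased estimate:", debias_feature(bit, eps_x))
    print("reported label:", perturb_label(y, k, eps_y))
```

On the server side, LPGNN additionally denoises such reports through multi-hop feature and label propagation before training the graph neural network; those components are omitted from this sketch.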