
Research On Privacy Protection Of Chinese Medical Text Generation Task Based On Deep Learning

Posted on: 2024-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Jie
Full Text: PDF
GTID: 2544306932455384
Subject: Cyberspace security
Abstract/Summary:
The Chinese medical text generation task is important for enhancing healthcare services and disseminating medical knowledge. However, it faces challenges that conventional text generation tasks do not. Firstly, the scarcity of medical text data, compounded by stringent legal and regulatory constraints, creates a significant hurdle in training superior language models. Secondly, the generated output must remain interpretable and accurate, and privacy protection must not compromise expressive capacity. Thirdly, privacy protection is more critical in medical text generation than in routine tasks such as chat and translation. This is attributable to data sensitivity, legal and ethical considerations, and the ease with which private information can be filtered from medical text. It calls for a stricter approach during the training and inference stages to guard against potential malicious attackers.

To address these issues, we present a novel solution. We begin with a model pre-trained on extensive Chinese corpora, followed by a fine-tuning step using medical knowledge corpora. This process is intended to augment the model's expressive power in the medical domain. Moreover, our strategy uses multi-party secure computation to allow several participants to supply training data while preserving privacy, which significantly mitigates the obstacles posed by the scarcity of medical data. To address privacy protection issues, we present a comprehensive analysis of the privacy attack model in medical text generation tasks, demonstrating its threat to privacy and security. We also propose an advanced method for model inversion attacks during the inference stage. For the training stage, we construct and implement a multi-party secure computation protocol for the Transformer-based model to ensure training confidentiality, and we deploy Intel SGX to guarantee the integrity of the training process. For the inference stage, we propose a selective differential privacy optimizer and a selective differential privacy decoding algorithm for the Transformer-based model. These deter malicious attackers from accessing or inferring private training data while preserving the accuracy and interpretability of the generated outcomes. Given the ease of filtering private information from medical text, deploying selective differential privacy yields considerable benefits.

Furthermore, we introduce a new metric, the "medical text generation scientific index", to assess the scientific validity and accuracy of the generated medical text. We validated this index through rigorous experimentation, which substantiates the scientific robustness of our model. This thesis is a comprehensive exploration of privacy issues in Chinese medical text generation tasks and offers novel solutions to them. It extends the theoretical understanding of privacy protection in medical text generation tasks and provides practical, effective strategies and tools for privacy protection within this domain.
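The abstract does not give the details of the selective differential privacy optimizer, so the following is only a minimal sketch of the general idea, under stated assumptions: gradients from examples flagged as private are clipped and perturbed with Gaussian noise, while gradients from non-private examples are used as they are. The function names, the per-example (rather than per-token) granularity, and the `is_private` flags, which stand in for a hypothetical sensitive-span detector, are all illustrative and are not taken from the thesis.

```python
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def selective_dp_step(per_example_grads, is_private,
                      max_norm=1.0, noise_multiplier=1.0, rng=None):
    """Average per-example gradients, protecting only the private ones.

    per_example_grads: list of gradient vectors (lists of floats).
    is_private: list of bools (assumed output of a sensitive-span detector).
    """
    rng = rng or random.Random()
    dim = len(per_example_grads[0])
    public_sum = [0.0] * dim
    private_sum = [0.0] * dim
    n_private = 0

    for grad, private in zip(per_example_grads, is_private):
        if private:
            # Private examples: bound each contribution by clipping.
            clipped = clip(grad, max_norm)
            private_sum = [a + b for a, b in zip(private_sum, clipped)]
            n_private += 1
        else:
            # Non-private examples: ordinary gradient, no clipping or noise.
            public_sum = [a + b for a, b in zip(public_sum, grad)]

    if n_private:
        # Gaussian noise scaled to the clipping bound, added once per step.
        sigma = noise_multiplier * max_norm
        private_sum = [s + rng.gauss(0.0, sigma) for s in private_sum]

    n = len(per_example_grads)
    return [(p + q) / n for p, q in zip(public_sum, private_sum)]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [1.0, 1.0]]
    flags = [True, False]  # only the first example contains private content
    print(selective_dp_step(grads, flags, rng=random.Random(0)))
```

Because noise is spent only on the flagged examples, the non-private part of the update keeps its full signal, which is the kind of benefit the abstract attributes to selective differential privacy. The sketch makes no formal privacy claim; a real optimizer would also need privacy accounting.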
Keywords/Search Tags:Natural Language Processing, Medical Text Generation, Differential Privacy, Multi-Party Secure Computation, Privacy-preserving Deep Learning