| With the continuous development of IoT technology, diverse IoT applications are emerging and serving people’s daily lives. However, cross-domain data exchange between different IoT applications can easily leak private and sensitive information, undermining data security. To address the leakage of sensitive information during data sharing and exchange, this research studies data desensitization technology for the cross-domain exchange of IoT data, proposes deep-learning-based desensitization methods for unstructured and structured data, and designs a cross-domain data exchange system for IoT applications. The research work comprises three parts: (1) To address the difficulty of identifying sensitive information in text data desensitization, an unstructured data desensitization method based on the BERT-PN model is proposed. The model uses pre-trained BERT for feature extraction and a pointer network (PN) instead of a CRF for decoding, which makes full use of the training data. To alleviate the unbalanced sample distribution in sensitive entity recognition tasks, BERT-PN uses an improved loss function combined with label smoothing to avoid overfitting. In addition, adversarial training improves the robustness and generalization ability of the model. Experimental results show that the model achieves good recognition performance on all three datasets and has certain advantages over existing models. After the BERT-PN model identifies sensitive entities in text data, a personalized desensitization method is applied that takes into account the different sensitivity levels of different entities. This provides a good desensitization effect while retaining some of the data’s utility. (2) To address the problem that traditional structured data desensitization methods based on anonymization or perturbation cannot balance data security and availability, a structured data desensitization method based on Improved Conditional Tabular GAN (ICTGAN) is proposed. ICTGAN adopts CTGAN’s mode-specific normalization and training-by-sampling methods to handle tabular data with mixed distributions and class imbalance, and adds an auxiliary classifier to CTGAN’s network to improve the semantic correctness of the synthetic data. To balance the security and availability of desensitized data, ICTGAN improves the loss function and controls the degree of data desensitization through corresponding threshold terms. In addition, an adaptive training method is adopted to train the model fully. Experimental results show that the synthetic data offers better data utility and privacy protection. In terms of data availability, synthetic data can replace raw data for machine learning tasks. In terms of data security, the model provides a certain degree of privacy protection and can control the degree of data desensitization as required, allowing a trade-off between usability and security. (3) Based on the above research, a cross-domain data exchange system for IoT applications is designed and implemented. The system provides functions such as data desensitization, data normalization, and data on demand. The system’s functions are tested in a real IoT application environment, and the results show that the designed system can ensure the secure cross-domain sharing of data between different IoT applications. |
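
The personalized desensitization step in part (1) can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the method used in this work: the entity labels (`ID`, `PHONE`, `NAME`), the masking strategies, and the `POLICY` mapping are hypothetical, and the entity spans are assumed to come from a recognizer such as BERT-PN.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Entity:
    """A recognized sensitive span (hypothetical recognizer output format)."""
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # entity type, e.g. "ID", "PHONE", "NAME"


def mask_all(value: str) -> str:
    """Highest sensitivity: hide every character."""
    return "*" * len(value)


def mask_keep_edges(value: str, head: int = 3, tail: int = 2) -> str:
    """Medium sensitivity: keep a short prefix and suffix for usability."""
    if len(value) <= head + tail:
        return "*" * len(value)
    return value[:head] + "*" * (len(value) - head - tail) + value[-tail:]


def mask_keep_first(value: str) -> str:
    """Lower sensitivity: keep only the first character."""
    return value[:1] + "*" * (len(value) - 1)


# Hypothetical policy: stricter masking for more sensitive entity types.
POLICY: Dict[str, Callable[[str], str]] = {
    "ID": mask_all,
    "PHONE": mask_keep_edges,
    "NAME": mask_keep_first,
}


def desensitize(text: str, entities: Iterable[Entity]) -> str:
    """Replace each entity span using the strategy chosen for its label."""
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.start < cursor:
            raise ValueError("overlapping entity spans are not supported")
        pieces.append(text[cursor:ent.start])
        # Unknown labels fall back to the strictest strategy.
        strategy = POLICY.get(ent.label, mask_all)
        pieces.append(strategy(text[ent.start:ent.end]))
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    sample = "Alice called 13812345678 today."
    phone_at = sample.index("13812345678")
    spans = [Entity(0, 5, "NAME"), Entity(phone_at, phone_at + 11, "PHONE")]
    print(desensitize(sample, spans))
```

Masking by entity type is only one way to make desensitization depend on sensitivity; the approach in this work may weigh sensitivity differently.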