
Research And Implementation Of Text Automatic Summarization Based On Deep Learning

Posted on: 2021-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: M M Zhang
GTID: 2416330614470326
Subject: Electronic and communication engineering
Abstract/Summary:
Data desensitization is an important means of protecting personal private data. Applied at the source, it eliminates the risk that personal private information is stolen when data leaks. In terms of structure, large-scale data falls mainly into two categories: structured tabular data and unstructured text data. Existing desensitization methods for structured data tend to be either heavyweight and complex, such as encryption-based desensitization, or simple and crude, such as truncation and masking. None of them ensures that the desensitized data retains the value of the original data. By contrast, research on desensitizing unstructured data is scarce, and most existing approaches still rely on rule matching and association with database fields. To address these shortcomings, this thesis surveys and studies data desensitization. The main research contents are as follows:
(1) Desensitization of structured data. Based on the intrinsic value and relational characteristics of different kinds of data, a set of desensitization algorithms is proposed to meet a range of desensitization requirements. Hash operations and feature obfuscation codes are added to the algorithms for two purposes: to improve the security and uniqueness of the desensitized results, and to prevent the source data from being recovered by inverting them. The algorithms take characteristics of the source data as feature parameters in the computation, so that the desensitized results preserve the intrinsic value of the source data. Extensive desensitization experiments on judicial data show that the results are secure, consistent and unique, and that they retain the intrinsic value of the original data.
(2) Desensitization of unstructured text data. Sensitive information is hard to locate in unstructured text, so named entity recognition (NER) is introduced to identify the boundaries of sensitive data. BERT, a pre-trained model proposed only in the past two years, is added to the traditional BiLSTM + CRF NER model, replacing the word2vec word vectors of the original model. BERT is fine-tuned on a large corpus of raw judicial text, and training material is collected, organized and annotated to train the model, improving the quality of named entity recognition. This approach is combined with rule-based methods and a judicial data desensitization whitelist, which improves desensitization efficiency and keeps the desensitized text fluent. Experimental results show that the method desensitizes text data effectively while maintaining a good reading experience.
(3) Application of structured and unstructured data desensitization to big data governance and big data use, respectively, in S City. A structured data desensitization system is designed, implemented and integrated into S City's data governance system, where it desensitizes data during the governance process. A text data desensitization interface service is also provided to S City's big data applications, which use it to desensitize the text data they handle. This initial application of data desensitization demonstrates the feasibility of the methods.
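The abstract describes two mechanisms only in prose. The first is keyed-hash desensitization of structured fields that keeps some characteristics of the source value. The second is masking of NER-detected sensitive spans in text, with a whitelist of terms left untouched. The Python sketch below illustrates both ideas under explicit assumptions:

- HMAC-SHA256 stands in for the thesis's hash and feature-obfuscation scheme, whose details the abstract does not give.
- `SECRET_KEY`, `pseudonymize_digits` and `mask_entities` are hypothetical names.
- Entity spans are assumed to come from an NER tagger such as the BERT + BiLSTM + CRF model, which is not implemented here.

```python
import hashlib
import hmac

# Hypothetical secret key (assumption); a real system would load it from secure storage.
SECRET_KEY = b"example-key"


def pseudonymize_digits(value: str, keep_prefix: int = 3, key: bytes = SECRET_KEY) -> str:
    """Replace digits at positions >= keep_prefix with digits derived from an HMAC.

    The same value and key always give the same output (consistency), the length
    and non-digit separators are preserved, and the leading characters stay
    visible as a retained "feature" of the source. This sketch does NOT
    guarantee that distinct inputs map to distinct outputs.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Map each hex character of the digest onto a decimal digit.
    derived = [str(int(c, 16) % 10) for c in digest]
    out = []
    i = 0
    for pos, ch in enumerate(value):
        if ch.isdigit() and pos >= keep_prefix:
            out.append(derived[i % len(derived)])
            i += 1
        else:
            out.append(ch)  # keep the prefix and separators such as '-'
    return "".join(out)


def mask_entities(text: str, spans, whitelist=frozenset()) -> str:
    """Replace (start, end, label) spans with "[label]" unless the span text is whitelisted.

    Assumes non-overlapping spans, e.g. as produced by an NER model (not shown).
    """
    pieces = []
    last = 0
    for start, end, label in sorted(spans):
        entity = text[start:end]
        pieces.append(text[last:start])
        pieces.append(entity if entity in whitelist else f"[{label}]")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


if __name__ == "__main__":
    # Hypothetical phone number; output depends on SECRET_KEY but keeps "138" and 11 digits.
    print(pseudonymize_digits("13812345678"))

    text = "Zhang San filed the case at the People's Court of S City."
    court = "People's Court of S City"
    start = text.index(court)
    spans = [(0, 9, "PER"), (start, start + len(court), "ORG")]
    # Court names are public, so the court is whitelisted and left as-is.
    print(mask_entities(text, spans, whitelist={court}))
    # [PER] filed the case at the People's Court of S City.
```

Keying the hash makes the mapping consistent across tables: the same identifier always yields the same pseudonym, so joins on desensitized data still work. An attacker without the key also cannot rebuild the mapping by enumerating candidate values, which an unkeyed hash over a small space such as phone numbers would allow.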
Keywords/Search Tags:Data protection, Desensitization of structured data, Retain data value, NER, Desensitization of unstructured data