
The Method Of Information Hiding Based On Automatically Generating Text

Posted on: 2021-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Jin
Full Text: PDF
GTID: 2428330605456050
Subject: Signal and Information Processing
Abstract/Summary:
With the advent of the era of big data, protecting users' information security has become a research hotspot. Information hiding is one of the key technologies in the field of information security: it embeds secret information into data transmitted over an open channel so that the information is hard for an attacker to detect. Because text has high coding efficiency and is the most widely used carrier in daily communication, text-based information hiding has attracted researchers' attention. Benefiting from the rapid development of deep neural networks, combining neural networks with information hiding has become an important branch in the exploration of new methods. Against this background, the main work of this thesis is as follows.

First, preprocess the data sets. A neural network model relies on its strong self-learning ability, so a data set containing a large number of natural texts must be built. This thesis uses the comment text data set "Imdb", the formal text data set "News", and the informal text data set "Twitter" as training sets for the neural network, in order to build models with suitable parameters. Data set preprocessing includes lowercasing letters, deleting special symbols, and filtering web links, which removes interfering factors from the automatic text generation process.

Then, design the algorithms for embedding and extracting secret information. This thesis selects the statistical Markov language model and the recurrent neural network as text generation models for comparative analysis. The Markov model has an n-gram structure that matches the language generation process; the recurrent neural network maps text into a high-dimensional semantic space and extracts and analyzes features from the similarities and differences of the semantic distribution, which gives good text generation results. During generation, both models build a candidate pool, encode the candidate words with Huffman coding, and generate steganographic text according to the secret information bit stream, thereby embedding the secret information. For extraction, the sender and the receiver must share the same training set or generation model, and the receiver recovers the secret information from the actually received words. The experimental results show that the proposed information hiding method based on automatic text generation is feasible and effective, and that the generated steganographic text is semantically coherent and highly concealed. The steganographic text generated by the recurrent neural network has better concealment and is suitable for informal texts, while the steganographic text generated by the Markov model has a higher hiding capacity and is suitable for formal texts.

Finally, propose a candidate pool self-shrinking mechanism. Considering that different words have different sensitivity, the improved method introduces a perplexity calculation into the construction of the candidate pool. Before Huffman coding, the method computes the perplexity of each word in the initial candidate pool against the previously generated words and filters the pool according to a preset perplexity threshold; the remaining words form a new candidate pool, which is then Huffman coded. The experimental results show that the improved method generates smoother steganographic text, at the expense of a smaller hiding capacity.
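The preprocessing step described above (lowercasing, deleting special symbols, filtering web links) can be sketched with simple regular expressions; the thesis does not give code, so the function name and the exact character classes kept here are illustrative assumptions:

```python
import re

def preprocess(text: str) -> str:
    """Normalize raw corpus text before training: lowercase,
    drop web links, and strip special symbols."""
    text = text.lower()                                  # lowercase letters
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # filter web links
    text = re.sub(r"[^a-z0-9\s',.!?]", " ", text)        # delete special symbols
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace

# Example: a noisy Twitter-style line reduced to clean training text
print(preprocess("Visit https://example.com NOW!!! #amazing"))
```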
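The embedding step — Huffman-coding a candidate pool so that secret bits select the next generated word — can be sketched as follows. This is a minimal illustration, assuming the generation model (Markov or RNN) has already produced a pool of candidate words with conditional probabilities; the function names and pool contents are hypothetical:

```python
import heapq
from itertools import count

def huffman_codes(pool):
    """Build Huffman codes for a candidate pool {word: probability}.
    More probable words receive shorter codes."""
    tick = count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tick), {w: ""}) for w, p in pool.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in c1.items()}
        merged.update({w: "1" + c for w, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

def embed_step(pool, bits):
    """Pick the next stego word: the candidate whose Huffman code
    is a prefix of the remaining secret bit stream."""
    codes = huffman_codes(pool)
    for word, code in codes.items():
        if bits.startswith(code):
            return word, bits[len(code):]  # consume the embedded bits
    raise ValueError("no code matches the bit stream")

pool = {"the": 0.5, "cat": 0.25, "dog": 0.25}
word, remaining = embed_step(pool, "0101")
```

Because Huffman codes are prefix-free and cover every bit pattern, exactly one candidate matches the front of the bit stream at each step; the receiver, running the same model on the received words, rebuilds the same pool and codes and so recovers the bits.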
Keywords/Search Tags: Text information hiding, Markov model, Recurrent neural network, Automatic text generation, Huffman coding