
Research On Steganography Based On Generative Text

Posted on: 2024-08-24  Degree: Master  Type: Thesis
Country: China  Candidate: Y Huang
GTID: 2568307067991519  Subject: Statistics
Abstract/Summary:
With the rapid development of the Internet, information security has received increasing attention. The maturity of natural language processing (NLP) technology and improvements in hardware have made generative linguistic steganography a new research hotspot. Current linguistic steganography algorithms mainly suffer from low-quality generated text, small embedding capacity, and large differences from the statistical distribution of natural text. As language models advance and coding schemes that map conditional probabilities to hidden bits improve, these problems have been steadily alleviated. However, some NLP techniques remain ill-suited to discrete data such as text, and further research is needed to analyze how these goals relate to one another and to find better ways of balancing them under different requirements.

The traditional generative adversarial network (GAN) is not well suited to text data, but becomes more suitable for text generation when combined with large language models. This paper therefore proposes TransGAN, a model for generative text steganography that uses GPT-2 as the generator and BERT as the discriminator. A new loss function is also designed to optimize the generator. It retains the advantages of MaliGAN: it lets the model keep searching for the global optimum while reducing the impact of discrete variables on accuracy.

For the steganography part, this paper applies arithmetic coding to words based on their conditional probability distribution, shows by comparison that dynamic arithmetic coding outperforms static arithmetic coding, and theoretically proves the imperceptibility and data-compression invariance of arithmetic coding. Finally, the experimental results show that the proposed model outperforms a GAN-based steganography model in KL divergence and embedding capacity, while the difference in perplexity remains small, yielding a more secure steganography method.
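Embedding by arithmetic coding over a conditional word distribution can be illustrated with the minimal sketch below. It is not the thesis's implementation: the toy bigram table `BIGRAM` stands in for GPT-2, and the names `next_token_probs`, `embed` and `extract` are hypothetical. The sender treats the secret bits as a binary fraction and "decodes" it into words, repeatedly choosing the word whose probability subinterval contains that fraction. The receiver runs the same model over the received words to narrow the interval and reads the bits back. Recomputing the distribution from the context at every step is assumed here to be what "dynamic" coding means; a static variant would use one fixed table. Exact `Fraction` arithmetic avoids the finite-precision renormalization a real coder would need, and the message length is assumed to be shared in advance.

```python
import math
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical toy bigram "language model" standing in for GPT-2: it maps the
# previous token to a conditional distribution over the next token.
BIGRAM: Dict[str, Dict[str, Fraction]] = {
    "<s>": {"the": Fraction(1, 2), "a": Fraction(1, 4), "my": Fraction(1, 4)},
    "the": {"cat": Fraction(1, 2), "dog": Fraction(1, 4), "bird": Fraction(1, 4)},
    "a": {"cat": Fraction(1, 3), "dog": Fraction(1, 3), "bird": Fraction(1, 3)},
    "my": {"cat": Fraction(1, 4), "dog": Fraction(1, 2), "bird": Fraction(1, 4)},
    "cat": {"sat": Fraction(1, 2), "ran": Fraction(1, 4), ".": Fraction(1, 4)},
    "dog": {"sat": Fraction(1, 4), "ran": Fraction(1, 2), ".": Fraction(1, 4)},
    "bird": {"sat": Fraction(1, 4), "ran": Fraction(1, 4), ".": Fraction(1, 2)},
    "sat": {".": Fraction(1, 2), "the": Fraction(1, 4), "a": Fraction(1, 4)},
    "ran": {".": Fraction(1, 2), "the": Fraction(1, 4), "my": Fraction(1, 4)},
    ".": {"the": Fraction(1, 2), "a": Fraction(1, 4), "my": Fraction(1, 4)},
}


def next_token_probs(context: List[str]) -> Dict[str, Fraction]:
    """p(next | context), recomputed at every step (only the last token matters here)."""
    return BIGRAM[context[-1] if context else "<s>"]


def subintervals(low: Fraction, high: Fraction,
                 probs: Dict[str, Fraction]) -> List[Tuple[str, Fraction, Fraction]]:
    """Split [low, high) into one piece per token, sized by its probability."""
    width, cursor, pieces = high - low, low, []
    for token in sorted(probs):  # fixed order shared by sender and receiver
        end = cursor + width * probs[token]
        pieces.append((token, cursor, end))
        cursor = end
    return pieces


def embed(bits: List[int], max_tokens: int = 200) -> List[str]:
    """Sender: arithmetic-decode the secret bits into a token sequence."""
    n = len(bits)
    m = sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits)), Fraction(0))
    # Every point in the secret interval [m, m + 2^-n) starts with these n bits;
    # steer generation toward its midpoint so the token interval ends up inside it.
    target = m + Fraction(1, 2 ** (n + 1))
    low, high = Fraction(0), Fraction(1)
    tokens: List[str] = []
    while not (m <= low and high <= m + Fraction(1, 2 ** n)):
        if len(tokens) >= max_tokens:
            raise RuntimeError("max_tokens reached before the message was embedded")
        for token, lo, hi in subintervals(low, high, next_token_probs(tokens)):
            if lo <= target < hi:
                tokens.append(token)
                low, high = lo, hi
                break
    return tokens


def extract(tokens: List[str], n: int) -> List[int]:
    """Receiver: replay the same model, narrow the interval, read off n bits."""
    low, high = Fraction(0), Fraction(1)
    for i, token in enumerate(tokens):
        for tok, lo, hi in subintervals(low, high, next_token_probs(tokens[:i])):
            if tok == token:
                low, high = lo, hi
                break
    value = math.floor(low * 2 ** n)  # low lies in [m, m + 2^-n)
    return [(value >> (n - 1 - i)) & 1 for i in range(n)]


secret = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(secret)
print(" ".join(stego))
print("bits per token:", len(secret) / len(stego))
assert extract(stego, len(secret)) == secret
```

Because the chosen words follow the model's own probabilities, the stego text is distributed like ordinary samples from the model. This is the intuition behind the imperceptibility claim, although the sketch does not prove it.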
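The abstract says the new generator loss keeps the advantages of MaliGAN but does not give its form, so the sketch below shows only the standard MaliGAN weighting that it builds on, not the thesis's modified loss. Each generated sample x receives the importance ratio r(x) = D(x) / (1 - D(x)) from the discriminator's probability D(x) that x is real. The ratios are normalized over the batch, an optional baseline b is subtracted, and the resulting weights scale the generator's log-likelihood gradients. Plain floats stand in for the tensors a GPT-2/BERT training loop would use, and the names `maligan_weights`, `maligan_generator_loss` and `EPS` are hypothetical.

```python
from typing import List

EPS = 1e-6  # keeps D(x) away from 0 and 1 so the ratio stays finite


def maligan_weights(d_scores: List[float], baseline: float = 0.0) -> List[float]:
    """Normalized importance weights r(x_i) / sum_j r(x_j) - b, with r(x) = D(x) / (1 - D(x))."""
    clipped = [min(max(d, EPS), 1.0 - EPS) for d in d_scores]
    ratios = [d / (1.0 - d) for d in clipped]
    total = sum(ratios)
    return [r / total - baseline for r in ratios]


def maligan_generator_loss(d_scores: List[float], log_probs: List[float],
                           baseline: float = 0.0) -> float:
    """Surrogate loss: with the weights held constant, its negative gradient is the
    MaliGAN estimator sum_i w_i * grad log p_theta(x_i)."""
    weights = maligan_weights(d_scores, baseline)
    return -sum(w * lp for w, lp in zip(weights, log_probs))


# Discriminator probabilities that each generated sample is real, and the
# generator's sequence log-probabilities for the same samples (made-up values).
d_scores = [0.5, 0.8, 0.2]
log_probs = [-12.0, -9.5, -15.0]
print(maligan_weights(d_scores))
print(maligan_generator_loss(d_scores, log_probs))
```

Normalizing the ratios within the batch is what lets MaliGAN use discriminator feedback on discrete text without backpropagating through token sampling. In a real implementation the weights would be detached from the autograd graph.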
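The abstract compares models by KL divergence, embedding capacity and perplexity but does not define how each is computed. The sketch below shows one common set of definitions, all of which are assumptions: KL divergence between add-one-smoothed word-frequency distributions of stego text and natural text, perplexity as the exponentiated mean negative log-likelihood per token, and embedding capacity as hidden bits per generated word. The helper names and the tiny example corpora are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def unigram_dist(tokens: List[str], vocab: Iterable[str], alpha: float = 1.0) -> Dict[str, float]:
    """Add-alpha smoothed word-frequency distribution over a shared vocabulary."""
    vocab = list(vocab)
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats; smoothing guarantees q[w] > 0."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def perplexity(token_log_probs: List[float]) -> float:
    """exp of the mean negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def embedding_rate(num_bits: int, num_words: int) -> float:
    """Embedding capacity as hidden bits carried per generated word."""
    return num_bits / num_words


natural = "the cat sat . the dog ran . a bird sat .".split()
stego = "the dog ran . my cat sat . the bird ran .".split()
vocab = sorted(set(natural) | set(stego))
p, q = unigram_dist(stego, vocab), unigram_dist(natural, vocab)
print("KL(stego || natural):", kl_divergence(p, q))
print("perplexity:", perplexity([math.log(0.25)] * len(stego)))
print("bits per word:", embedding_rate(24, len(stego)))
```

A lower KL divergence means the stego text's statistics sit closer to natural text, which is the sense in which the abstract calls the method more secure. The other two metrics track fluency and payload.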
Keywords/Search Tags: Linguistic steganography, Natural language processing, TransGAN, GPT-2, BERT