
The Research On Natural Language Information Hiding

Posted on: 2009-07-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Liu
Full Text: PDF
GTID: 1118360272492137
Subject: Computer application technology
Abstract/Summary:
With the development and popularization of computer and Internet technology, information hiding has become one of the hot spots in the field of information security, and has been extensively used for copyright protection, covert communication, authentication, etc. At present, most research has focused on information hiding in video, image and audio documents. However, digital texts form one of the largest portions of the digital data people encounter daily, so covert communication, copyright management and authentication are more pressing for text documents than they are for video, image and audio documents.

Compared with other media documents, such as image, audio and video, text documents offer little of the redundancy that other media gain from the limits of the human visual and auditory systems. Additionally, there are few strong theories and practical automatic techniques in natural language processing for understanding, transforming and generating texts. Thus research on text steganography is very challenging. The early methods of text steganography are based on the physical format of texts. Because those methods exploit tolerances in typesetting by making minute changes to line placement and kerning, they are vulnerable to simple reformatting and OCR (Optical Character Recognition) attacks, and their applications are limited. Natural language steganography, a newer area, now sets the direction for text steganography.

This dissertation mainly concerns Chinese texts, and proposes several methods for natural language steganography at the word level, sentence level and paragraph level. Additionally, because methods that modify a given cover text are limited in the amount of information they can hide and are sensitive to the modifications they make, a new method based on Mimic is proposed. The main contributions are summarized as follows.

(1) According to the characteristics of Chinese texts, two methods at the word level are proposed. The first method exploits the substitution of variant forms of the same word and of synonyms. In this method, the neighboring words are treated as context words. When substituting, a Chinese morphological analyzer is used to check whether the text is still correctly segmented. The method is easy to implement, achieves high capacity, and resists machine analysis. The second method is synonym substitution based on semantically adjacent words. First, the synonym sets are created and classified with HowNet and Tongyici Cilin. For the synonym sets that are not totally interchangeable, the context words are obtained from the semantically adjacent words by analyzing the dependency relationships, and then the synonym with a high probability of co-occurring with those semantically adjacent words is selected. The method can effectively obtain the context words and avoid improper substitutions.

(2) As present work on natural language steganography at the sentence level is mainly designed for English texts, this dissertation proposes two methods at the sentence level for Chinese texts. The first method is based on the transformation of syntactic parse trees. First, a parser based on a BP neural network is designed and implemented. Next, all the syntactic parse trees are encoded. Then, secret information is embedded by modifying the trees according to the transformation rules. The second method is based on shift conversion. First, a method based on Chinese mathematical expression is presented to encode Chinese texts. Then, secret information is embedded according to the shift conversion rules.

(3) Presently, there is little work on natural language steganography at the paragraph level. This dissertation proposes a Chinese natural language watermarking method at the paragraph level. The method is based on named entities and coreference resolution. Additionally, the spread spectrum technique is introduced to encode the watermark. The experimental results show that the method is robust and can resist some active attacks.

(4) The existing text mimicking methods require the communicating parties to share the dictionary and sentence templates. Additionally, the generated texts easily arouse suspicion. This dissertation proposes a new method of natural language steganography based on Mimic. The method does not need sophisticated dictionaries and sentence templates to be constructed beforehand. Moreover, it can improve the efficiency and security of transmitting secret information. A tool, called MIMIC-PPT, is implemented by combining text mimicking techniques with the characteristics of PPT documents.
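As a rough illustration of the word-level idea in contribution (1), the sketch below hides one bit per substitutable word by choosing which member of a two-word synonym set appears in the text. It is a minimal sketch under stated assumptions, not the dissertation's method: the synonym sets, the names `SYNONYM_SETS`, `embed` and `extract`, and the use of pre-tokenized English words are all hypothetical, and the context-word checks, dependency analysis and segmentation validation described above are omitted.

```python
# Toy synonym sets (hypothetical). A word's position in its tuple is the bit it encodes.
SYNONYM_SETS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
]

# Map every word to (its synonym set, its index within that set).
WORD_TO_SLOT = {
    word: (group, index)
    for group in SYNONYM_SETS
    for index, word in enumerate(group)
}


def embed(tokens, bits):
    """Rewrite substitutable tokens so each one encodes the next secret bit."""
    remaining = list(bits)
    stego = []
    for token in tokens:
        if token in WORD_TO_SLOT and remaining:
            group, _ = WORD_TO_SLOT[token]
            stego.append(group[remaining.pop(0)])
        else:
            stego.append(token)
    if remaining:
        raise ValueError("cover text has too few substitutable words")
    return stego


def extract(tokens, n_bits):
    """Read back the bit carried by each substitutable token, in order."""
    bits = [WORD_TO_SLOT[t][1] for t in tokens if t in WORD_TO_SLOT]
    return bits[:n_bits]


cover = "the big dog will begin a quick run".split()
stego = embed(cover, [1, 0, 1])
print(" ".join(stego))
print(extract(stego, 3))
```

In this toy setting the receiver needs the same synonym sets and the message length. A practical scheme would also have to decide, from context, which substitutions keep the sentence natural, which is the problem the two word-level methods address.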
Keywords/Search Tags: Information hiding, Text steganography, Natural language steganography, Natural language watermarking, Covert communication, Copyright protection