
Design And Research Of Digital Watermarking In Natural Language Documents

Posted on: 2010-03-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z S Yu
GTID: 1118360302463034
Subject: Information security
Abstract/Summary:
Natural language is the most basic, most precise, and most efficient means of human communication. With the development of digital technology, people encounter large numbers of electronic documents, online news articles, forums, blogs, and so on. Digital natural language documents have become the most important medium on the Internet, and protecting their copyright is an urgent problem.
Digital watermarking is an important way to protect the copyright of digital files. Research in this area first developed for multimedia. Exploiting the limitations of the human visual and auditory systems, researchers have designed watermarking algorithms for image, audio, and video. Because these multimedia carriers are processed in similar ways and contain ample redundancy, research on watermark design has developed rapidly, and steganalysis of these schemes has received considerable attention.
By contrast, owing to its special processing methods, low redundancy, the complexity of natural language rules, and the limitations of computational linguistics, research on watermarking in digital text started late and has achieved less. However, because text is common and important in daily life, more and more researchers have turned to this area in recent years. New watermarking algorithms have emerged, ranging from format-based to syntactic to semantic methods. Meanwhile, steganalysis of text watermarking has also begun. Generally speaking, in the area of digital watermarking in natural language text, schemes suited to practical applications have not yet been designed, steganalysis results are still rare, and the theoretical basis remains to be established. With these concerns in mind, the main research work and corresponding contributions of this dissertation are as follows:
1. Research on a model for digital watermarking in natural language text.
We establish a communication model specifically for text and use the methodology of the foundations of cryptography to define the concepts of undetectability, procedure adversary, human adversary, invisible attack, and robustness. We also give an approach to proving the security of watermarking algorithms by means of interactive proof systems, and we use these tools to evaluate some existing watermarking systems.
2. Design of watermarking schemes for digital natural language text. We propose and implement a new digital text watermarking system, StegCi. It is an appending watermarking scheme. The encoding algorithm produces a piece of Ci from the watermark. The generated Ci conforms to a given tune in its number of lines and words, sentence patterns, rhythm, and rhyme, so it appears innocuous. The stego Ci is then appended to the carrier text. During verification, the watermark is extracted from the stego Ci by looking up a lexicon. Because stego Ci are innocuous, the watermark is difficult to detect. Experimental results show that the ratio of watermark to carrier reaches 16%, which means StegCi is also a text steganography system with a high embedding ratio. To the best of our knowledge, this is the first text watermarking scheme to make use of a special type of literature.
3. Detection of watermarking schemes for digital natural language texts. For the Snow algorithm, which belongs to the class of format-based methods, we design a detection algorithm and point out a general way to steganalyze format-based schemes. For synonym-substitution schemes, which are semantic methods, we design a detection algorithm that makes use of context information. By examining whether a keyword is the most suitable word for its context among the words in its synonym set, we judge whether that keyword carries a watermark bit. Examining keywords across the whole text leads to the final judgment of whether the text is watermarked. When comparing words in a synonym set for the same context, we use IDF to balance common words against rare ones.
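The synonym-set check described above can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the co-occurrence table `cooc`, the document-frequency table `doc_freq`, the multiplicative use of IDF, and the 0.5 decision threshold are all assumptions made for illustration, as are the function names.

```python
import math

def idf(word, doc_freq, n_docs):
    """Smoothed inverse document frequency: rare words get larger values."""
    return math.log((1 + n_docs) / (1 + doc_freq.get(word, 0))) + 1.0

def suitability(word, context, cooc, doc_freq, n_docs):
    """Assumed scoring rule: co-occurrence with the context words, scaled by
    IDF so a common synonym does not win only because it is common."""
    raw = sum(cooc.get((word, c), 0) for c in context)
    return raw * idf(word, doc_freq, n_docs)

def is_suspicious(keyword, synonyms, context, cooc, doc_freq, n_docs):
    """True if another word in the synonym set fits the context strictly better."""
    candidates = set(synonyms) | {keyword}
    scores = {w: suitability(w, context, cooc, doc_freq, n_docs) for w in candidates}
    return scores[keyword] < max(scores.values())

def looks_watermarked(sites, cooc, doc_freq, n_docs, threshold=0.5):
    """sites is a list of (keyword, synonyms, context) tuples from one text.
    The text is flagged when the share of suspicious keywords exceeds the
    (hypothetical) threshold."""
    if not sites:
        return False
    flagged = sum(is_suspicious(k, s, c, cooc, doc_freq, n_docs) for k, s, c in sites)
    return flagged / len(sites) > threshold
```

In this sketch a keyword counts as suspicious only when a different synonym scores strictly higher, so ties favor the word actually found in the text. The statistics tables would in practice come from a corpus; here they are plain dictionaries supplied by the caller.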
Experimental results for the T-Lex watermarking system show 90% accuracy, 86.6% precision, and an 82.5% recall rate. For translation-based watermarking systems, we also design a detection algorithm.
4. Developing the idea of treating the whole Internet as a corpus. If each webpage that contains natural language text is treated as a document in this corpus, the whole Internet can be regarded as a large-scale, influence-weighted, up-to-date corpus. With the help of tools such as search engines, people can obtain useful information about natural language usage that is very difficult to obtain from traditional corpora because of their limited size or prohibitive cost.
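The lexicon lookup mentioned in item 2 (extracting the watermark from generated text by looking words up in a lexicon) can be sketched in a simplified form. This is an assumption-laden illustration, not StegCi itself: the `template` of word slots stands in for the constraints a tune imposes, the word lists are placeholders, and rhythm and rhyme checks are omitted entirely.

```python
def bits_per_slot(choices):
    """Number of whole bits a slot can carry: floor(log2(len(choices)))."""
    return len(choices).bit_length() - 1

def encode(bits, template):
    """Pick one word per slot so that the word's index spells out the payload.
    bits is a string of '0'/'1'; template is a list of candidate word lists."""
    words, pos = [], 0
    for choices in template:
        k = bits_per_slot(choices)
        # Pad the final chunk with zeros if the payload runs out mid-slot.
        chunk = bits[pos:pos + k].ljust(k, "0")
        words.append(choices[int(chunk, 2)] if k else choices[0])
        pos += k
    if pos < len(bits):
        raise ValueError("template too short for payload")
    return words

def decode(words, template):
    """Recover the bit string by looking up each word's index in its slot."""
    out = []
    for word, choices in zip(words, template):
        k = bits_per_slot(choices)
        if k:
            out.append(format(choices.index(word), f"0{k}b"))
    return "".join(out)
```

A slot with four candidate words carries two bits, a slot with two carries one, and a slot with a single fixed word carries none. Because `encode` zero-pads the last chunk, `decode` can return a few extra trailing zeros when the payload is shorter than the template's capacity, so a real scheme would also need to convey the payload length.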
Keywords/Search Tags: digital watermarking, natural language text, information hiding, steganography, steganalysis, copyright protection, detection of watermarking, reduction of watermarking, watermarking model, Ci