Chinese Text Zero-watermarking Technique Based On Statistics Of Part-of-speech

Posted on: 2013-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: J J Shu
GTID: 2248330395485243
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of computer and Internet technologies, text, an important medium for information exchange, is widely used to store and share people's ideas. Because digital text is easily copied and spread, protecting its copyright has become a major challenge. Watermarking is one method of copyright protection and has become an active research topic.

Traditional watermarking algorithms embed a watermark by modifying the content of the digital medium to be protected. Zero-watermarking differs from these techniques: it does not change any part of the original carrier. Instead, features are extracted from the carrier, a zero-watermark is constructed from these features, and it is registered with a certification center as proof of copyright before the work is published. Because the zero-watermark modifies neither the content nor any property of the carrier, imperceptibility is guaranteed. This thesis studies zero-watermarking for Chinese text. Building on an in-depth analysis of existing text watermarking algorithms and on current natural language processing techniques, the work is organized as follows.

First, the thesis introduces the concept, characteristics, and classification of digital text watermarking, analyzes the advantages and disadvantages of current text watermarking algorithms, and summarizes the open problems in text watermarking.

Second, to address the difficulty of embedding watermarks in Chinese text, limited robustness, and insufficient watermark capacity, the thesis presents two Chinese text zero-watermarking algorithms. The first is based on part-of-speech frequency. Using natural language processing, it gathers statistics on every part-of-speech category to determine the intermediate-frequency parts of speech, then takes the words belonging to those categories as the text features from which the watermark is constructed. The second is based on the information entropy of the text. It estimates the probability of each part of speech from its frequency, uses these probabilities to determine the information entropy of individual words, and from those determines the information entropy of each sentence. Sentences whose entropy exceeds a threshold are selected, and their core words are extracted as the feature information.

Experimental results indicate that both algorithms resist not only format attacks, such as deleting spaces, but also content attacks, such as insertion, deletion, synonym substitution, and syntactic transformation.
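The part-of-speech frequency algorithm described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the input is assumed to be already segmented and tagged by some Chinese part-of-speech tagger into `(word, tag)` pairs; "intermediate frequency" is assumed to mean the middle third of tags ranked by count (the hypothetical `band` parameter); and because the abstract does not say how a suspect text is checked against the registered watermark, the hypothetical `similarity` function uses a simple multiset overlap of feature words as a stand-in for the correlation function named in the keywords.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed input: (word, pos_tag) pairs produced by any Chinese POS tagger.
Tagged = list[tuple[str, str]]


@dataclass(frozen=True)
class ZeroWatermark:
    tags: frozenset[str]        # intermediate-frequency POS tags (registered)
    features: tuple[str, ...]   # feature words, in text order (registered)


def intermediate_pos(tagged: Tagged, band=(1 / 3, 2 / 3)) -> frozenset[str]:
    """Rank POS tags by frequency and keep the middle band (assumed rule)."""
    ranked = [tag for tag, _ in Counter(t for _, t in tagged).most_common()]
    lo, hi = int(len(ranked) * band[0]), int(len(ranked) * band[1])
    # Fall back to all tags when there are too few distinct tags for a band.
    return frozenset(ranked[lo:hi] or ranked)


def build_zero_watermark(tagged: Tagged) -> ZeroWatermark:
    """Construct the watermark from words whose POS lies in the middle band.

    The text itself is never modified; only the extracted features are kept.
    """
    tags = intermediate_pos(tagged)
    return ZeroWatermark(tags, tuple(w for w, t in tagged if t in tags))


def similarity(wm: ZeroWatermark, suspect: Tagged) -> float:
    """Hypothetical detector: fraction of registered feature words still present."""
    found = Counter(w for w, t in suspect if t in wm.tags)
    registered = Counter(wm.features)
    matched = sum((registered & found).values())  # multiset intersection
    return matched / max(1, sum(registered.values()))
```

With the ten-word tagged sentence 我们/r 研究/v 中文/n 文本/n 水印/n 技术/n 并/c 提出/v 新/a 算法/n, the ranked tags are n, v, r, c, a, so the middle band is {v, r} and the watermark features are 我们, 研究, 提出. Deleting 提出 from the text lowers the score from 1.0 to about 0.67 instead of invalidating the watermark, which is why this sketch scores overlap rather than comparing an exact hash.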
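The entropy-based algorithm can be sketched the same way. Again this is an illustration under assumptions that the abstract does not confirm: each sentence is a hypothetical list of `(word, tag)` pairs; a word's entropy is taken as the term `-p * log2(p)` for the probability `p` of its part of speech; a sentence's entropy is the sum of its words' terms; the default threshold is the mean sentence entropy; and "core words" are assumed to be nouns and verbs, i.e. tags beginning with `n` or `v` (the hypothetical `core_prefixes` parameter).

```python
import math
from collections import Counter
from typing import Optional

# Assumed input: each sentence is a list of (word, pos_tag) pairs from a POS tagger.
Sentence = list[tuple[str, str]]


def pos_probabilities(sentences: list[Sentence]) -> dict[str, float]:
    """Estimate P(tag) from tag frequencies over the whole text."""
    counts = Counter(tag for s in sentences for _, tag in s)
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()}


def word_entropy(tag: str, probs: dict[str, float]) -> float:
    """Assumed per-word entropy: the -p*log2(p) term of the word's POS."""
    p = probs[tag]
    return -p * math.log2(p)


def sentence_entropy(sentence: Sentence, probs: dict[str, float]) -> float:
    """Assumed sentence entropy: sum of its words' entropy terms."""
    return sum(word_entropy(tag, probs) for _, tag in sentence)


def extract_features(sentences: list[Sentence],
                     threshold: Optional[float] = None,
                     core_prefixes: tuple[str, ...] = ("n", "v")) -> list[str]:
    """Keep core words from sentences whose entropy exceeds the threshold."""
    if not sentences:
        return []
    probs = pos_probabilities(sentences)
    scores = [sentence_entropy(s, probs) for s in sentences]
    if threshold is None:
        threshold = sum(scores) / len(scores)  # assumed default: mean entropy
    features: list[str] = []
    for sentence, h in zip(sentences, scores):
        if h > threshold:
            # str.startswith accepts a tuple, so any listed prefix matches.
            features.extend(w for w, t in sentence if t.startswith(core_prefixes))
    return features
```

For three hypothetical sentences tagged n n n, r v a n, and a, the tag probabilities are 0.5, 0.125, 0.125, and 0.25 for n, r, v, and a, giving sentence entropies of 1.5, 1.75, and 0.5. With the default threshold (their mean, 1.25) the first two sentences are kept, and their nouns and verbs become the feature words. Raising `threshold` keeps fewer sentences and therefore yields fewer feature words.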
Keywords: text zero-watermarking, part-of-speech, information entropy, correlation function, robustness