
Research On Text Steganography And Steganalysis

Posted on: 2012-12-02 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L Y Xiang
GTID: 1228330395485622 | Subject: Computer application technology
Abstract/Summary:
Information hiding is an active research topic in information security and has been widely used for covert communication, the storage and transmission of confidential information, copyright protection of digital media, and similar applications. However, information hiding techniques may also be abused to transmit illegal or malicious information, which could cause incalculable losses to the nation, society, and individuals. It is therefore necessary to study techniques for detecting hidden information in order to prevent and disrupt the illegal transmission of secret information. In addition, the detection of hidden information, also called steganalysis, promotes the development of information hiding and provides criteria for measuring the security of an information hiding system. Research on both information hiding and steganalysis is therefore significant for maintaining information security.

Text steganography uses text data as the carrier and embeds secret information by exploiting redundancy in the format, storage structure, and linguistic characteristics of text. As the adversary of text steganography, text steganalysis aims to detect the presence of hidden information in a text. This thesis concentrates on steganalysis of common text information hiding methods at the format, vocabulary, and sentence levels. On the steganography side, it proposes more secure text steganographic algorithms that improve embedding efficiency and preserve statistical characteristics, strengthening their resistance to steganalysis attacks. The main contributions of the thesis are as follows:

(1) For information hiding based on character format, three types of statistical features are proposed for designing steganalysis algorithms.
Character-format-based steganography causes variations between the format attribute values of adjacent characters, changes in character-run lengths, and abnormal changes in the format attributes of semantically dependent characters. Accordingly, three types of statistical features are extracted and used as inputs to a support vector machine (SVM) that classifies documents as cover or stego. Experimental results show that the three feature types offer different advantages in detection reliability and generality, and provide high detection accuracy.

(2) A text steganalysis method using features derived from synonym frequency is proposed to detect synonym-substitution-based steganography. First, an attribute pair is introduced: an ordered pair giving the position of a word in its synonym set (sorted by descending frequency) and the number of synonyms in that set. Synonym substitution tends to reduce the number of high-frequency attribute pairs and increase the number of low-frequency ones. By theoretically analyzing how the probability distribution of attribute pairs changes with the embedding rate, a feature vector based on the differences between the relative probabilities of different attribute pairs is used to detect secret information. The impact of different synonym coding strategies is also analyzed theoretically. Experimental results demonstrate that the proposed method achieves a high detection probability and outperforms existing methods.

(3) A steganalysis method using statistical features from the differences among semantically equivalent syntactic structures is proposed to detect secret information embedded by syntactic-transformation-based steganography.
Because of syntactic transformations, the differences in occurrence frequency among semantically equivalent syntactic structures differ between cover texts and stego texts. A higher-order statistical model is built from the transition matrix induced by syntactic transformation, and statistical features are then derived from the changes in the syntactic structures' higher-order statistics after transformation. Experimental results show that the proposed method achieves high detection accuracy across various embedding rates and text genres.

(4) To improve the embedding efficiency of text steganography, an algorithm is proposed for constructing the parity-check matrix of a q-ary linear code. The steganographic scheme adapts the q-ary linear code to the length of the secret information and the embedding capacity. Compared with general matrix embedding, the proposed algorithm achieves higher embedding efficiency at low embedding rates, and experimental results show that the constructed parity-check matrix brings the embedding efficiency close to the theoretical upper bound. In addition, an extended block coding scheme is proposed for text steganography based on additional noise. It encodes the secret information vector as a cover data vector of smaller weight, so that the embedding efficiency reaches the theoretical upper bound for embedding with linear codes. Experimental results show that with extended block coding, the probability of distinguishing additional-noise-based stego texts from cover texts is greatly reduced.

(5) To preserve the statistical characteristics of cover texts, a secure steganographic method based on multiple-choice questions (MCQs) is presented.
Guided by Cachin's information-theoretic model of steganography, the method uses a special kind of text, multiple-choice questions, whose features are independent and identically distributed, to conceal information. Each MCQ and its options are encoded by exploiting the randomness in which MCQs are chosen and in the order of their options. The secret information automatically selects a series of MCQs from an MCQ bank to generate a stego text, and the options of each question are then reordered to embed further information. Experimental results show that the proposed algorithm has excellent imperceptibility, a considerably high embedding bit rate, and high resistance to steganalysis attacks.
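As a rough illustration of contribution (1), the sketch below computes two statistics of the kind described there: how often the format attribute changes between adjacent characters, and a normalized histogram of character-run lengths. It assumes each character's format has been reduced to a single comparable value (here a hypothetical list of font sizes); the function name, the `max_run` parameter, and the exact feature definitions are illustrative assumptions, not the thesis's features, and the SVM step is omitted.

```python
from collections import Counter
from itertools import groupby


def format_features(attrs, max_run=4):
    """Hypothetical feature sketch for character-format steganalysis.

    attrs: per-character format attribute values (e.g. font sizes).
    Returns [adjacent-change rate] + run-length histogram over bins
    1..max_run, where the last bin pools all runs of length >= max_run.
    """
    if len(attrs) < 2:
        raise ValueError("need at least two characters")
    # Fraction of adjacent character pairs whose attribute value differs.
    changes = sum(a != b for a, b in zip(attrs, attrs[1:]))
    change_rate = changes / (len(attrs) - 1)
    # Lengths of maximal runs of identical attribute values.
    runs = [len(list(group)) for _, group in groupby(attrs)]
    hist = Counter(min(r, max_run) for r in runs)
    run_feats = [hist[k] / len(runs) for k in range(1, max_run + 1)]
    return [change_rate] + run_feats
```

A uniformly formatted cover such as `[12.0] * 10` yields `[0.0, 0.0, 0.0, 0.0, 1.0]`, while tiny alternating size changes push the change rate up and shift mass toward short runs. Vectors like these would then be passed to a classifier such as scikit-learn's `SVC`, trained on labelled cover and stego documents.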
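The attribute-pair idea in contribution (2) can be sketched as follows, assuming a toy synonym dictionary in which each synonym set is already sorted by descending word frequency. For every word that belongs to a synonym set, the code records the pair (position in the set, set size) and returns the relative frequency of each pair. The dictionary `SYNSETS`, the 0-based position convention, and both function names are hypothetical; `top_minus_rest` is only one illustrative difference of relative probabilities, not the thesis's feature vector.

```python
from collections import Counter

# Hypothetical synonym sets, each sorted by descending corpus frequency.
SYNSETS = [
    ["big", "large", "huge"],
    ["fast", "quick"],
]
# Map each word to its attribute pair: (0-based position, synonym-set size).
ATTR = {w: (i, len(s)) for s in SYNSETS for i, w in enumerate(s)}


def attribute_pair_distribution(words):
    """Relative frequency of each attribute pair among words that have synonyms."""
    pairs = [ATTR[w] for w in words if w in ATTR]
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {p: c / len(pairs) for p, c in counts.items()}


def top_minus_rest(dist, size):
    """Illustrative feature: P(most frequent synonym) - P(other synonyms),
    restricted to synonym sets of the given size."""
    top = dist.get((0, size), 0.0)
    rest = sum(v for (pos, n), v in dist.items() if n == size and pos > 0)
    return top - rest
```

For the cover sentence "a big dog ran fast and a big cat", every synonym word takes the top position, so `top_minus_rest(dist, 3)` is positive (2/3); after substitutions such as "huge" and "large", the same feature becomes negative, mirroring the shift from high-frequency to low-frequency attribute pairs described above.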
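For context on contribution (4), the sketch below shows standard matrix embedding with a q-ary Hamming code, i.e. the kind of general matrix-embedding baseline the proposed parity-check-matrix construction is compared against; it is not the thesis's construction algorithm. It assumes q is prime (so arithmetic mod q forms a field) and that each cover symbol is an integer in 0..q-1 (for instance, a synonym choice reduced mod q); all function names are hypothetical. With k message symbols and n = (q^k - 1)/(q - 1) cover symbols, at most one cover symbol changes per block.

```python
from itertools import product


def hamming_check_matrix(q, k):
    """Columns of a q-ary Hamming parity-check matrix (q assumed prime).

    One nonzero vector per 1-D subspace of GF(q)^k, normalized so that its
    first nonzero entry is 1; there are (q**k - 1) // (q - 1) columns.
    """
    cols = []
    for v in product(range(q), repeat=k):
        nonzero = [x for x in v if x != 0]
        if nonzero and nonzero[0] == 1:
            cols.append(v)
    return cols


def syndrome(cols, x, q):
    """Compute H x mod q, where H has the given columns."""
    k = len(cols[0])
    return tuple(sum(c[i] * xi for c, xi in zip(cols, x)) % q for i in range(k))


def embed(cols, x, m, q):
    """Change at most one symbol of cover x so that its syndrome equals m."""
    s = syndrome(cols, x, q)
    d = tuple((mi - si) % q for mi, si in zip(m, s))  # required syndrome change
    y = list(x)
    if any(d):
        # d = a * h_j, where a is the first nonzero entry of d and h_j the
        # normalized column pointing in the same direction.
        a = next(v for v in d if v != 0)
        a_inv = pow(a, -1, q)  # modular inverse (Python 3.8+)
        h = tuple((v * a_inv) % q for v in d)
        j = cols.index(h)
        y[j] = (y[j] + a) % q  # shifts the syndrome by exactly d
    return y


def extract(cols, y, q):
    """The receiver recovers the message as the syndrome of the stego vector."""
    return syndrome(cols, y, q)
```

For q = 3 and k = 2, four ternary cover symbols carry two ternary message symbols (about 3.17 bits) with at most one change, which is the efficiency gain that matrix embedding offers over changing symbols one message digit at a time.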
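Contribution (5) embeds further information by reordering the options of each selected MCQ. One common way to do this, assumed here rather than taken from the thesis, is to map an integer to a permutation through the factorial number system (Lehmer code): a question with r options can carry any integer from 0 to r! - 1, i.e. about log2(r!) bits (roughly 4.58 bits for four options). The function names and the shared canonical option order are hypothetical, and the MCQ-selection step is not shown.

```python
from math import factorial


def int_to_order(value, options):
    """Reorder `options` to encode an integer 0 <= value < len(options)!.

    `options` is the canonical order assumed to be shared by sender and receiver.
    """
    if not 0 <= value < factorial(len(options)):
        raise ValueError("value out of range for this many options")
    pool = list(options)
    order = []
    for i in range(len(pool), 0, -1):
        # Each factorial digit selects one of the remaining options.
        idx, value = divmod(value, factorial(i - 1))
        order.append(pool.pop(idx))
    return order


def order_to_int(order, options):
    """Recover the integer encoded by a reordering of the canonical `options`."""
    pool = list(options)
    value = 0
    for i, opt in enumerate(order):
        idx = pool.index(opt)
        value += idx * factorial(len(order) - 1 - i)
        pool.pop(idx)
    return value
```

If the embedded integer is uniformly distributed over 0..r! - 1, every option order is equally likely, so the reordering on its own introduces no bias in option order.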
Keywords/Search Tags: Text, Information hiding, Steganography, Steganalysis, Detection of hidden information, Linguistic steganography