Research on Text Steganography Based on Word Frequency Distribution
Keywords: Information Hiding; Steganalysis; Synonymy Substitution; Capacity
This paper studies text information hiding. It designs an information hiding method that resists steganalysis based on synonym frequency, and then proposes an improved method that increases the embedding capacity.

Steganography based on synonym substitution changes the frequencies of synonyms in a text, which breaks the statistical characteristics of the cover. Steganalysis can exploit this change, threatening the security of the secret information. A method is therefore proposed to address this weakness. First, the synonym database is preprocessed according to the natural frequency of the synonyms, the word frequency distribution of synonyms in the text is studied, and a new concept, the continuous dimension, is put forward. Then, the synonym word frequency distribution is combined with multiple-base expressions, and the secret data are encoded dynamically according to the size of each synonym group. Finally, while preserving the synonym word frequency statistics of the text as far as possible, the continuous dimension is used to mark the secret information: synonyms in the text are swapped to change the size of the continuous dimension, thereby embedding the secret information. Experimental results show that the method achieves good imperceptibility and resists steganalysis that relies on the statistical characteristics of synonym word frequency.

To increase the number of embedded secret bits and preserve the original features of the text, while keeping both the amount of modification to the original data and the word frequencies of the synonyms in the text unchanged, the paper proposes an improved algorithm that raises both capacity and concealment. First, the distribution pattern of synonyms in the text is studied further and, taking the synonym group as the unit, definitions are given for the synonym vector, the number of combinations of a synonym vector, and the status value of a synonym vector. Then, based on the distribution of synonym vectors in the text and on permutations and combinations, with the synonym group as the unit and the word frequency distribution as the object of study, the secret information is encoded dynamically using the number of combinations of each synonym group's vector as the base. Finally, the status value of the synonym vector is used to mark the secret information: synonyms in the text are swapped to change the status value of the synonym vector, thereby embedding the secret information. This method improves the hiding capacity while maintaining good resistance to detection.
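
The encoding idea in the abstract, writing the secret data in a multiple-base notation whose bases are derived from the synonym groups, can be illustrated with a minimal sketch. This is not the paper's algorithm. The function names and the sample counts are hypothetical, and the sketch assumes that the "number of combinations" of a synonym vector is the number of distinct orderings of a group's occurrences when each word's frequency is held fixed (a multinomial coefficient); the abstract does not give the exact definition.

```python
from math import factorial, prod


def arrangement_count(word_counts):
    """Distinct orderings of a synonym group's occurrences with each
    word's frequency held fixed (multinomial coefficient).
    ASSUMPTION: stands in for the paper's 'number of combinations'."""
    total = factorial(sum(word_counts))
    for count in word_counts:
        total //= factorial(count)  # exact at every step
    return total


def to_mixed_radix(value, bases):
    """Split a non-negative integer into one digit per base,
    least significant digit first."""
    if not 0 <= value < prod(bases):
        raise ValueError("value does not fit in the given bases")
    digits = []
    for base in bases:
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits


def from_mixed_radix(digits, bases):
    """Inverse of to_mixed_radix: rebuild the integer from its digits."""
    value = 0
    for digit, base in zip(reversed(digits), reversed(bases)):
        value = value * base + digit
    return value


# Hypothetical per-group word frequencies for three synonym groups.
group_counts = [[2, 1], [1, 1, 1], [1, 1]]
bases = [arrangement_count(c) for c in group_counts]  # [3, 6, 2]
digits = to_mixed_radix(29, bases)                    # [2, 3, 1]
assert from_mixed_radix(digits, bases) == 29
```

In this sketch each digit would select one of the frequency-preserving arrangements of its synonym group, so the total capacity is the product of the bases (36 values here, a little over 5 bits). Because only the order of the synonyms changes, the per-word frequencies stay constant, which matches the stated goal of keeping the synonym frequency statistics intact.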