I improved the steganographic tool I wrote previously to include AES encryption, and tested the files produced by the new tool. No surprise: file-appended steganography is pretty ineffective, but I had a fun time writing about it.
Steganography has evolved over time, with its origins rooted in ancient Greece, and is applicable even in the modern computer age. Secret data can be embedded in carrier files using a number of different steganographic methods. The file appending technique is one of the more basic methods and is not widely used. Here, this method is modified to include cryptography by applying AES 256 to the appended secret data.
This paper investigates whether applying a modern, robust cryptographic algorithm to one of the arguably least sophisticated steganographic methods can produce a viable stego application that passes light steganalysis. Two tools are created to carry out this task: a Stegoencrypt and a Stegodecrypt application. The Stegoencrypt application encrypts secret data with AES 256 and appends it to the end of a carrier file, with the initialization vector acting as the identifier for the start of the secret data. The Stegodecrypt application extracts and decrypts the secret data from a given stego file and creates an output file. Several combinations of carrier/secret files are passed through the Stegoencrypt application to produce stego files. These stego files are then analyzed using several techniques and steganalysis tools to determine the effectiveness of each combination and of the steganographic technique.
The act of information hiding has arguably been practiced since the earliest forms of long distance communication. Before modern day computing and Cryptography, messages were carried on foot by human messengers. To obfuscate the intended message, steps were taken to hide it. Methods in this earlier time may have included having the messenger memorize the message or even physically hiding it on the messenger. The ancient Greek historian Herodotus noted that the tyrant Histiaeus was known to shave the heads of slaves and tattoo messages on the scalp. After the hair had grown back and hidden the message, the messenger was sent to the intended recipient. On arrival, the slave was shaved again and the message was revealed (Thampi, 2004). Information hiding has significantly evolved since this time to produce and further develop the areas of Cryptography and Steganography.
2 Cryptography vs. Steganography
Within the realm of information hiding, both Cryptography and Steganography rely on covert channels (Katzenbeisser, 2000). A covert channel by nature restricts access to the contents of the channel to entities regulated by the originator or owner of the channel. This concept is applied to modern day computing when using a Virtual Private Network (VPN), Secure Sockets Layer (SSL), Hypertext Transfer Protocol Secure (HTTPS), Secure Shell (SSH), or any other modern form of secure communication.
Cryptography and Steganography differ in their approach to implementing a covert channel. Cryptography hides information by relying on mathematical algorithms to transform the data so that the information is unreadable to unintended recipients. Possible observers of the communication channel are aware that the sent message is covert. This allows attackers to identify potentially sensitive encrypted data and to apply techniques to reveal the hidden data. Cryptography implements an array of methods to secure data and constantly evolves as attack methods and computing power improve. In contrast to Cryptography, a steganographic approach attempts to conceal the fact that a secret message is being sent at all. A secret message is hidden within a known image and sent over a non-covert channel. The word Steganography originates from Greek and means “covered writing”. When applied to a digital message, the secret message is covered or encapsulated in an observable and unencrypted message or data (Johnson, 1995). When utilizing Steganography in communication, the sent message is visible to all possible observers, but the method used to hide the secret message is known only to the recipient, which ensures that only the recipient will be able to extract it.
3 Information Hiding Definitions
3.1 Carrier – A carrier is an unencrypted message, data segment, or file that a secret message or data resides in. In modern day computing a carrier can include files, meta data, network traffic, and other data that can be transmitted from one node to the other.
3.2 Secret – A secret is a message, data segment, or file that is hidden within a carrier. Steganography has evolved to include a significant number of different methods to hide a secret within a carrier.
3.3 Key – A key is a passcode that is used in conjunction with a cryptographic method to encrypt and decrypt a message. A key is chosen by the individual at the time of encryption in the form of a string of characters. It is subsequently used by the recipient to decrypt the message.
3.4 Initialization Vector (IV) – An IV is a block of data of a fixed size that is used to initialize an encryption algorithm. To prevent identical plaintexts from producing identical ciphertexts, an IV should be random (Not using a random initialization vector with cipher block chaining mode, 2009). AES in CBC mode requires a 128 bit IV, matching the AES block size, regardless of whether the key is 128, 192, or 256 bits long (Frankel, Glenn, & Kelly, 2003).
3.5 Cipher Block Chaining (CBC) – CBC is a mode of operation for a block cipher in which each plaintext block is XORed with the previous ciphertext block (or with the IV, for the first block) before being encrypted with the key. The resulting ciphertext can later be decrypted using the same key and IV (Cipher Block Chaining (CBC)).
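The chaining described in 3.4 and 3.5 can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_block_encrypt` is a hypothetical, insecure stand-in for the AES block function, and `os.urandom` stands in for the random source. Neither is what the applications in this paper use.

```python
import os

BLOCK = 16  # AES block size in bytes (128 bits)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for the AES block function. NOT secure."""
    rotated = block[1:] + block[:1]  # rotate left by one byte
    return xor_bytes(rotated, key)


def toy_block_decrypt(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_block_encrypt."""
    unkeyed = xor_bytes(block, key)
    return unkeyed[-1:] + unkeyed[:-1]  # rotate right by one byte


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    """XOR each plaintext block with the previous ciphertext block, then encrypt."""
    assert len(plaintext) % BLOCK == 0, "input must be a whole number of blocks"
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        mixed = xor_bytes(plaintext[i:i + BLOCK], previous)
        previous = toy_block_encrypt(mixed, key)
        out.append(previous)
    return b"".join(out)


def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    """Decrypt each block, then XOR with the previous ciphertext block."""
    previous, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(xor_bytes(toy_block_decrypt(block, key), previous))
        previous = block
    return b"".join(out)


key = os.urandom(BLOCK)  # the toy cipher uses a 16 byte key; AES 256 would use 32
iv = os.urandom(BLOCK)   # random 128 bit IV
message = b"A" * 32      # two identical plaintext blocks
ciphertext = cbc_encrypt(message, key, iv)
```

Because each block is mixed with the previous ciphertext block, the two identical plaintext blocks in `message` encrypt to different ciphertext blocks, and a fresh random IV changes the whole output even when the key and message are reused.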
The combination of Cryptography and Steganography is known as Stegocrypt. First, the secret message is encrypted using some form of cryptography. This encrypted information is then hidden in an unencrypted and known message, which is transmitted on an unencrypted channel. The recipient of the Stegocrypt message should be the only individual who knows that the intended message is hidden within the sent message. As with traditional Steganography, the recipient must extract the secret message using the specific method that was used by the sender, but a Stegocrypt message requires a second decryption step to fully uncover the message or data. This may require a passcode to decrypt, or a specific deciphering method to decode.
5 File Appending Steganography
A simple form of digital Steganography includes appending a secret to the end of a carrier file. This approach is used in the open source tool, AppendX. Once the secret is appended to the carrier file, the carrier and secret are packaged into a single stego file. The recipient of the stego file then extracts the data after the carrier to expose the secret. Because of this rudimentary method of data obfuscation, it is considered to be a weak approach to steganography.
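A minimal sketch of plain file appending, without encryption, might look like the following. The function names and file names are hypothetical, and this is not AppendX's actual implementation. It assumes the recipient knows the length of the original carrier.

```python
import tempfile
from pathlib import Path


def append_secret(carrier_path, secret_path, stego_path) -> int:
    """Write carrier bytes followed by secret bytes; return the carrier length."""
    carrier = Path(carrier_path).read_bytes()
    secret = Path(secret_path).read_bytes()
    Path(stego_path).write_bytes(carrier + secret)
    return len(carrier)


def extract_secret(stego_path, carrier_length: int) -> bytes:
    """Return everything stored after the original carrier bytes."""
    return Path(stego_path).read_bytes()[carrier_length:]


# Demonstration with throwaway files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "carrier.bin").write_bytes(b"CARRIER-BYTES")
    (tmp / "secret.bin").write_bytes(b"secret")
    offset = append_secret(tmp / "carrier.bin", tmp / "secret.bin", tmp / "stego.bin")
    recovered = extract_secret(tmp / "stego.bin", offset)
```

The secret is stored verbatim after the carrier, which is why anyone who inspects the end of the file can read it. This is the weakness the encryption step in this paper tries to address.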
6 StegoCryptoPy Architecture
The objective of this paper is to expand on a simple steganographic method such as File Append to create an application that incorporates both steganography and cryptography to produce an effective stego file that can pass light Steganalysis. The produced file using the proposed Stegocrypt method would not pass the heavy scrutiny of a skilled forensic analyst. Also, this paper is aimed at determining the most effective combination of carrier/secret files that would pass a forensic examination. The proposed application is developed using the Python language, coining the name “StegoCryptoPy” because of the Stegocrypt approach and the language used.
6.1 Stegocrypt Applications Developed
Two applications were developed to facilitate the Stegocrypt process. At the time of writing this paper, both applications use a command line interface, which keeps development overhead low and uses few system resources at run time. The application used for encrypting and hiding secret data is named “Stegoencrypt”, and the application used to extract and decrypt secret data is named “Stegodecrypt”.
6.2 AES 256bit Encryption
When deciding on the cryptographic algorithm to use for the crypto portion of the Stegocrypt application, several characteristics of the algorithm were considered. The chosen algorithm must have a large key size and a large block size, and must be publicly available, efficient, and easy to implement. The Advanced Encryption Standard (AES) was chosen because it met all of these requirements. AES was selected as the successor to the Data Encryption Standard (DES) after a four year competition run by the National Institute of Standards and Technology (NIST), after DES was found to be weak because of its short 56 bit key. AES is a block cipher with a fixed 128 bit block size that accepts key sizes of 128, 192, or 256 bits. When using a 256 bit key, fourteen rounds of the algorithm are applied to the plaintext to produce a ciphertext (Frankel, Glenn, & Kelly, 2003).
6.3 Language and Dependencies
Both the Stegoencrypt and Stegodecrypt tools are written in the Python language. Python has proven to be a powerful and efficient language with various applications. The language allows for easy binary manipulation of files, which is necessary for this application to apply the file appending steganographic method. The version of Python used is 3.3 because of its compatibility with the other dependencies needed. The Python standard library provides hashing but no symmetric cipher such as AES, so a separate library has to be installed to carry out the cryptography portion of the Stegocrypt.
A popular cryptographic Python library that is compatible with Python 3.3 is Pycrypto. The latest version of Pycrypto at the time of writing this paper is 2.6. Pycrypto includes a number of Python modules pertaining to cryptography, including various hashing and cipher algorithms. Among them is the AES implementation used here. AES is imported into both applications by using from Crypto.Cipher import AES. After importing the AES module, the applications designate the mode for AES as Cipher Block Chaining by using mode = AES.MODE_CBC. To initialize a new encryptor or decryptor, the applications use
encryptor = AES.new(key, mode, IV=IV)
In order to encrypt or decrypt the data, the applications use
secretData = encryptor.encrypt(secretData).
The AES module also exposes its block size as AES.block_size. Combined with Pycrypto's Random.new() generator, this can be used to create a random byte string the size of an AES block. This is implemented in the Stegoencrypt application to create a random IV of the correct size. This is achieved by using
IV = Random.new().read(AES.block_size)
which produces a random byte string with a length of 128 bits (16 bytes).
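Because CBC operates on whole 16 byte blocks, the data handed to encrypt() must be a multiple of the block size. The Stegoencrypt listing at the end of this paper pads short input with space characters, which cannot be told apart from genuine trailing spaces after decryption. As a hedged alternative, not what the applications in this paper do, a PKCS#7 style padding scheme could be sketched as follows; the helper names are hypothetical.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    pad_len = block_size - len(data) % block_size  # always between 1 and block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip the padding added by pkcs7_pad, rejecting malformed input."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return padded[:-pad_len]


secret = b"This is the secret"  # 18 bytes, so 14 bytes of padding are added
padded = pkcs7_pad(secret)
```

The last byte always states how much padding to remove, so the original length is recovered exactly, even for binary secrets that happen to end in spaces.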
When binary files are read in for manipulation, it is convenient to represent the raw bytes as hexadecimal characters. The Python standard library module binascii includes the functions necessary to handle hexadecimal values, and allows Python to convert data to and from hexadecimal. Opening a file in binary mode with open(file, 'rb') as f: and calling f.read() returns a bytes object holding the file's binary data. This becomes significant when the Stegodecrypt application looks for the IV within the stego file and needs to convert the secret data that follows from ASCII hex characters back into binary data. To convert these ASCII characters to binary, the application uses binascii.a2b_hex(), which interprets the ASCII characters as hexadecimal digits and converts them accordingly.
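The round trip between raw bytes and their ASCII hex representation can be illustrated with a short sketch. The sample bytes are arbitrary and chosen only for the example.

```python
import binascii

raw = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # four arbitrary sample bytes

hex_text = binascii.hexlify(raw)         # two ASCII hex digits per byte
hex_upper = hex_text.upper()             # the Stegodecrypt listing compares in uppercase
restored = binascii.a2b_hex(hex_upper)   # accepts upper or lower case digits

try:
    binascii.a2b_hex(b"ABC")             # odd number of hex digits
    odd_length_rejected = False
except binascii.Error:
    odd_length_rejected = True
```

Since every byte becomes exactly two hex digits, any slice that is later converted back with a2b_hex() has to start and end on an even offset.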
AES allows for 128, 192, and 256 bit key sizes. To guarantee that the key derived from the user input matches the maximum key size of 256 bits, both applications produce a 256 bit hash value of the user's input. A key of exactly 256 bits lets the AES algorithm run with its largest key size. The hashlib library that comes packaged with Python 3.3 includes a SHA256 hashing function, hashlib.sha256(), which produces a 256 bit hash value regardless of the length of the input string. Producing a hash value of fixed size allows the user to decide on the size and complexity of the password.
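A short sketch of this key derivation step, using only hashlib; the function name `derive_key` is hypothetical.

```python
import hashlib


def derive_key(password: str) -> bytes:
    """Hash a password of any length into a 32 byte (256 bit) AES key."""
    return hashlib.sha256(password.encode("utf-8")).digest()


short_key = derive_key("a")
long_key = derive_key("a much longer passphrase with spaces")
```

Both keys are 32 bytes long, but the hash does not add secrecy of its own: the effective strength of the key is still limited by how hard the password is to guess.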
The Stegoencrypt tool is used by the sender of the secret file to create a stego file. This tool takes in a carrier and a secret file designated by the user. Both carrier and secret files can be of any file format, but the file names must include file extensions and are case sensitive. The application then prompts the user for an output file name, which is the name, including file extension, under which the stego file is saved. The user is then prompted for a key. This will be the password used to decrypt the message on the receiving end. Finally, the application produces the stego file and prints to the screen the 128 bit IV and a message noting that the file has been created.
A Stegodecrypt tool is used by the recipient of the stego file to extract the secret file. The user is first prompted for the stego file and output file. The output file is the secret file that is hidden within the carrier. Both stego file and output file must include the file extension and are also case sensitive. The user is then prompted for the key and IV used to decrypt the secret. After extracting and decrypting the secret, the output file is saved using the designated output file name and a confirmation message is printed to the screen.
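The extraction step can be sketched directly on bytes. This is a simplified variant of what the Stegodecrypt listing does, assuming the IV marker does not also occur by chance earlier in the carrier; `split_stego` is a hypothetical helper, and decryption is left out.

```python
def split_stego(stego: bytes, iv: bytes):
    """Split stego data into (carrier, ciphertext) around the first IV marker."""
    index = stego.find(iv)
    if index == -1:
        raise ValueError("IV marker not found in stego data")
    carrier = stego[:index]
    ciphertext = stego[index + len(iv):]
    return carrier, ciphertext


iv = bytes(range(16))                         # example 16 byte marker
stego = b"carrier bytes" + iv + b"\x01" * 32  # carrier, IV, two fake cipher blocks
carrier, ciphertext = split_stego(stego, iv)
```

Searching the raw bytes rather than a hex string means a match can only begin on a byte boundary, and the ciphertext that follows can be passed to the decryptor as is.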
9 Testing Stegoencrypt and Stegodecrypt
9.1 Test Input Files
In order to test the effectiveness of the Stegocrypt application, stego files produced by the Stegoencrypt application would need to undergo some steganalysis. One of the goals of this research is finding the most effective combination of carrier and secret file that produces a stego file that would pass light steganalysis. In order to produce an effective combination of different files, various test files would need to be created to run through both Stegoencrypt and Stegodecrypt applications. The files created for such purpose are as follows:
Carrier.jpg and Secret.jpg – 40x40px images created with GIMP. Carrier.jpg is an image of a lowercase ‘c’ while Secret.jpg is an image of a lowercase ‘s’.
Carrier.png and Secret.png – 40x40px images created with GIMP. Carrier.png is an image of a lowercase ‘c’ while Secret.png is an image of a lowercase ‘s’.
Carrier.exe and Secret.exe – Both Carrier.exe and Secret.exe are Windows executable files that print “Hello World” to a terminal.
Carrier.pdf and Secret.pdf – One page PDF documents with a lowercase ‘c’ and ‘s’ respectively.
Carrier.wav and Secret.wav – 7:50 minute wave files with a 2822kbps bit rate
Carrier.wmv and Secret.wmv – 50 second Windows movie files with a 6291kbps bit rate
Carrier.txt and Secret.txt – Text files with the phrases “This is the carrier.” and “This is the secret” respectively
Carrier and Secret random files – 1000KB file filled with random data.
9.2 Producing Stego Files and Steganalysis
Pairs of carrier and secret files were fed into the Stegoencrypt application to produce 15 steganographic files. An attempt was then made to execute each stego file, and the outcome was recorded. If the stego file could be executed and the original carrier content was viewable and unaltered, a successful and usable stego file had been produced. A usable stego file is paramount in the application of steganography. The stego file must operate in the manner intended; otherwise, it may raise suspicions that the file has been altered.
The created stego files were then opened in HxD, a hex editor for Windows. A light manual steganalysis was conducted on each stego file to look for any visible discrepancies. Such abnormalities may include obvious irregularities in the file’s structure or additional data after an end of file (EOF) signature. This procedure simulates the curiosity of an untrained individual who has some knowledge of file contents.
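The kind of check performed by eye in the hex editor can be approximated in code for formats with a well defined end marker. The sketch below assumes a PNG carrier, walks its chunks until the IEND chunk, and returns whatever follows. The helper names are hypothetical, and this is not how HxD or any of the tools used in this paper work internally.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def data_after_png_end(data: bytes) -> bytes:
    """Return any bytes stored after the IEND chunk of a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        (length,) = struct.unpack(">I", data[offset:offset + 4])
        chunk_type = data[offset + 4:offset + 8]
        offset += 8 + length + 4  # chunk header, payload, CRC
        if chunk_type == b"IEND":
            return data[offset:]
    raise ValueError("no IEND chunk found")


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build a PNG chunk: length, type, payload, CRC (used only for the demo)."""
    crc = zlib.crc32(chunk_type + payload)
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


# Structurally valid but minimal PNG skeleton, followed by appended data
png = PNG_SIGNATURE + make_chunk(b"IHDR", b"\x00" * 13) + make_chunk(b"IEND", b"")
trailing = data_after_png_end(png + b"appended")
clean = data_after_png_end(png)
```

A non-empty result is exactly the "additional data after an EOF signature" described above, and it flags an appended payload whether or not that payload is encrypted.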
|Carrier File||Secret File||Secret Size (B)||Stego File||Executable?||Execute original?||Size Increase (B)||Visual Detect (hxd)||StegDetect||JPEGsnoop (if jpg)|
After a visual analysis of the contents of each stego file, the files were analyzed using the Linux command line application stegDetect. This application was chosen primarily because it can detect stego files created with the appendX method. It serves as a good metric for determining whether file appending with encryption is more effective than plain appendX. To use the application, the command stegdetect [filename] is issued. The application then reports whether steganography was detected and, if so, a possible method of hiding.
If the stego file is a JPEG image, the Windows GUI application JPEGsnoop is used for further analysis. This application crawls through the data of the file and organizes the contents by section. If any extra unidentifiable information is included, it displays these results as well. This application is primarily used to detect steganography and not to extract secret messages.
10 Results and Conclusion
The steganalysis results for the stego files produced from the test input files in section 9.1 can be found in Table 1. This table lists the carrier file, secret file, and produced stego file, whether the stego file was executable, whether it executed the original carrier content, and whether it was discovered using the three detection methods.
With the exception of those using executable carriers, all stego files produced were able to be executed after creation. Here, executing a file means being able to launch the default application for that file type. For example, the Windows image preview application launched successfully when double-clicking on a stego image file. Executable carrier files (Exe) failed this portion of the analysis because the execution process relies on all compiled code within the file. There is no EOF within the exe carrier to stop the process from interpreting the secret data as part of the compiled data. This corrupts the original executable and prevents it from running.
After launching the default application, however, some of the stego files did not function in the same manner as the carrier file. This included displaying different content from the original carrier file. As noted in section 9.2, a corrupt stego file may raise suspicions when being handled by other users. This would be considered an unusable stego file. The carrier/secret file combinations that did not pass this portion included Exe/Txt, Txt/Exe, Txt/Jpg, Txt/Pdf, Txt/Random, Txt/Wav, and Txt/Wmv. Text editors display all detected ASCII symbols, so ASCII characters found in the appended data are also displayed in the document.
A visual examination of the stego files determined whether the files showed visible signs of manipulation at the binary level. Since the steganographic method was known, the end of each file was examined for irregularities. Files that did not pass this portion of the analysis had a file structure with a clear EOF signature that was followed by the secret data. The least noticeable combination was Rand/Txt. The random data provided sufficient noise to make a manual visual analysis nearly impossible.
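One way to put a number on "noise" is the Shannon entropy of the byte values, measured in bits per byte. The sketch below is an illustrative addition and not part of the analysis performed in this paper. Encrypted data and random data both score close to the maximum of 8, which is one reason an encrypted payload appended to a random carrier is hard to tell apart from the carrier itself.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the entropy of the byte distribution in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


uniform = shannon_entropy(bytes(range(256)))      # every byte value exactly once
constant = shannon_entropy(b"\x00" * 1024)        # a single repeated value
text = shannon_entropy(b"This is the carrier.")   # short ASCII text
```

Comparing the entropy of the last few kilobytes of a file with the rest of it would be one hedged way to spot a high entropy tail on a low entropy carrier such as a text file.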
The StegDetect steganalysis tool was used primarily because of its advertised detection of AppendX. The application did detect file appending steganography in stego files using a Jpg carrier. However, it did not detect a stego file in any other tested format. It is possible that StegDetect can only analyze Jpg files. The JPEGsnoop steganalysis tool was further used on the stego files with a Jpg carrier. It did not detect steganography in any of the three files.
After testing multiple combinations of file formats, a file appending technique is overall not recommended. However, there are file format combinations for which this technique proves to be very effective in hiding a secret file. The most effective approach to using a file appending Stegocrypt technique is to use a file with random data as the carrier. This provides enough noise to defeat the manual and automated steganalysis performed here. To further obscure the carrier file, it is recommended to use an unidentifiable file extension and possibly to place the stego file amongst other random files.
#!C:\Python33\python.exe
# Stegoencrypt: encrypt a secret file with AES 256 (CBC) and append it to a carrier.
import binascii
import hashlib
from Crypto.Cipher import AES
from Crypto import Random

secretFile = input('Enter secret file name: ')
carrierFile = input('Enter carrier file name: ')
fileOut = input('Enter output file name: ')
password = input('Enter the key: ').encode('utf-8')

# Hash the password so the AES key is always exactly 256 bits
key = hashlib.sha256(password).digest()

# Random IV of one AES block (16 bytes); it also marks the start of the secret
IV = Random.new().read(AES.block_size)
secretSig = IV
print('IV: ', binascii.hexlify(IV).decode('utf-8'))

mode = AES.MODE_CBC
encryptor = AES.new(key, mode, IV=IV)

with open(secretFile, 'rb') as s:  # secret file
    secretData = s.read()

# CBC needs whole 16 byte blocks: pad with spaces if the data is too short
if len(secretData) % 16 != 0:
    secretData += bytes([0x20] * (16 - len(secretData) % 16))
secretData = encryptor.encrypt(secretData)

with open(carrierFile, 'rb') as c:  # carrier file
    carrierData = c.read()

# Stego file layout: carrier bytes, then the IV marker, then the ciphertext
with open(fileOut, 'wb') as target:
    target.write(carrierData)
    target.write(secretSig)
    target.write(secretData)

pause = input('File done. Press enter.')
#!C:\Python33\python.exe
# Stegodecrypt: locate the IV marker in a stego file and decrypt what follows it.
import binascii
import hashlib
from Crypto.Cipher import AES

stegoFile = input('Enter the stego file name: ')
fileOut = input('Enter the output file name: ')
password = input('Enter the key: ').encode('utf-8')
IV = input('Enter the IV: ')

mode = AES.MODE_CBC
key = hashlib.sha256(password).digest()  # same 256 bit key as Stegoencrypt

# Normalize the IV typed by the user to uppercase hex for searching
IV = binascii.hexlify(binascii.a2b_hex(IV)).upper()
sigLen = len(IV)

with open(stegoFile, 'rb') as s:
    content = s.read()
stegoData = binascii.hexlify(content).upper()
stegoLen = len(stegoData)

# Step by 2 so a match always starts on a byte boundary (two hex digits per byte);
# a match at an odd offset would leave an odd-length string that a2b_hex rejects
for i in range(0, stegoLen, 2):
    if stegoData[i:i+sigLen] == IV:
        # Everything after the IV marker is the encrypted secret
        secretMsg = binascii.a2b_hex(stegoData[i+sigLen:stegoLen])
        decryptor = AES.new(key, mode, IV=binascii.a2b_hex(IV))
        plain = decryptor.decrypt(secretMsg)
        with open(fileOut, 'wb') as target:
            target.write(plain)
        print(fileOut, 'file created.')
        input()
        break
Cipher Block Chaining (CBC). (n.d.). Retrieved April 20, 2016, from SearchSecurity: http://searchsecurity.techtarget.com/definition/cipher-block-chaining
Frankel, S., Glenn, R., & Kelly, S. (2003, September). The AES-CBC Cipher Algorithm and Its Use with IPsec. The Internet Society.
Johnson, N. F. (1995, November). Steganography. Retrieved April 20, 2016, from Johnson & Johnson Technology Consultants: http://www.jjtc.com/pub/tr_95_11_nfj/sec201.html
Katzenbeisser, S. (2000). Information Hiding Techniques for Steganography and Digital. Norwood: Library of Congress Cataloging-in-Publication Data.
Not using a random initialization vector with cipher block chaining mode. (2009, February 27). Retrieved April 20, 2016, from OWASP: https://www.owasp.org/index.php/Not_using_a_random_initialization_vector_with_cipher_block_chaining_mode
Thampi, S. M. (2004). Information Hiding Techniques: A Tutorial Review. ISTE-STTP on Network Security & Cryptography, LBSCE 2004 (pp. 1-19). Kasaragod: LBS College of Engineering.