While learning Python, I’ve been looking for any excuse to use the language in whatever project I could think of. I thought it would be a great learning experience to use Python in a digital forensics setting to automate finding file signatures within a hex file. The test hex file stands in for a hard drive image, and the script I wrote could be run against an actual image file.
The script takes in two files. The first is a CSV file listing all the file signature values we are looking for, one per line, in the following format: File Signature,File Extension. The second is the hex file that will be searched for those signatures.
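To make the CSV format concrete, here is a minimal sketch that parses a few signature lines into (signature, extension) pairs with Python's csv module. The function name `parse_signatures` and the three sample rows are only for illustration; the script later in this post splits the file with a regular expression instead.

```python
import csv
import io

def parse_signatures(text):
    """Return a list of (signature, extension) pairs from CSV text."""
    pairs = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:  # skip blank or incomplete lines
            continue
        pairs.append((row[0].strip().upper(), row[1].strip()))
    return pairs

# Sample rows (illustrative), including a blank line that gets skipped
sample = "25504446,PDF\n504B0304,ZIP\n\n474946383961,GIF\n"
pairs = parse_signatures(sample)
print(pairs)
```

Skipping short rows keeps a stray blank line at the end of the file from turning into an empty signature.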
I created a test hex file using HxD by inserting 1024 bytes of random data (Edit -> Insert bytes…) and then inserting a few file signatures at various locations throughout the file. I saved the file as “test”, without a file extension, in the same directory as the Python script and the CSV signature file.
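If you don't have HxD handy, a similar test file could be generated in Python. This is only a sketch of the idea, not what I actually did: the helper `build_test_image` and the offsets 100 and 600 are made up for the example, and it overwrites bytes in place rather than inserting them, so the file stays at exactly 1024 bytes.

```python
import os

def build_test_image(size, inserts):
    """Return `size` random bytes with each signature written at its offset.

    `inserts` maps a byte offset to a hex signature string.
    """
    data = bytearray(os.urandom(size))
    for offset, sig_hex in inserts.items():
        sig = bytes.fromhex(sig_hex)
        data[offset:offset + len(sig)] = sig  # same-length slice, size unchanged
    return bytes(data)

# Hypothetical layout: a PDF signature at byte 100 and a ZIP signature at byte 600
image = build_test_image(1024, {100: "25504446", 600: "504B0304"})
print(len(image))
```

Writing `image` to a file named test with `open("test", "wb")` would reproduce the setup described above. Because the filler is random, short signatures can occasionally show up by chance.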
When the script is run, it opens the hex file in binary mode and converts its contents into a single uppercase hex string. It also opens the signatures file and creates an array of all the signatures and their extensions, splitting the elements on commas and newlines.
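The conversion step is easiest to see on a few bytes. The sketch below uses `binascii.hexlify` the same way the script does; the helper name `to_hex_string` and the sample bytes are just for illustration.

```python
import binascii

def to_hex_string(raw):
    """Return the bytes as one uppercase hex string, two characters per byte."""
    return binascii.hexlify(raw).decode("ascii").upper()

raw = b"%PDF-1.4"            # the first bytes of a typical PDF header
hex_dump = to_hex_string(raw)
print(hex_dump)              # 255044462D312E34
print(len(hex_dump))         # 16 hex characters (nibbles) for 8 bytes
```

The string starts with 25504446, which is exactly the PDF entry in the signature list.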
I incorporated a rough progress bar that prints as the script works. It’s not pretty, but it gives you an idea of where the found signatures reside in the file. When a signature is found, the script prints its byte offset from the beginning of the file, and the surrounding progress lines show roughly what percentage of the way through the file it sits.
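Since the script walks the hex string one character (nibble) at a time, both numbers come from the same index. Here is a small worked example of that arithmetic; the function name `locate` is hypothetical and not part of the script.

```python
def locate(nibble_index, dump_len):
    """Convert an index in the hex string to (byte offset, percent through file)."""
    byte_offset = nibble_index // 2            # two hex characters per byte
    percent = 100 * nibble_index // dump_len   # whole-number percentage
    return byte_offset, percent

# A 1024-byte file yields a 2048-character hex string.
print(locate(512, 2048))    # (256, 25)
print(locate(1537, 2048))   # (768, 75), an odd index falls mid-byte and rounds down
```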
Signature file (signatures.csv):
25504446,PDF
504B030414,MS_Office
146674797071742020,MOV
186674797033677035,MP4
18667479706D703432,M4V
100005374616E64617264204A6574204442,MS_ACCESS
100080001000101,IMG
6E1EF0,PPT
908100000060500,XLS
3026B2758E66CF11A6D900AA0062CE6C,WMV
38425053,PSD
3C3F786D6C2076657273696F6E3D22312E30223F3E,XML
4344303031,ISO
474946383761,GIF
474946383961,GIF
494433,MP3
4C00000001140200,LNK
504B0304,ZIP
504B0304140008000800,JAR
57696E5A6970,WINZIP
5A5753,SWF
5F27A889,JAR
62706C697374,PLIST
6674797033677035,MP4
import binascii
import re

hexFile = 'test'
sigFile = 'signatures.csv'

with open(hexFile, 'rb') as f:  # 'rb' for Windows, read as binary
    content = f.read()

# create string 'hexDump' of entire file, as uppercase hex
hexDump = binascii.hexlify(content).decode('ascii').upper()
# print(hexDump)
dumpLen = len(hexDump)  # file length in nibbles

with open(sigFile, 'r') as s:
    sigs = s.read()

# create array, split elements by comma and new line;
# drop empty fields so a trailing newline does not add a blank signature
fields = [x.strip() for x in re.split(r'\n|,', sigs) if x.strip()]
signature = fields[::2]   # [start:stop:step], every other element: the signatures
fileType = fields[1::2]   # every other element starting at 1: the file types

progress = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

for i in range(0, dumpLen):
    if i > 0:
        percent = 100 * i // dumpLen  # integer division
        if percent % 10 == 0:
            percent = percent + 10  # account for starting at 0
            if percent in progress:  # percentage not displayed yet
                print("%d %%" % percent)
                progress[progress.index(percent)] = 0  # MARK IT ZERO! Display a percentage once
    if hexDump[i] != '0':  # no signature in the list starts with 0, so skip those positions
        for x in range(0, len(signature)):  # compare against every signature in the list
            sigLen = len(signature[x])
            if hexDump[i:i + sigLen] == signature[x]:  # found match
                print("Found possible %s at byte offset %d with signature %s"
                      % (fileType[x], i // 2, signature[x]))
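The loop above compares every signature at every nibble, which is fine for a 1024-byte test file but could get slow on a real image. One possible alternative, sketched here rather than tested on a real image, is to let `str.find` do the scanning for each signature. The function name `find_signatures` and the short sample string are assumptions for the example.

```python
def find_signatures(hex_dump, signatures):
    """Return sorted (byte_offset, file_type, signature) for every occurrence."""
    hits = []
    for sig, file_type in signatures:
        if not sig:  # ignore empty signatures
            continue
        pos = hex_dump.find(sig)
        while pos != -1:
            hits.append((pos // 2, file_type, sig))
            pos = hex_dump.find(sig, pos + 1)  # +1 so overlapping matches are kept
    return sorted(hits)

# Tiny hand-made hex string: PDF signature at byte 2, ZIP signature at byte 8
hex_dump = "AABB25504446CCDD504B0304EE"
signatures = [("25504446", "PDF"), ("504B0304", "ZIP")]
hits = find_signatures(hex_dump, signatures)
for byte_offset, file_type, sig in hits:
    print("Found possible %s at byte offset %d with signature %s"
          % (file_type, byte_offset, sig))
```

Like the original loop, this reports matches that start in the middle of a byte, so it should find the same hits, only grouped and sorted by offset instead of printed as the scan progresses.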