Daniel Bunce demonstrates how to automate IOC extraction using Python scripts, with ISFB/Ursnif malware as the worked example.
For many AV companies, Threat Intelligence companies, and Blue teams in general, automation is key. When analyzing a widespread sample for the first time, such as in the case of ISFB (also known as Ursnif), it is crucial that some form of automated IOC extraction is set up. This could range from basic information such as a Bot ID and hardcoded Command and Control (C2) server addresses, all the way up to what functions are enabled or disabled. This information can be transmitted to the malware in the initial stages of infection; however, it is most commonly stored inside the binary, typically as an encrypted or compressed blob of data – this is what is referred to as the “configuration” of the malware. Immediate and successful extraction of these IOCs can assist in blacklisting, taking down the malicious C2 servers, and even emulation in order to query and download information from the live C2 servers such as webinjects or additional modules. So, with that covered, let’s take a look at a recent sample of ISFB v2 and see how we can write a fairly simple Python script to extract the configuration.
SHA 256: 2aba7530b4cfdad5bd36d94ff32a0bd93dbf8b9599e0fb00701d58a29922c75f
I won’t be analyzing the config_parser() function in-depth, so this post assumes knowledge of how the configuration is stored in the binary and any encryption/compression algorithms used to secure it.
The config_parser() function is called twice in this sample, so we can guess that there are at least 2 different blobs of data containing configuration information. We can also determine what each blob of data is by taking a look at the hash pushed as the third argument. Looking at the image below, the variable cookie (generated during the BSS decryption function, 0x78646F86) is XOR’ed with a value before each call. The first call to config_parser() has the cookie XOR’ed with the value 0x3711A121, resulting in the value 0x4F75CEA7. This value corresponds to the name CRC_CLIENT32. This indicates that the first blob of data contains an executable, which is likely the next stage. Even though no URLs or RSA keys are stored in these blobs, the config_parser() function remains the same, so once we have a working extraction script, we can extract the next-stage file and then pass that into the script to extract any additional blocks of data.
Stepping into the parsing function, we can see the cookie being XOR’ed with 0x25CC25CC (0x25CC in each 16-bit half), resulting in the value 0x5DA84A4A. The low 2 bytes of this value are then moved into ECX, with the upper 2 bytes zeroed. This leaves the hex value 0x4A4A, which as a string is “JJ”. This is the header string of the configuration.
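The XOR arithmetic described above is easy to check by hand. The short sketch below uses only the constants quoted in the text; nothing is read from the sample, and the variable names are illustrative.

```python
# Values quoted in the text; nothing here is read from the sample itself.
COOKIE = 0x78646F86  # produced by the BSS decryption routine

# First config_parser() call: the third argument identifies the blob.
blob_id = COOKIE ^ 0x3711A121
print(hex(blob_id))  # 0x4f75cea7, the CRC_CLIENT32 identifier

# Inside the parser, only the low 16 bits survive the move into ECX.
magic_word = (COOKIE ^ 0x25CC) & 0xFFFF
magic = magic_word.to_bytes(2, "little")
print(hex(magic_word), magic)  # 0x4a4a b'JJ'

# The full 32-bit result quoted in the text matches XORing both
# 16-bit halves of the cookie with 0x25CC.
print(hex(COOKIE ^ 0x25CC25CC))  # 0x5da84a4a
```

Because only the low word is kept, the header string comes out as “JJ” regardless of what happens to the upper half of the value.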
The sample then begins to traverse the executable header to locate the configuration. Here, the pointer to the address of the configuration is stored inside [esi+edx+0x40]. This address points just underneath the section table, where we can see a block of data about 60 bytes in size.
This block of data consists of 3 different sets of data, based on the fact that the JJ header appears three times. Each set is 20 bytes in length, and follows the structure seen below.
Now that we know where the lookup table is located and how it is structured, we can begin to write our extraction script. At the moment, we need three functions: one to locate the JJ structures, one to parse the structures, and one to get the blobs of data.
Writing the locate_structs() Function
This function is responsible for locating the offset of the JJ structures inside the binary, calculating how many structures are actually stored, and returning the position of the start of each structure. In order to do all this, we need to import two modules: pefile and re. Using pefile, we can traverse the executable header until we reach the offset of the JJ structures. From there, we can use re to iterate through the blob of structures to determine how many structures are inside the blob, and the starting offset of each. This is then returned to the main() function.
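As a rough illustration of the approach just described, here is a minimal sketch rather than the original script. The helper section_table_end(), the window size, and the 20-byte alignment filter are assumptions made for this example; the only pefile attributes used are DOS_HEADER.e_lfanew, FILE_HEADER.SizeOfOptionalHeader and FILE_HEADER.NumberOfSections.

```python
import re

JJ_MAGIC = b"JJ"
STRUCT_SIZE = 20  # each JJ structure is 20 bytes, per the text


def section_table_end(data):
    """Return the file offset just past the PE section table (hypothetical helper)."""
    import pefile  # third-party; imported lazily so the rest runs without it

    pe = pefile.PE(data=data, fast_load=True)
    # PE signature (4 bytes) + COFF file header (20 bytes) + optional header
    table_start = pe.DOS_HEADER.e_lfanew + 4 + 20 + pe.FILE_HEADER.SizeOfOptionalHeader
    # each section header is 40 bytes
    return table_start + 40 * pe.FILE_HEADER.NumberOfSections


def locate_structs(data, start, window=0x200):
    """Return the file offsets of JJ structures found in data[start:start+window]."""
    region = data[start:start + window]
    offsets = []
    for match in re.finditer(re.escape(JJ_MAGIC), region):
        offset = start + match.start()
        if offset + STRUCT_SIZE > len(data):
            break  # not enough bytes left for a full structure
        # Assumption: structures are packed back to back, so ignore any
        # "JJ" that is not a multiple of 20 bytes from the first hit.
        if offsets and (offset - offsets[0]) % STRUCT_SIZE:
            continue
        offsets.append(offset)
    return offsets
```

A typical call would be locate_structs(data, section_table_end(data)), where data is the raw bytes of the sample.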
With that complete, we now need to move on to parsing these structures.
Writing the parse_structs() Function
This function is fairly short and simple. First, it will split the structure blob into the individual structures, which are stored in a list. Next, the three important values (XOR key, Blob offset, Blob size) are stored inside a final list which is then returned to the main() function.
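A parsing step along these lines might look like the sketch below. The field order inside the 20-byte structure (magic, a 2-byte field, XOR key, a 4-byte identifier, blob offset, blob size) is an assumption made for illustration and should be checked against the sample; JJEntry and JJ_LAYOUT are hypothetical names.

```python
import struct
from typing import NamedTuple

STRUCT_SIZE = 20

# ASSUMED little-endian layout, adjust after checking the binary:
#   2s magic "JJ" | H flags | I xor_key | I crc/id | I blob_offset | I blob_size
JJ_LAYOUT = struct.Struct("<2sHIIII")


class JJEntry(NamedTuple):
    xor_key: int
    blob_offset: int
    blob_size: int


def parse_structs(data, offsets):
    """Unpack the JJ structure at each offset and keep the three useful fields."""
    entries = []
    for offset in offsets:
        raw = data[offset:offset + STRUCT_SIZE]
        if len(raw) < STRUCT_SIZE:
            continue  # truncated structure, skip it
        magic, _flags, xor_key, _crc, blob_offset, blob_size = JJ_LAYOUT.unpack(raw)
        if magic != b"JJ":
            continue  # not a JJ structure after all
        entries.append(JJEntry(xor_key, blob_offset, blob_size))
    return entries
```

Unpacking with struct yields ordinary integers, so no separate endianness conversion is needed later if this variant is used.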
Now that we are able to parse the structures, let’s finally look at extracting the blobs from the executable.
Writing the extract_blob() Function
Once again, this function is fairly short and simple. First, we read the data from the designated file, and then enter a loop that takes the offset and blob size from each list value, changes the endianness of the value, and uses that to locate the blob to store in a list that will be returned to main().
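The loop described here can be sketched as follows. This is an illustration under two assumptions: each list entry holds the offset and size as raw 4-byte little-endian values, and the offset is used directly as a file offset. If the stored value turns out to be an RVA, it would first need converting, for example with pefile's get_offset_from_rva().

```python
def extract_blobs(data, entries):
    """Slice each blob out of data; entries are (raw_offset, raw_size) byte pairs."""
    blobs = []
    for raw_offset, raw_size in entries:
        # "Changing the endianness": interpret the raw bytes as little-endian ints.
        offset = int.from_bytes(raw_offset, "little")
        size = int.from_bytes(raw_size, "little")
        if size == 0 or offset + size > len(data):
            continue  # skip entries that point outside the file
        blobs.append(data[offset:offset + size])
    return blobs


def extract_blobs_from_file(path, entries):
    """Read the sample from disk and extract every blob listed in entries."""
    with open(path, "rb") as handle:
        return extract_blobs(handle.read(), entries)
```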
Now that we have successfully extracted the blobs, we need to decompress them. ISFB/Ursnif uses APLib to compress the configurations, so we need an APLib decompressor. Rather than reinvent the wheel and write our own, we can use an open source implementation, such as this script by Sandor Nemes. To decompress the blobs, we simply pass each one to the decompress function: decompress(blob).
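One way to wire this in without tying the script to a particular APLib implementation is to pass the decompress function in as an argument. The wrapper below is a hypothetical sketch; it assumes only that the chosen decompress function takes bytes and returns bytes, and that it raises an exception on malformed input.

```python
def decompress_blobs(blobs, decompress):
    """Apply a caller-supplied decompress function to every extracted blob.

    A blob that fails to decompress is recorded as None rather than
    aborting the whole run.
    """
    results = []
    for blob in blobs:
        try:
            results.append(decompress(blob))
        except Exception:  # deliberately broad: error types depend on the library used
            results.append(None)
    return results
```

With an APLib module imported, the call would look like decompress_blobs(blobs, decompress).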
We can also implement a write_to_file() function that uses the UUID module, allowing us to generate random names for each blob we extract and dump.
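A minimal version of such a helper could look like this. The out_dir parameter and the .bin extension are choices made for the example, not details taken from the original script.

```python
import uuid
from pathlib import Path


def write_to_file(blob, out_dir="."):
    """Dump a blob to out_dir under a random UUID-based name and return the path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{uuid.uuid4().hex}.bin"
    path.write_bytes(blob)
    return path
```

Returning the path makes it easy to log which file each blob ended up in, or to feed the dumped file straight back into the extractor.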
Wrapping Up…
And that is pretty much all! There are a few things to add, such as a function that will XOR the first 4 bytes of each extracted executable with the XOR key in the structure in order to get the correct MZ header, or you can just fix it manually. Once you have extracted the executables, you can simply replay the script against them to extract even more IOCs for further analysis. Obviously this only works on this particular version of ISFB, but that doesn’t stop you from going ahead and repurposing it for other variants such as version 3, which is more effective at storing the config.
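The header fix mentioned above could be sketched as below. Treating both the first 4 bytes and the key as little-endian 32-bit values is an assumption; fix_mz_header is a hypothetical name.

```python
def fix_mz_header(blob, xor_key):
    """XOR the first 4 bytes of a decompressed blob with the structure's XOR key."""
    if len(blob) < 4:
        raise ValueError("blob is too short to contain a header")
    # Assumption: the header dword and the key are both little-endian.
    first = int.from_bytes(blob[:4], "little") ^ (xor_key & 0xFFFFFFFF)
    return first.to_bytes(4, "little") + blob[4:]
```

If the result does not start with b"MZ", the byte order or the key field is worth double-checking before going any further.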