Welcome back to our series on macOS reversing. Last time out, we took a look at challenges around string decryption, following on from our earlier posts about beating malware anti-analysis techniques and rapid triage of Mac malware with radare2. In this fourth post in the series, we tackle several related challenges that every malware hunter faces: you have a sample, you know it’s malicious, but
- How do you determine if it’s a variant of other known malware?
- If it is unknown, how do you hunt for other samples like it?
- How do you write robust detection rules that survive malware author’s refactoring and recompilation?
The answer to those challenges is part Art and part Science: a mixture of practice, intuition and occasionally luck(!) blended with a solid understanding of the tools at your disposal. In this post, we’ll get into the tools and techniques, offer you tips to guide your practice, and encourage you to gain experience (which, in turn, will help you make your own luck) through a series of related examples.
As always, you’re going to need a few things to follow along, with the second and third items in this list installed in the first.
- An isolated VM – see instructions here for how to get set up
- Some samples – see Samples Used below
- Latest version of r2 – see the github repo here.
What are Zignatures and Why Are They Useful?
By now you might have wondered more than once if this post just had a really obvious typo: Zignatures, not signatures? No, you read that right the first time! Zignatures are r2’s own format for creating and matching function signatures. We can use them to see if a sample contains a function or functions that are similar to other functions we found in other malware. Similarly, Zignatures can help analysts identify commonly re-used library code, encryption algorithms and deobfuscation routines, saving us lots of reversing time down the road (for readers familiar with IDA Pro or Ghidra, think F.L.I.R.T or Function ID).
What’s particularly nice about Zignatures is that you can not only search for exact matches but also for matches with a certain similarity score. This allows us to find functions that have been modified from one instantiation to the other but which are otherwise the same.
Zignatures can help us to answer the question of whether an unknown sample is a variant of a known one. Once you are familiar with Zignatures, they can also help you write good detection rules, since they will allow you to see what is constant in a family of malware and what is variant. Combined with YARA rules, which we’ll take a look at later in this post, you can create effective hunting rules for malware repositories like VirusTotal to find variants or use them to help inform the detection logic in malware hunting software.
Create and Use A Zignature
Let’s jump into some malware and create our first Zignature. Here’s a recent sample of WizardUpdate (you might remember we looked at an older sample of WizardUpdate in our post on string decryption).
We’ve loaded the sample into r2 and run some analysis on it. We’ve been conveniently dropped at the main()
function, which looks like this.
That main
function contains some malware specific strings, so should make a nice target for a Zignature. To do so, we use the zaf
command, supplying the parameters of the function name and the signature name. Our sample file happened to be called “WizardUpdateB1”, so we’ll call this signature “WizardUpdateB1_main”. In r2, the full command we need, then, is:
> zaf main WizardUpdate_main
We can look at the newly-created Zignature in JSON format with zj~{}
(if you’re not sure why we’re using the tilde, review the earlier post on grepping in r2).
To see that the Zignature works, try zb
and note the output:
The first entry in the row is the most important, as that gives us the overall (i.e., average) match (between 0.00000 and 1.00000). The next two show us the match for bytes and graph, respectively. In this case, it’s a perfect match to the function, which is of course what we would expect as this is the sample from which we created the rule.
You can also create Zignatures for every function in the binary in one go with zg
.
Beware of using zg
on large files with thousands of functions though, as you might get a lot of errors or junk output. For small-ish binaries with up to a couple of hundred functions it’s probably fine, but for anything larger than that I typically go for a targeted approach.
So far, we have created and tested a Zignature, but it’s real value lies in when we use the Zignature on other samples.
Create A Reusable and Extensible Zignatures File
At the moment, your Zignatures aren’t much use because we haven’t learned yet how to save and load Zignatures between samples. We’ll do that now.
We can save our generated Zignatures with zos <filename>
. Note that if you just provide the bare filename it’ll save in the current working directory. If you give an absolute path to an existing file, r2 will nicely merge the Zignatures you’re saving with any existing ones in that file.
Radare2 does have a default address from which it is supposed to autoload Zignatures if the autoload variable is set, namely ~/.local/share/radare2/zigns/
(in some documentation, it’s ~/.config/radare2/zigns/
) However, I’ve never quite been able to get autoload to work from either address, but if you want to try it, create the above location and in your radare2 config file (~/.radare2rc
) add the following line.
e zign.autoload = true
In my case, I load my zigs file manually, which is a simple command: zo <filename>
to load, and zb
to run the Zignatures contained in the file against the function at the current address.
As you can see, the sample above B5 is a perfect match to B1, whereas B2 is way off with the match only around 46.6%.
When you’ve built up a collection of Zignatures, they can be really useful for checking a new sample against known families. I encourage you to create Zignatures for all your samples as they will pay dividends down the line. Don’t forget to back them up too. I learned the hard way that not having a master copy of my Zigs outside of my VMs can cause a few tears!
Creating YARA Rules Within radare2
Zignatures will help you in your efforts to determine if some new malware belongs to a family you’ve come across before, but that’s only half the battle when we come across a new sample. We also want to hunt – and detect – files that are like it. For that, YARA is our friend, and r2 handily integrates the creation of YARA strings to make this easy.
In this next example, we can see that a different WizardUpdate sample doesn’t match our earlier Zignature.
While we certainly want to add a function signature for this sample’s main()
to our existing Zigs, we also want to hunt for this on external repos like VirusTotal and elsewhere where YARA can be used.
Our main friend here is the pcy
command. Since we’ve already been dropped at main()
’s address, we can just run the pcy
command directly to create a YARA string for the function.
However, this is far too specific to be useful. Fortunately, the pcy
command can be tailored to give us however many bytes we wish at whatever address.
We know that WizardUpdate makes plenty of use of ioreg
, so let’s start by searching for instances of that in the binary.
Lots of hits. Let’s take a closer look at the hex of the first one.
That URL address might be a good candidate to include in a YARA rule, let’s try it. To grab it as YARA code, we just seek to the address and state how many bytes we want.
This works nicely and we can just copy and paste the code into VT’s search with the content modifier. Our first effort, though, only gives us 1 hit on VirusTotal, although at least it’s different from our initial sample (we’ll add that to our collection, thanks!).
But note how we can iterate on this process, easily generating YARA strings that we can use both for inclusion and exclusion in our YARA rules.
This string gives us lots of hits, so let’s create a file and add the string.
pcy 32 >> WizardUpdate_B.yara
From here on in, we can continue to append further strings that we might want to include or exclude in our final YARA rule. When we are finished, all we have to do is open our new .yara
file and add the YARA meta data and conditional logic, or we can paste the contents of our file into VTs Livehunt template and test out our rule there.
Xrefs For the Win
At the beginning of this post I said that the answer to some of the challenges we would deal with today were “part Art and part Science”. We’ve done plenty of “the Science”, so I want to round out the post by talking a little about “the Art”. Let’s return to a topic we covered briefly earlier in this series – finding cross-references in r2 – and introduce a couple of handy tips that can make development of hunting rules a little easier.
When developing a hunting or detection rule for a malware family, we are trying to balance two opposing demands: we want our rule to be specific enough not to create false positives, but wide or general enough not to miss true positives. If we had perfect knowledge of all samples that ever had been or ever would be created for the family under consideration, that would be no problem at all, but that’s precisely the knowledge-gap that our rule is aiming to fill.
A common tip for writing YARA rules is to use something like a combination of strings, method names and imports to try to achieve this balance. That’s good advice, but sometimes malware is packed to have virtually none of these, or not enough to make them easily distinguishable. On top of that, malware authors can and do easily refactor such artifacts and that can make your rules date very quickly.
A supplementary approach that I often use is to focus on code logic that is less easy for author’s to change and more likely to be re-used.
Let’s take a look at this sample of Adload written in Go. It’s a variant of a much more prolific version, also written in Google’s Golang. Both versions contain calls to a legit project found on Github, but this variant is missing one of the distinctive strings that made its more widespread cousin fairly easy to hunt.
However, notice the URL at 0x7226
. That could be interesting, but if we hit on that domain name string alone in VirusTotal we only see 3 hits, so that’s way too tight for our rule.
We might do better if we try grabbing bytes of code right after that string has been loaded, for while the API string will certainly change, the code that consumes it perhaps might not. In this case, searching on 96 bytes from 0x7255
catches a more respectable 23 hits, but that still seems too low for a malware variant that has been circulating for many months.
Let’s see if we can do better. One trick I find useful with r2 is to hunt down all the XREFs to a particular piece of code and then look at the calling functions for useful sequences of byte code to hunt on.
For example, you can use sf.
to seek to the beginning of a function from a given address (assuming it’s part of a function, of course) and then use axg
to get the path of execution to that function all the way from main()
. You can use pds
to give you a summary of the calls in any function along the way, which means combining axg
and pds
is a very good way to quickly move around a binary in r2 to find things of interest.
Now that we can see the call graph to the C2 string, we can start hunting for logic that is more likely to be re-used across samples. In this case, let’s hunt for bytes where sym.main.main
calls the function that loads the C2 URL at 0x01247a41
.
Grabbing 48 bytes from that address and hunting for it on VT gives us a much more respectable 45 TP hits. We can also see from VT that these files all have a common size, 5.33MB, which we can use as a further pivot for hunting.
We’ve made a huge improvement on our initial hits of 3 and then 23, but we’re not really done yet. If we keep iterating on this process, looking for reusable code rather than just specific strings, imports or method names, we’re likely to do much better, and by now you should have a solid understanding of how to do that using r2 to help you in your quest. All you need now, just like any good piece of malware, is a bit of persistence!
Conclusion
In this post, we’ve taken a look at some of r2’s lesser known features that are extremely useful for hunting malware families, both in terms of associating new samples to known families and in searching for unknown relations to a sample or samples we already have. If you haven’t checked out the previous posts in this series, have a look at Part 1, Part 2 and Part 3. The series continues with Part 5 here.
Samples Used
File name | SHA1 |
WizardUpdate_B1 | 2f70787faafef2efb3cafca1c309c02c02a5969b |
WizardUpdate_B2 | dfff3527b68b1c069ff956201ceb544d71c032b2 |
WizardUpdate_B3 | 814b320b49c4a2386809b0bdb6ea3712673ff32b |
WizardUpdate_B4 | 6ca80bbf11ca33c55e12feb5a09f6d2417efafd5 |
WizardUpdate_B5 | 92b9bba886056bc6a8c3df9c0f6c687f5a774247 |
WizardUpdate_B6 | 21991b7b2d71ac731dd8a3e3f0dbd8c8b35f162c |
WizardUpdate_B7 | 6e131dca4aa33a87e9274914dd605baa4f1fc69a |
WizardUpdate_B8 | dac9aa343a327228302be6741108b5279adcef17 |
Adload | 279d5563f278f5aea54e84aa50ca355f54aac743 |