Encoded strings are everywhere and have many legitimate uses across the technology sector. They are also widely used by malware authors to disguise their attacks and to implement anti-analysis techniques designed to frustrate malware hunters and reverse engineers. Understanding the encoding methods threat actors use helps not only in everyday operations but, more importantly, in cybersecurity and network security work. The most common methods are not terribly hard to learn and will help you make better decisions about the legitimacy of a command or call seen on your network. In this article, I will share both a simple and a slightly more advanced understanding of Base64 encoding. These are the methods that I use to both encode and decode in my daily work.
Basics of Base64 Encoding
A base64 string is pretty easy to identify:
VGhpcyBpcyB3aGF0IGJhc2U2NCBsb29rcyBsaWtlIGluIHRoZSB3aWxkLgo=
There are 64 characters in the Base64 “alphabet”: the uppercase and lowercase letters, the digits 0-9, and the two symbols “+” and “/”. An encoded string will contain a mixture of these, sometimes with an “=” or two (never more than two) of padding at the end. The length of a well-formed string must also be divisible by 4. The Wikipedia article on Base64 goes into more detail about the background of the encoding’s implementation and history, but here we’ll focus on the practical aspects within a security context.
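The shape rules above (allowed characters, at most two trailing “=”, length divisible by 4) can be sketched as a quick check. This is a minimal illustration, and the function name looks_like_base64 is my own; it only tests the shape of a string, so an ordinary word such as “word” also passes.

```python
import re

# Standard alphabet, with at most two "=" padding characters at the end.
BASE64_PATTERN = re.compile(r"[A-Za-z0-9+/]+={0,2}")

def looks_like_base64(candidate):
    """Return True if the string has the shape of well-formed standard Base64."""
    if len(candidate) % 4 != 0:
        return False
    return BASE64_PATTERN.fullmatch(candidate) is not None

print(looks_like_base64("VGhpcyBpcyB3aGF0IGJhc2U2NCBsb29rcyBsaWtlIGluIHRoZSB3aWxkLgo="))
print(looks_like_base64("not base64!"))
```

A check like this is best treated as a first filter before attempting to decode, not as proof that a string is Base64.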
There are a few things that I like to look for with base64 strings:
- If the string does not contain any special characters other than “=” then there is a good chance that it will be plain text when decoded.
- If the string contains special characters like “+” or “/” then there is a good chance the string will decode into something like a compressed file or image.
A good rule of thumb is to decode the string on the command line, and if you cannot read the output then try writing it to a file and use something like Detect It Easy (D.I.E.) to determine how you can view the file contents.
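That rule of thumb can be roughed out in a few lines of Python. The sketch below is only an illustration: the triage function and the small MAGIC_BYTES table are my own assumptions, the signature list is far from exhaustive, and a dedicated tool like Detect It Easy will identify many more formats.

```python
import base64
import string

# A few well-known file signatures; this table is illustrative, not exhaustive.
MAGIC_BYTES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"\x89PNG": "png",
    b"MZ": "windows executable",
}

# Byte values of printable ASCII characters, including common whitespace.
PRINTABLE = set(string.printable.encode("ascii"))

def triage(encoded):
    """Decode a Base64 string and guess whether it is text or a known binary type."""
    raw = base64.b64decode(encoded)
    if raw and all(byte in PRINTABLE for byte in raw):
        return "text", raw.decode("ascii")
    for magic, label in MAGIC_BYTES.items():
        if raw.startswith(magic):
            return label, raw
    return "unknown binary", raw

kind, payload = triage("VGhpcyBpcyB3aGF0IGJhc2U2NCBsb29rcyBsaWtlIGluIHRoZSB3aWxkLgo=")
print(kind)
```

Anything that comes back as “unknown binary” is a candidate for writing to a file and inspecting further.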
Decoding is extremely easy and can be done on any OS. Let’s take a look at not only decoding but also encoding because, who knows? Maybe one day you will need or want to know both sides of the process.
Encoding Strings
On macOS/Linux with Bash (CLI) we can simply echo the target string and pipe it to the base64 utility:
$: echo "Hooked on phonics worked for me" | base64
On Windows, we can encode a string with PowerShell (CLI):
> [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("Hooked on phonics worked for me"))
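The Bash command produces the output below. Because echo appends a trailing newline, which gets encoded too, the string ends in “ZQo=”. The PowerShell command encodes only the 31 characters of the string itself, so its output is identical except that it ends in “ZQ==”: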
SG9va2VkIG9uIHBob25pY3Mgd29ya2VkIGZvciBtZQo=
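One detail worth checking when comparing tools is the trailing newline that echo adds before the data reaches base64. A short Python sketch (variable names are my own) shows how that single extra byte changes the end of the encoded string:

```python
import base64

text = "Hooked on phonics worked for me"

# echo appends "\n" before the data reaches base64, so it is encoded too.
with_newline = base64.b64encode((text + "\n").encode("utf-8")).decode("ascii")
# Encoding only the characters of the string, with no newline added.
without_newline = base64.b64encode(text.encode("utf-8")).decode("ascii")

print(with_newline)
print(without_newline)
```

In Bash, using printf '%s' instead of echo avoids the extra newline if an exact match is needed.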
Decoding Strings
On macOS/Linux with Bash (CLI) it’s the same process, but this time we specify the --decode option:
$: echo "SG9va2VkIG9uIHBob25pY3Mgd29ya2VkIGZvciBtZQo=" | base64 --decode
We can achieve the same thing with a Python script like this:
#!/usr/bin/env python3
import base64

# Replace the quoted text with the code you wish to decode.
coded_string = 'SG9va2VkIG9uIHBob25pY3Mgd29ya2VkIGZvciBtZQo='

# Decode the code string (b64decode returns bytes).
code_dump = base64.b64decode(coded_string)

# Print the decoded output to the screen.
print(code_dump.decode('utf-8', errors='replace'))

# Write the decoded output to a file; binary mode keeps non-text payloads intact.
with open('base64_out.txt', 'wb') as f:
    f.write(code_dump)
On Windows with PowerShell (CLI):
> [System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String('SG9va2VkIG9uIHBob25pY3Mgd29ya2VkIGZvciBtZQo='))
We can swap out ASCII for UTF-8 if we prefer:
> [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String('SG9va2VkIG9uIHBob25pY3Mgd29ya2VkIGZvciBtZQo='))
As we did with Python above, we can replace the one-liner CLI with a PowerShell script if we wish:
# Replace the quoted text with the code you wish to decode.
$coded_string = "SG9va2VkIG9uIHBob25pY3Mgd29ya2VkIGZvciBtZQo="

# Print the decoded output to the screen.
[System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String($coded_string))

# Write the decoded output to a file.
[System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String($coded_string)) | Out-File -Encoding "ASCII" base64_out.txt
Putting It Into Action:
So let’s see how this could help us to understand an actual attack on the network. First, we take a look at an attack sequence; the first place I always look (if the process is there) will be PowerShell:
In this case, we have an alert regarding a PowerShell command. Given that this is a fileless attack, there is no hash reputation available via third-party validation tools. This means we need to review the threat details and attempt to figure out whether this alert is legitimate or not.
First we can review the Attack Story information in the Raw Data section of the SentinelOne console:
Instantly, we can see it begins with PowerShell executing a base64 encoded string.
Note that this command is packed with some very common command line arguments that are very useful to know:
-noP (-NoProfile)
Does not load the PowerShell profile.
-sta
Starts PowerShell using a single-threaded apartment. In Windows PowerShell 2.0, multi-threaded apartment (MTA) is the default. In Windows PowerShell 3.0, single-threaded apartment (STA) is the default.
-w (-WindowStyle <Window style>)
Sets the window style for the session. Valid values are Normal, Minimized, Maximized and Hidden.
-enc (-EncodedCommand <Base64EncodedCommand>)
Accepts a base-64-encoded string version of a command. Use this parameter to submit commands to PowerShell that require complex quotation marks or curly braces. The string must be formatted using UTF-16 character encoding.
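The UTF-16 requirement matters when decoding an -enc argument outside of PowerShell: the Base64 layer yields UTF-16 little-endian bytes, not ASCII. A minimal Python sketch, assuming a helper name of my own (decode_encoded_command) and a harmless sample command made up for the demonstration:

```python
import base64

def decode_encoded_command(b64_text):
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE text)."""
    raw = base64.b64decode(b64_text)
    return raw.decode("utf-16-le")

# Hypothetical, harmless command used only to demonstrate the round trip.
sample_command = "Write-Host 'hello'"
sample_argument = base64.b64encode(sample_command.encode("utf-16-le")).decode("ascii")

print(sample_argument)
print(decode_encoded_command(sample_argument))
```

Because plain ASCII text stored as UTF-16LE has a null byte after every character, the encoded form of such commands tends to contain a lot of “A” characters, which can be a handy visual hint.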
Looking at the process information reveals another indicator:
Our first red flag is the vssadmin.exe delete shadows /all /quiet command. This is not an indicator of malicious intent per se, but it is extremely common with nearly all ransomware. This is confirmed by the file manipulation events:
Note the file behavior illustrates modification to the content of “Wildlife.wmv” and a change of the file extension from “wmv” to “tgrpkty”, a strong indicator of ransomware behavior.
Now let’s go ahead and review the data in Deep Visibility so that we can see other IOCs that can aid us in prevention:
Here we see the long, encoded base64 string. It would be nice to know what it’s doing! Let’s extract the entire base64 code block. Using the information we learned earlier, we can now decode the attack and gain a better idea of what this command is trying to do.
Here’s the encoded string:
Here’s what it looks like after being decoded with one of the methods we explained above:
We can now see the PowerShell in plain text, but let’s clean it up and “prettify” it. We can do that in Sublime Text with the help of a plugin. Here’s the decoded PowerShell now made much easier to read:
Now we can see that the command is reaching out to emp[.]fourhorsemen[.]tech over port 8080 for the /login/process.php file.
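Pulling the host, port and path out of a decoded command can also be scripted. The sketch below is an assumption-laden illustration: the decoded variable is a stand-in I built from the indicators named above, not the actual sample, and the function names are my own.

```python
import re
from urllib.parse import urlsplit

# Matches http/https URLs up to the next whitespace or quote character.
URL_PATTERN = re.compile(r"""https?://[^\s'"]+""")

def extract_indicators(script_text):
    """Return (host, port, path) tuples for every URL found in the text."""
    indicators = []
    for url in URL_PATTERN.findall(script_text):
        parts = urlsplit(url)
        indicators.append((parts.hostname, parts.port, parts.path))
    return indicators

def defang(host):
    """Rewrite dots so the indicator cannot be clicked by accident."""
    return host.replace(".", "[.]")

# Illustrative stand-in for a decoded command; not the actual sample.
decoded = "$u = 'http://emp.fourhorsemen.tech:8080/login/process.php'"

for host, port, path in extract_indicators(decoded):
    print(defang(host), port, path)
```

Real samples often split or concatenate the URL to defeat simple pattern matching, so a regular expression like this one is a starting point only.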
While this is a very simplistic use case, it is a great example of the kind of counterintelligence that can be obtained with five minutes of extra work. Blocking the FQDN we extracted can not only increase infrastructure safety but also reduce the alerts that your IT or security team will need to address, saving you time for other tasks in the future.
Over Engineer All the Things
So if this is so easy to decode then why use it to obfuscate malicious code? Great question!
The answer is because Base64 is not only OS agnostic but, as it turns out, very robust and relatively easy to over engineer. Sooner or later you will run into something that fails to decode:
This is where you start doing things like dumping the output to a file and checking to see if it’s Windows shellcode, but this can also happen when the author uses a custom encoding key. Conceptually, that’s not hard to do, but it requires the attacker to make the decoding key available to the system, and that means defenders always have the opportunity to reverse it if they can catch it in action.
Let’s take a look. In Python, a simple way to create a base64 string with a custom key is to use a translation table built with maketrans:
import base64

default_key = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
custom_key = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/'

# Translation tables that swap one alphabet for the other (Python 3: str.maketrans).
encode_translation = str.maketrans(default_key, custom_key)
decode_translation = str.maketrans(custom_key, default_key)

def encode(text):
    # Standard Base64 first, then substitute each character with its custom-key equivalent.
    return base64.b64encode(text.encode()).decode().translate(encode_translation)

def decode(text):
    # Undo the substitution first, then decode as standard Base64.
    return base64.b64decode(text.translate(decode_translation)).decode()

plain_string = 'malicious commands'
encoded_string = encode(plain_string)

print('Translation Key: ' + custom_key)
print('Plain Text String: ' + plain_string)
print('Base64 Encoded Plain Text String: ' + base64.b64encode(plain_string.encode()).decode())
print('Translated String: ' + plain_string.translate(encode_translation))
print('Base64 Encoded Translated String: ' + encoded_string)
print('-----------------------------------------------------------')
print('Default Key: ' + default_key)
print('Default Key Decoded Translated Base64 String: ' + encoded_string.translate(encode_translation))
print('Custom Key Decoded Translated Base64 String: ' + encoded_string.translate(decode_translation))
# Decoding with the default key yields arbitrary bytes, so show their repr.
print('Default Key Decoded Translated String: ' + repr(base64.b64decode(encoded_string)))
print('Custom Key Decoded Translated String: ' + decode(encoded_string))
This script creates an output like so:
Take note of the keys while we walk through this behavior:
Default key
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Custom key
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
The translation is actually very simplistic:
Using the above table you can see that when the script is translating strings:
- ‘A’ becomes ‘0’
- ‘B’ becomes ‘1’
and so on.
Now apply that logic to the string we are encoding with the script:
Plain Text String: malicious
Translated String: CqBysyEKI
- m = C
- a = q
- l = B
- i = y
- c = s
- i = y
- o = E
- u = K
- s = I
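The character-by-character substitution listed above can be reproduced with a simple lookup table. This is a small sketch with names of my own choosing; it pairs the two keys position by position and leaves any character outside the keys, such as a space, untouched.

```python
default_key = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
custom_key = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Pair each default-key character with the custom-key character at the same position.
mapping = dict(zip(default_key, custom_key))

def translate(text):
    """Substitute mapped characters; anything else (such as a space) passes through."""
    return "".join(mapping.get(ch, ch) for ch in text)

print(mapping["A"], mapping["B"])
print(translate("malicious"))
```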
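So while the plain text of:
malicious commands
encodes to the following standard base64 string:
bWFsaWNpb3VzIGNvbW1hbmRz
the Python script’s encode function applies the same substitution to that base64 output (b becomes r, W becomes m, F becomes 5 and so on), which gives:
rm5IqmdFrTlP86dLrmRxrChP
Note that this is not the base64 encoding of the translated plain text:
CqBysyEKI sECCqDtI
The translated plain text only illustrates the substitution. It is the base64 string itself that gets remapped, and a standard decoder will return garbage for it until the substitution is reversed.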
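For a defender who has recovered the custom key, reversing a string like the one above takes two steps: map every character back to the standard alphabet, then decode normally. A minimal Python 3 sketch, with decode_custom as a name of my own and the keys taken from the script above:

```python
import base64

DEFAULT_KEY = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
CUSTOM_KEY = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/"

def decode_custom(encoded, custom_key=CUSTOM_KEY):
    """Map a custom-alphabet string back to standard Base64, then decode it."""
    to_standard = str.maketrans(custom_key, DEFAULT_KEY)
    return base64.b64decode(encoded.translate(to_standard))

recovered = decode_custom("rm5IqmdFrTlP86dLrmRxrChP")
print(recovered)
```

The custom_key parameter is there so the same function can be tried against other alphabets pulled from a sample.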
This process looks very different in PowerShell but is still achievable. Note that this version applies the character substitution to the plain text and then base64-encodes the result, so its final string differs from the Python output:
Function encode_string {
    $encode_string = ""
    foreach ($c in $args[0].ToCharArray()) {
        if ($default_key.Contains($c)) {
            $encode_string = $encode_string + $custom_key[$default_key.IndexOf($c)]
        } else {
            $encode_string = $encode_string + $c
        }
    }
    return $encode_string
}

Function decode_string {
    $decode_string = ""
    foreach ($c in $args[0].ToCharArray()) {
        if ($custom_key.Contains($c)) {
            $decode_string = $decode_string + $default_key[$custom_key.IndexOf($c)]
        } else {
            $decode_string = $decode_string + $c
        }
    }
    return $decode_string
}

$default_key = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
$custom_key = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
$string = "malicious commands"

$encoded_string = encode_string $string
$decoded_string = decode_string $encoded_string
$b64_default_encoded = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("$string"))
$b64_custom_encoded = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("$encoded_string"))

Write-Host "Translation Key:" $custom_key
Write-Host "Plain Text String:" $string
Write-Host "Base64 Encoded Plain Text String:" $b64_default_encoded
Write-Host "Translated String:" $encoded_string
Write-Host "Base64 Encoded Translated String:" $b64_custom_encoded
Write-Host "-----------------------------------------------------------"
Write-Host "Default Key:" $default_key
# Standard Base64 decoding alone only recovers the translated string, not the original.
Write-Host "Default Key Decoded Translated String:" $([System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($b64_custom_encoded)))
Write-Host "Custom Key Decoded Translated String:" $decoded_string
This gives the following output:
Conclusion
Encoding, encrypting, and obfuscating are becoming more and more commonplace in our age of technology. Knowing the basics is key to understanding not just how to identify threat actors and malicious files but also how to keep our end user data safe. It is important not only to understand how to read and reverse these strings but also to have security software on our network that provides the visibility we need to see these bits of code, so that we can attempt to identify new threats.