Where were we?
As the title states, this is the second part of Analyzing PDF Malware. If you haven't read the first part you can find it here. Go ahead and read it now if you haven't already; we'll wait.
Our suspicions have been confirmed. We know that our sample contains malicious code. So what is the next step? In some cases, just knowing that the file is "bad" can be the endgame of the analysis. However, in other situations we may want to dig even deeper. What if one or more of the exploits we identified was successful in executing on the target system? How was the system affected? Does it cripple the device, install additional software, or steal passwords? The more succinct question is: "What is the payload of the attack?"
To find out the answer to that question we will use a blend of both static and dynamic methodologies to dig into the sample and analyze exactly what is happening under the hood of this PDF malware, step-by-step.
The Game Plan:
- Identify and isolate shellcode within payload
- Analyze shellcode to determine its capabilities
Combing the Desert
So let's jump right in. Where should we start looking for our payload? Hmmm, I don't know about you, but this line sure seems to stand out to me and may be worth some further investigation:
jsdecode_stage2.js:16: var payload = unescape(bjsg);
jsdecode_stage2.js:37: var payload = unescape(bjsg);
jsdecode_stage2.js:61: var payload = unescape(bjsg);
jsdecode_stage2.js:104: var e = unescape(bjsg);
This makes sense, as each exploit would need a payload, and it appears that all of them are using the same one. The 'bjsg' variable being assigned to 'payload' is a long string (1062 bytes total) made up of groups of four hexadecimal characters, each prefixed with '%u'.
This is a common way to encode binary data into a string (UCS-2/UTF-16), and it immediately seems like a good shellcode candidate. Before we get too excited, let's look a little closer and make sure our assumption makes sense. The string starts with 0x9090, which is two 0x90 bytes, and 0x90 is the Intel x86 opcode for NOP (No Operation). It is common for shellcode to be preceded by NOPs. This gives the attack code a little padding so that the exploit only needs to land close to the memory location of the shellcode to be successful. This is commonly known as a NOP sled. So far, so good. The next non-NOP value we see is 0x16EB, which, when stored as little-endian machine code, becomes the bytes EB 16: a short relative jump (JMP) 0x16 bytes forward. Again, this makes sense in the context of shellcode. Since all of the puzzle pieces seem to be lining up just right, we will go ahead and rename 'bjsg' to 'shellcode':
Now that we have our shellcode payload, let's begin to dissect it to find out how it does what it does. The first thing we need to do is convert it from its current printable form into something more suitable for analysis.
Selecting the "UCS2 to Hex" button will give us our little-endian hexadecimal byte representation of the shellcode.
Selecting "Hex to File" will save off our file to disk as a much more useable '.bin' file.
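If you would rather script the conversion than click through it, the same idea can be sketched in a few lines of Python. This is only a rough approximation under stated assumptions: the function names are made up for illustration, the sample string is a short stand-in rather than the real 'bjsg' value, and only the '%uXXXX' form is handled (JavaScript's unescape also accepts '%XX', which this sketch ignores).

```python
import re


def ucs2_unescape(escaped: str) -> bytes:
    """Turn a '%uXXXX%uXXXX...' string into the bytes it represents in memory."""
    out = bytearray()
    for word in re.findall(r"%u([0-9A-Fa-f]{4})", escaped):
        # Each %uXXXX is one 16-bit unit; on x86 the low byte is stored first.
        out += int(word, 16).to_bytes(2, "little")
    return bytes(out)


def nop_sled_length(code: bytes) -> int:
    """Count how many 0x90 (NOP) bytes sit at the very start of the buffer."""
    return len(code) - len(code.lstrip(b"\x90"))


# Illustrative prefix only, NOT the full sample from the PDF.
sample = "%u9090%u9090%u16EB"
raw = ucs2_unescape(sample)

print(raw.hex())             # little-endian byte string
print(nop_sled_length(raw))  # number of leading NOPs
print(raw[4:6] == b"\xeb\x16")  # True if a short JMP (EB 16) follows the sled
```

Writing `raw` out with `open("hexfile.bin", "wb").write(raw)` would give roughly the same kind of '.bin' file as the button clicks above.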
Plumbing Rabbit Holes
Now we come to a fork in the analysis and need to choose what we want to do next from a list of options. We can take our newly created '.bin' and disassemble it to study it statically, we can execute our code dynamically in an emulator, or wrap it in an executable shell to debug the code step-by-step in a debugger. My vote is to start with the easiest of the bunch, emulated execution, and see if we can get a quick win. I'm guessing you agree.
Libemu is a C-based x86 emulator used for, among other things, the detection of shellcode. It's a great tool, but it's not foolproof. Included with the libemu test suite is a tool called sctest. Its authors describe the sctest tool as the following:
"…a useful source for those who want to setup shellcode emulation allowing win32 api calls and offering hooks on these calls."
With a description like that, this tool is definitely made-to-order for our task. If you don't want to bother with installing it yourself, sctest comes pre-compiled along with a collection of other great tools on Lenny Zeltser's REMnux Linux distro. A brand new version, REMnux 3, was just made available for download in mid-December 2011.
Running sctest to emulate our hexfile.bin we can attempt to hook the Win32 API calls being made by the malware in an effort to see if they reveal any juicy bits of info:
From a snippet of the very verbose sctest output shown above, we discover that LoadLibrary is being used to first load the urlmon library. We also discover that the URLDownloadToCacheFile function of urlmon.dll is specifically being called. According to Microsoft's MSDN documentation, this function "downloads data to the Internet cache and returns the file name of the cache location for retrieving the bits". This is a huge clue as to what the malware is attempting to do on the victim system. It doesn't answer all of our questions but it gives us some good indications for now. Specifically, it appears that we are dealing with the "download and execute" class of shellcode.
So a file is being downloaded, but where is the file coming from? Where does the malware call home? Maybe we will get lucky and find an IP address or URL that the shellcode is attempting to contact by running strings against the file:
No dice. I guess we're not going to get that lucky with this sample. No meaningful clear text strings in the file at all. It was worth a shot. Before moving on, there is one other bit of sctest functionality worth mentioning. Issuing commands like the following will create a nice instruction call graph of our shellcode:
$ sctest -Ss 1000000000 -G shellcode_graph.dot < hexfile.bin
$ dot -T png -o shellcode_graph.png shellcode_graph.dot
Who doesn't like a nice graph? This is more than just nerd art though; we can use it to quickly see conditional branching and looping structures before looking at a single assembly code operation. For example, if we zoom in and look at the first loop at memory location 0x00417015 we see something interesting:
Ok, maybe it's not that interesting yet, but the graph gives us a place to start investigating. To get to the interesting stuff we need to see what is actually going on in the code at those locations, and to do that we need to get a disassembly of our shellcode. There are many ways to do this, but I recently came across a standalone tool for Windows written by Alain Rioux called ConvertShellcode.exe. You can paste in escaped shellcode on the command line as an argument and get your dead listing of the assembly code.
Now, if we investigate those four instructions from our call graph that we were interested in, we can finally see what is going on in that loop.
The loop is reading a byte of shellcode, applying an XOR mask, writing the new result back, and repeating. This is what's known as a staged XOR loader. The first stage of the shellcode is just enough to run this loop, leaving the remaining shellcode obfuscated. The second stage only becomes valid code at run time, after the first stage modifies it in place by applying the XOR key (0x19). This is one of the simpler ways malware authors complicate static analysis of their code.
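To make the decoder loop concrete, here is a minimal Python sketch of the same read, XOR, write-back idea. The key 0x19 comes from the analysis above; everything else is an assumption for illustration: the function name, the `start` parameter (the offset where the encoded second stage begins, which you would have to read off the disassembly), and the demo bytes, which are not taken from the sample.

```python
def xor_decode(data: bytes, key: int, start: int = 0) -> bytes:
    """Return a copy of data with every byte from `start` onward XORed with `key`.

    Bytes before `start` (the first-stage loader) are left untouched.
    """
    decoded = bytearray(data)
    for i in range(start, len(decoded)):
        decoded[i] ^= key  # same operation the shellcode loop applies in place
    return bytes(decoded)


# Demo with made-up data: a 2-byte "loader" followed by an XOR-encoded string.
KEY = 0x19
plain_stage2 = b"http://example.invalid/"
blob = b"\xeb\x16" + bytes(b ^ KEY for b in plain_stage2)

print(xor_decode(blob, KEY, start=2))
```

Because XOR is its own inverse, the same function both encodes and decodes; running it twice with the same key and offset returns the original bytes.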
Armed with this newfound information, let's confirm what we know by running Didier Stevens' XORSearch tool against the shellcode and see if that uncovers anything. XORSearch will brute force XOR, ROL, and ROT encodings looking for the string or a file of strings you provide. Case does matter with this type of searching, so keep that in mind or you may get confusing results.
We finally get a hit on the search for "http". This gives us confirmation that the XOR key being used to encode the shellcode is 0x19 and it also gives us an interesting URL: hxxp://shoppingmaddness.com/d.php?f=24&e=5. This is likely the answer to the question about where the malware is attempting to download a file.
Let's Sum Up
In the next part of this series we will decode the second stage shellcode and load it into our favorite disassembler. We will trace through the functions and fully map the code. We will also investigate the URL we discovered and see what sort of things it might get us tangled up with.
- ConvertShellcode.exe - Converts shellcode strings into x86 assembly instructions
- Malzilla - Malware hunting tool
- XORSearch - Tool to search in an XOR, ROL or ROT encoded binary file
- Libemu - x86 Shellcode Emulation
- Graphviz - Open Source graph visualization software
- REMnux - A Linux Distribution for Reverse-Engineering Malware