SpiderLabs Blog

Analyzing PDF Malware - Part 3A

June 08, 2012 | 5 minute read | Ryan Merritt

When we last left our heroes…

This is the third part of the Analyzing PDF Malware series. If you haven't read Part 1 and Part 2, you may want to start there, since we will be building on the analysis from those posts.

I will describe my system in detail when possible. If you are following along at home, your mileage may vary depending on your own particular setup.

…

In Part 1 we decoded multi-stage JavaScript that we extracted out of our appropriately named suspect file: 'sample1.pdf'. We further determined that there were four different exploits contained within the code. In Part 2 we tracked down the payload used by the exploits. We decoded the shellcode, wrote it to a binary formatted file, emulated its execution, and discovered a pair of important Win32 API calls. Lastly, we performed an initial disassembly of the binary, found an XOR key being used to obfuscate the main part of the shellcode, and used that key to begin to decode the remaining portion of the shellcode.

We have already answered many of our questions regarding this sample. We know that the PDF is malicious. We know that after a successful exploitation it downloads a file from a specific URL. So what is left to figure out? If we want to understand our sample completely, we need to take one final dive into the code. What kinds of tricks will we find in the second stage shellcode hidden behind the XOR? Are there conditions that might cause the code to behave differently between executions? What is the endgame of this malware?

Make no mistake: to find the answers to our final questions, we will once again blend static and dynamic methodologies. We will dig into the lowest levels of our systems to determine exactly what is happening under the hood of this PDF malware.

Remaining Goals:

Disassemble the second stage shellcode (This Post's Goal)
Analyze the disassembly to determine its full capabilities
Track down and determine the ultimate goal of the malware

Gentlemen, we can rebuild him. We have the technology…

To begin, we can load the same 'hexfile.bin' that we exported from Malzilla back in Part 2 into our favorite disassembler. I will be using Hex-Rays IDA in the following demonstrations. Hex-Rays makes a feature-restricted demo of the current version available, as well as an older but fully functional freeware version.

The good news is that the shellcode isn't very big: it takes up only 354 bytes (offsets 0x000 through 0x161), counting the four NOP instructions padding the beginning:

Fig 1. – 010 Editor Hex-View of Shellcode
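If you want to confirm the size and padding yourself before opening a disassembler, a few lines of Python will do it. This is a minimal sketch that assumes 'hexfile.bin' is the raw shellcode exported from Malzilla; the helper names are hypothetical, and it only recognizes the single-byte 0x90 NOP.

```python
NOP = 0x90  # x86 one-byte NOP opcode


def count_leading_nops(data: bytes) -> int:
    """Count the consecutive 0x90 bytes at the start of the buffer."""
    count = 0
    for byte in data:
        if byte != NOP:
            break
        count += 1
    return count


def describe(data: bytes) -> str:
    """Summarize the size (decimal and hex) and NOP padding of a raw shellcode blob."""
    if not data:
        return "0 bytes"
    return (f"{len(data)} bytes (0x{len(data):x}), offsets 0x0-0x{len(data) - 1:x}, "
            f"{count_leading_nops(data)} leading NOPs")
```

Running `describe(Path("hexfile.bin").read_bytes())` (after `from pathlib import Path`) should agree with the 354 bytes and four leading NOPs described above.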

Authors of malicious code have several reasons to keep their shellcode as compact as possible, not least the limited space an exploit often has to work with. For once, this actually works to our advantage. But don't be fooled: just because there isn't a ton of code doesn't mean there won't be tricks employed to trip up our analysis!

Since this is simply a raw binary file and not a normal Portable Executable (PE) file, there is no header telling IDA where the entry point is, so it doesn't know where to start auto-analyzing the code when we first open 'hexfile.bin'.

Fig 2. – IDA asking for an entry point into the code

Our initial IDA view after loading the code is just a single long data segment: a dead listing of one-byte hex values.

Fig 3. – Initial IDA View

If we select the first byte in the file (0x90 - NOP) and press 'C' (for code) as prompted by IDA, we kick off the auto-analysis from the very beginning. IDA successfully identifies a single function, which shows up in the function name panel. By double-clicking the function name and pressing the space bar (for graph view), we will see that the function is now represented as three nodes connected by arrows (edges). I have named the nodes and added comments to each line of assembly to describe its functionality.

Fig 4. – IDA Graph View of the "xor_loader" subroutine

This should look familiar, especially the middle node. It's the same code we highlighted toward the end of Part 2 when we discovered the "XOR loader loop"; we're just looking at it in a different tool now. To refresh, the loop shown in Fig 4 reads a byte of its own instructions, applies an XOR (eXclusive OR) bitwise operation with the key, writes the result back to the same location, and then moves on to the next byte. This is what's known as a staged XOR loader. The first stage of the shellcode contains just enough code to run this loop, leaving the remaining shellcode obfuscated. The second stage is deobfuscated in place and then executed once the code has modified itself.

This is a simple but effective trick against anyone statically analyzing the code, as we are, since it prevents us from observing the actual instructions that will be executed in the second stage. It also helps to hide the malicious code from signature-based malware detection such as IDS/IPS and antivirus engines. Ironically, antivirus engines sometimes use a similar XOR technique when quarantining detected malware, rendering it inert but recoverable.
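To make the loader's behavior concrete, here is a Python emulation of the same read, XOR, write back, advance loop operating on a buffer. It is a sketch, not a translation of the actual instructions in Fig 4: the function name and parameters are hypothetical, and the real loop keeps its position and count in registers rather than Python variables. The key 0x19 in the usage example comes from Part 2.

```python
def xor_loader(buf: bytearray, start: int, length: int, key: int) -> None:
    """Decode buf[start:start + length] in place, one byte at a time."""
    if not 0 <= key <= 0xFF:
        raise ValueError("expected a single-byte key")
    if start < 0 or length < 0 or start + length > len(buf):
        raise ValueError("region runs past the end of the buffer")
    pos = start
    for _ in range(length):
        buf[pos] ^= key  # read the byte, XOR it with the key, write it back
        pos += 1         # move on to the next byte
```

For example, `xor_loader(buf, 4, len(buf) - 4, 0x19)` would decode everything after a four-byte prefix. Because XOR with the same key is its own inverse, running the emulation a second time re-obfuscates the region, which is why one routine serves for both encoding and decoding.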

There are a couple of ways we can take our next step. One way would be to compile the shellcode into a normal PE that does nothing more than execute the code, and then open that within a debugger to step through it. Another way would be to apply the XOR operation ourselves to mimic what the loader does dynamically. I will leave the first tactic as a challenge to the reader, and we will circle back to it later; there are helpful resources available for wrapping shellcode in an executable, both as a web service and, if your code is potentially sensitive, as a stand-alone script. For now, let's investigate a few ways to decode the sample statically.

First we'll use the Windows GUI tool Malzilla, since this is the same tool we used to perform other parts of our analysis. If we load 'hexfile.bin' into the 'Hex view' tab we can take advantage of the handy "Find XOR Key" functionality. If you provide a list of strings to find (case sensitive) and a max key size to check, this tool will brute force the hex key space looking for the first occurrence of a provided string and return the corresponding key. We actually already know what our key is (0x19), but I wanted to highlight this nice feature anyway.

Fig 5. – Malzilla finding the XOR key through brute force
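The brute-force idea behind "Find XOR Key" is easy to sketch: decode the data with every candidate single-byte key and stop at the first key under which one of the known strings appears. This is an assumption about the general approach, not Malzilla's actual implementation, and it only handles single-byte keys, which is all this sample needs. The function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def find_xor_key(data: bytes, needles: Iterable[bytes],
                 max_key: int = 0xFF) -> Optional[Tuple[int, bytes, int]]:
    """Try keys 1..max_key in order; return (key, needle, offset) for the
    first key under which any needle appears in the decoded data."""
    needles = list(needles)
    for key in range(1, max_key + 1):
        decoded = bytes(b ^ key for b in data)
        for needle in needles:
            offset = decoded.find(needle)  # case-sensitive, like Malzilla's search
            if offset != -1:
                return key, needle, offset
    return None  # no single-byte key exposes any of the strings
```

Calling `find_xor_key(Path("hexfile.bin").read_bytes(), [b"http"])` should land on 0x19, unless a smaller key happens to produce a coincidental match in the plaintext loader bytes.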

We can then click the "Apply XOR" button to apply the XOR key against the whole file. That's pretty handy, but it XORs the WHOLE file, which also garbles the plaintext loader code at the beginning. Not ideal. Still, if you look closely at the boxed ASCII in Fig 6, you can see a URL string at the end of the file, which confirms that the second stage decoded successfully.

Fig 6. – Malzilla after applying the XOR key to the shellcode
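Spotting that URL by eye works for a file this small, but the check is easy to automate by listing runs of printable ASCII in the decoded bytes, much as the Unix `strings` utility does. This is a minimal sketch with a hypothetical function name; the six-character minimum is an arbitrary assumption you may want to tune.

```python
import re
from typing import List, Tuple


def printable_runs(data: bytes, min_len: int = 6) -> List[Tuple[int, str]]:
    """Return (offset, text) for every run of printable ASCII (0x20-0x7E)
    that is at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
```

Run against the XOR-decoded bytes, a URL near the end of the output is a good sign the key was right; run against the encoded original, you would expect little or nothing readable.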

If you already know your XOR key, a quick way to do the same thing at the command line is to install Ruby BlackBag (rbkb) and run its 'xor' command. Beyond the simple XOR utility, this collection of command line tools and Ruby libraries written by Eric Monti (@ericmonti) is ridiculously useful in many situations.

Fig 7. – RBKB command line xor utility

But this solution still XORs the entire file. If we want a clean listing for our static analysis, we need to go back to IDA. IDA extends its own functionality through IDC scripts and through plugins that support other common scripting languages such as Ruby (IDARub) and Python (IDAPython). Of particular use to us right now is the IDC script written by Eloy Paris, aptly named "XOR Script". If we first highlight the bytes from seg000:00000021 through the end of the file at seg000:00000161 and then execute "XOR Script", it will prompt for the XOR key. Note that it expects a decimal number instead of hex, so don't forget to do the conversion (0x19 = 25).

Fig 8. – XOR Script decoding a selection of bytes within IDA.
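If you would rather not depend on IDA for this step, the same selective decode can be done outside it by writing a partially decoded copy of the file that any disassembler can load. This sketch assumes, as with the selection above, that the encoded region runs from offset 0x21 to the end of the file and that the key is 0x19; the function names and the output file name are hypothetical. Python accepts the hex literal directly, so no decimal conversion is needed.

```python
from pathlib import Path


def decode_tail(data: bytes, start: int, key: int) -> bytes:
    """XOR every byte from `start` to the end; leave the loader stub before it intact."""
    if not 0 <= start <= len(data):
        raise ValueError("start offset outside the data")
    return data[:start] + bytes(b ^ key for b in data[start:])


def write_decoded_copy(src: Path, dst: Path, start: int = 0x21, key: int = 0x19) -> None:
    """Write a partially decoded copy of src to dst (defaults follow this post)."""
    dst.write_bytes(decode_tail(src.read_bytes(), start, key))
```

For example, `write_decoded_copy(Path("hexfile.bin"), Path("hexfile_decoded.bin"))` leaves the first-stage loader readable while exposing the second stage, which is the same end state the IDA selection produces.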

Now that we've applied the key, we need to have IDA reanalyze the database. To do that, right-click the lower left corner of the main window and select "Reanalyze Program". Our Functions window should now update and show us that we have four functions instead of one.

Fig 9. – IDA Functions window showing newly identified functions

Phew, break time…

This is a "good stopping point" for now. We have a nicely reconstructed listing of our shellcode as if it had just finished decoding the second stage. This will allow us to use our interactive disassembler to statically analyze the newly decoded instructions when we continue with the next post in the Analyzing PDF Malware series. Until that time, feel free to work ahead from where we left off and see what you can discover on your own!

--@Rnast

Tools Used:

Malzilla - Malware hunting tool
IDA - Multi-processor disassembler and debugger
rbkb - Ruby BlackBag, a miscellaneous collection of command-line tools and Ruby library helpers related to pen-testing and reversing.
010 Editor - Professional text and hex editing with Binary Templates technology.
XOR Script - XOR script written in IDC.

Resources:

"xord.idb" – Commented IDA database file.
