I'd like to think that security awareness has gotten to the point where the average end user thinks twice before opening an 'exe' file sent to them as an email attachment. I like to think that. I really do. But when it comes to opening PDF documents, whether it be an email attachment or their latest online utility bill, I can't even begin to convince myself that there is ever a moment of hesitation. And am I the only one who finds it ironic that security publications covering recent PDF attacks can often be downloaded in PDF form? How long would it take to live down being compromised by a document that is warning users about itself? Maybe it's just my affinity for self-referential humor.
A point to note is that the Portable Document Format is already a huge winner for presenting content for a number of reasons: the proliferation of easily accessible cross-platform readers and relatively small file sizes are two obvious ones. Since many attackers tend to be opportunistic, PDF's popularity among end users and its ability to perform dynamic actions make it a natural choice as an attack vector. Attackers go where the victims are, so to speak. Oftentimes it comes down to a simple numbers game: more users on a particular platform means more potential victims for the attacker.
If you pay attention to the news you don't have to think back too far to remember various incidents involving malformed PDF documents. From hackers who leverage malicious PDF documents to gain a foothold on the internal network of a major corporation, to reversers taking advantage of a weakness in a rendering engine to jailbreak their smartphones, PDFs are being used to bypass established security protections. How do we defend ourselves against maliciously crafted PDFs? There are a variety of methods that can be employed, but I think the best first move for those with the technical inclination is to understand the problem at hand by looking at a sample.
In this first part of a multi-part writeup we will analyze a sample PDF aptly named sample1.pdf, and attempt to determine if the file is malicious or not. We will analyze it using a blend of both static and dynamic methodologies. If we determine that the file is malicious (spoiler alert: it is) we will dissect the attacks that were employed. We will trace the code of the document through various rounds of obfuscation, root out common techniques employed by the attackers, and identify the vulnerabilities that were targeted.
First things first. It may seem like it goes without saying, but in your zeal to dig into the tech, you may forget to check if someone has already encountered your file and done the heavy lifting for you. Before you even try opening the file, run a quick MD5 sum and do an online search to see if you get any hits. If you know that there aren't any confidentiality issues regarding your file, you may also want to submit it to any of the myriad online services. A fairly comprehensive list of online services, from anti-virus scanners to automated sandboxes, can be found over at cleanbytes.net. You may find that your answers are already well documented or easily detected through automated analysis. If that is the case, "Bob's your uncle" as they say. However, if you are not able to get a clear answer one way or the other through your searching, or if your particular file has the potential to contain sensitive information, you may need to take the next step in analysis.
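If you don't have an md5sum utility within reach, the hash is easy to compute yourself. The following is a minimal Python sketch; the function name `file_md5` and the chunk size are my own choices for illustration, not part of any tool mentioned here. It reads the file as raw bytes and never parses or renders it, so it is safe to run against a suspect document.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in fixed-size chunks so that large samples
    do not have to fit in memory all at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_md5("sample1.pdf")` returns the digest as a string that you can paste straight into a search engine.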
What would you say "ya do" here?
Since you are investigating the nature of your file, you will want to use a few tools to peek inside the file without dynamically executing the contents. There are a growing number of tools to choose from when analyzing PDFs. I will demonstrate a sampling of them throughout the post. To begin with, a simple strings dump will give us any printable characters in the file:
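For anyone working on a machine without the strings utility, here is a rough Python sketch of the same idea. It is an approximation under my own assumptions (printable ASCII plus tab, minimum run length of four), not a reimplementation of the real tool, and the function name `printable_strings` is made up for this example.

```python
import re


def printable_strings(data, min_length=4):
    """Yield runs of printable ASCII (plus tab) at least `min_length` long.

    `data` is the raw bytes of the file; nothing is parsed or executed.
    """
    # 0x20-0x7e covers space through tilde; the count is filled in below.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")
```

Feeding it the bytes of the sample, for example `printable_strings(open("sample1.pdf", "rb").read())`, lists candidate object names and script fragments without ever opening the document in a reader.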
Running a second tool, PDFiD from Didier Stevens confirms what we are seeing in the previous strings output by displaying the structure of the objects and actions:
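To give a feel for this kind of triage, here is a small Python sketch that counts a handful of PDF names often associated with automatic actions and scripting. To be clear, this is not PDFiD's code: the name list is a subset I picked for illustration, `count_names` and `decode_name_escapes` are hypothetical helpers, and decoding `#xx` escapes across the whole file is a simplifying assumption that can miscount inside binary streams.

```python
import re

# A subset of names worth a second look; chosen for this sketch only.
SUSPICIOUS_NAMES = [
    b"/JS", b"/JavaScript", b"/OpenAction",
    b"/AA", b"/Launch", b"/EmbeddedFile",
]


def decode_name_escapes(data):
    """Replace #xx hex escapes (e.g. /J#61vaScript) with the byte they encode."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


def count_names(data):
    """Return a dict mapping each suspicious name to its occurrence count."""
    normalised = decode_name_escapes(data)
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops /AA from also matching a longer name like /AAx.
        pattern = re.compile(re.escape(name) + rb"(?![A-Za-z0-9])")
        counts[name.decode("ascii")] = len(pattern.findall(normalised))
    return counts
```

A non-zero count for `/OpenAction` or `/AA` together with `/JS` or `/JavaScript` suggests script that runs without user interaction, which is a good reason to keep digging.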
A third and final tool we can use to take a static look at our file is pdfscan.rb from Origami, a Ruby framework used to analyze PDF documents:
Get into the light where you belong.
We see from the output that Object 3 contains three additional tags:
1) Producer = substr
2) Subject = spli
3) Title = [data 45194 bytes]
Untangling this code isn't a required step, but it gives you a more complete view into what is going on under the hood, and can help prevent missing a branch of conditional code that might be hiding some unknown functionality.
We can see that the large amount of data stored in the title variable is being decoded and evaluated. The next step is to execute our code in a controlled manner to see what the code is doing with that data. There are a couple of ways to accomplish this. If your preference is for command line tools, SpiderMonkey is the way to go. Like many of these great analysis tools, it comes pre-compiled on Lenny Zeltser's REMnux 2 Linux distro. If you prefer a GUI for this stage, Malzilla or PDFStreamDumper are both nice visual solutions. We are going to mix it up a bit and check out one of the GUIs.
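The general principle behind all of these tools is the same: let the decoding step run, but capture the result as text instead of handing it to eval. Here is a toy Python sketch of that idea. The encoding it handles (hex character codes separated by a delimiter) is invented purely for illustration and is not the scheme used in sample1.pdf; `decode_payload` is a hypothetical helper.

```python
def decode_payload(encoded, delimiter="z"):
    """Decode delimiter-separated hex character codes into text.

    The encoding is a made-up stand-in for whatever the real
    script does; the point is that the result is returned as a
    plain string for inspection and is never executed.
    """
    tokens = [tok for tok in encoded.split(delimiter) if tok]
    return "".join(chr(int(tok, 16)) for tok in tokens)
```

Printing the return value gives you the next layer of script to read, and you can repeat the process if that layer turns out to be obfuscated as well.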
1) function build_nop()
2) function collabExploit()
3) function printf()
4) function geticon()
5) function a()
The first function creates a NOP sled. The remaining functions exploit known vulnerabilities in PDF viewing software:
1) NOP sled