SpiderLabs Blog

Defeating Flame String Obfuscation with IDAPython

Written by Josh Grunzweig | Jun 1, 2012 11:41:00 AM

Like many other security research firms, SpiderLabs Research has been actively investigating the Flame (a.k.a. sKyWIper) malware that was revealed earlier this week. For those unaware of what Flame is, I'll provide a very brief summary. Essentially, Flame is a modular, extremely large piece of malware that was discovered in Iran. The malware was found to be quite complex, providing the attackers with a wealth of information about, and control over, the infected system(s). Many of Flame's components employ some form of encryption and/or obfuscation, which is what I will be discussing today. Specifically, I'm going to talk about the string obfuscation found in the advnetcfg.ocx component and how I defeated it using IDAPython.

Before I jump straight into IDAPython, however, I'd like to take a step back and describe how I identified the string obfuscation and ultimately recovered the plaintext of the strings in the sample.

While analyzing advnetcfg.ocx, I noticed a number of data blocks being passed as arguments to the same function.

In the image below, we can see how these blocks of data are being supplied to the function:

Further review revealed 179 references to this function in total. That large number of references, combined with the kind of data being passed in, strongly suggested that some form of de-obfuscation or decryption routine was at work. If we jump into sub_1000BE16, we start to get a better understanding of what is happening.

OK, so maybe the screenshot above doesn't provide the best representation of what is going on. Below, I've converted the above function to Ruby.

The function above takes the obfuscated string as a parameter and checks the byte at offset 16 to determine whether it is null. This byte acts as a Boolean flag that tells the function whether the string has already been decoded (0x00 means it has). If the byte is not null (0x00), another function is called, and the byte at offset 16 is then set to 0x00. Finally, the function returns the supplied string at an offset of +20. If I were a betting man, I'd suspect that the second function (named 'deobfuscate()' in the above Ruby code) is manipulating the data somehow. To find out, let's investigate what is going on.
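To make this control flow concrete, here is a minimal Python sketch of the wrapper as described above. It is not the original Ruby, which isn't reproduced here. The offsets 16, 18 and 20 come from the text, and the length is read as a single byte, per the description. `deobfuscate` is a hypothetical caller-supplied stand-in for the second function, whose details follow.

```python
from typing import Callable

FLAG_OFFSET = 16    # non-zero while the string is still obfuscated
LENGTH_OFFSET = 18  # length of the obfuscated data (read as one byte here)
DATA_OFFSET = 20    # start of the obfuscated data itself


def get_string(buf: bytearray,
               deobfuscate: Callable[[bytearray, int, int], None]) -> bytes:
    """Sketch of the wrapper: decode in place once, then return the data."""
    if buf[FLAG_OFFSET] != 0x00:
        # Still obfuscated: decode the data region in place...
        deobfuscate(buf, DATA_OFFSET, buf[LENGTH_OFFSET])
        # ...and clear the flag so later calls skip the work.
        buf[FLAG_OFFSET] = 0x00
    # The binary returns a pointer to buf+20; this sketch instead returns
    # a copy of the data, sliced using the stored length.
    return bytes(buf[DATA_OFFSET:DATA_OFFSET + buf[LENGTH_OFFSET]])
```

Because the flag is cleared after the first call, repeated calls for the same string only pay the decoding cost once.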

Looking at the code above, we can see that this new function is supplied two arguments: 'obfuscated_string' at an offset of +20, and the byte at offset 18 of 'obfuscated_string'. We can see how these arguments are used below:

Don't worry about reading the assembly above. I've converted it to Ruby below and include the assembly only as a reference for those who are interested.

OK, so let's see what is happening. As we can see above, the first argument (obfuscated_string+20) is the actual obfuscated data. The second argument, obfuscated_string[18], is the length of that data, which tells the function when to stop. I've tried to illustrate this below, in case anyone is having trouble following:

For each position in the string, this function appears to call a third function (the last one, I promise), subtract the returned value from the byte at that position, and write the result back in place. For example, if the first byte were 0xA7 and the third function returned 0x82, we would get the following:

0xA7 – 0x82 = 0x25 ("%")
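The per-byte loop can be sketched in Python as follows. This is a sketch under assumptions, not the post's Ruby. The third function's internals aren't reproduced in this text, so it is passed in as a hypothetical `key_fn(index)`. The usage lines plug in a stand-in key that always returns 0x82 to reproduce the arithmetic above. That stand-in is not Flame's real key function.

```python
from typing import Callable


def deobfuscate(buf: bytearray, start: int, length: int,
                key_fn: Callable[[int], int]) -> None:
    """Decode buf[start:start + length] in place, one byte per position."""
    for index in range(length):
        pos = start + index
        # Subtract the position-dependent key; masking to 8 bits mimics
        # storing the result back into a single byte.
        buf[pos] = (buf[pos] - key_fn(index)) & 0xFF


# Worked example from the text, with a hypothetical constant key of 0x82.
data = bytearray([0xA7])
deobfuscate(data, 0, 1, lambda index: 0x82)
print(hex(data[0]), chr(data[0]))  # 0x25 %
```

The `& 0xFF` matters whenever the key is larger than the encoded byte. Without it, Python would produce a negative integer instead of wrapping around as 8-bit arithmetic does.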

So that brings us to the third, and last, function in this de-obfuscation routine. This function, unlike the previous two, is quite small:


The above function translates to the following in Ruby:

The function's only argument is the current position in the obfuscated string. The first time it is called, 'index' is 0, then 1, 2, 3, and so on until the end of the obfuscated string is reached.

At this point we've finished recreating the de-obfuscation routine used in advnetcfg.ocx. So you're probably sitting in your comfy La-Z-Boy asking yourself, "Wait a minute, I thought you said you were going to defeat it with IDAPython!? What's all this Ruby nonsense?" Well, it's true; I made a slight blunder when I first dove into this code.

I originally wrote everything in Ruby, only to realize that Python might have been a better choice (you'll see why in a minute). No, it's not because Python is a superior scripting language. It's because IDA Pro (http://www.hex-rays.com/) has graciously provided a plugin that integrates the Python language, allowing us to run scripts inside of IDA Pro. Remember how I said earlier that the de-obfuscation routine was referenced 179 times? I don't know about you, but I sure don't want to copy each obfuscated string into a script by hand and then manually comment the resulting value into my IDB file. So I looked into using IDAPython.

Now, I'll be the first to admit that I'm probably as far as you can get from being a "Python expert," but I did manage to get through it without too many bumps in the road. The first step was to convert the work I'd done earlier into Python. Overall it didn't prove terribly difficult, as Ruby and Python have a lot in common.

The second step was to find all of the obfuscated strings that get supplied to the de-obfuscation routine. Looking back, we recall that each call to the de-obfuscation function was preceded by a 'push offset' instruction, like so:

I've included the hex representation of each assembly instruction, as this information will be important later on. If we look closely at the hex of 'push offset unk_1008FBB8', we can see that the 'push offset' portion is represented by 0x68, the x86 opcode for pushing a 32-bit immediate value. The remaining 4 bytes hold the address of the obfuscated string, in reverse order because x86 stores values in little-endian byte order.
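As a quick illustration of that byte layout, this small Python sketch pulls the address back out of a 5-byte `push imm32` instruction. The bytes `68 B8 FB 08 10` are derived from the address `unk_1008FBB8` named above, not copied from the screenshot. The function name `push_operand` is hypothetical.

```python
import struct

PUSH_IMM32 = 0x68  # x86 opcode for "push imm32" in 32-bit code


def push_operand(insn: bytes) -> int:
    """Return the 32-bit immediate of a 5-byte 'push imm32' instruction."""
    if len(insn) != 5 or insn[0] != PUSH_IMM32:
        raise ValueError("not a 5-byte push imm32 instruction")
    # "<I" = little-endian unsigned 32-bit, which undoes the byte reversal.
    (address,) = struct.unpack("<I", insn[1:5])
    return address


# Encoding of 'push offset unk_1008FBB8':
print(hex(push_operand(bytes.fromhex("68B8FB0810"))))  # 0x1008fbb8
```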

Knowing this, we can use the CodeRefsTo() function to find all of the XREFs to the de-obfuscation function. Each XREF is the address of a call instruction. Because the 5-byte 'push offset' instruction sits directly before it, subtracting 4 from that address lands on the push's 4-byte operand, which holds the location of the obfuscated string. We already know that the size of the obfuscated string is stored at offset 18 and that the obfuscated data itself starts at offset 20. With that, we can obtain the information needed, like so:
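A minimal sketch of this lookup follows, assuming IDA 7.x or later, where the older `Dword()`, `Byte()` and `GetManyBytes()` are spelled `idc.get_wide_dword`, `idc.get_wide_byte` and `idc.get_bytes`. The readers are passed in as parameters so the address arithmetic can also be exercised outside IDA. The sketch also assumes that the push sits immediately before every call and that the length is a single byte, as described above. `collect_strings` is a hypothetical helper name.

```python
from typing import Callable, Iterable, List, Tuple

LENGTH_OFFSET = 18  # length byte inside the obfuscated-string structure
DATA_OFFSET = 20    # start of the obfuscated data


def collect_strings(call_sites: Iterable[int],
                    read_dword: Callable[[int], int],
                    read_byte: Callable[[int], int],
                    read_bytes: Callable[[int, int], bytes]
                    ) -> List[Tuple[int, int, bytes]]:
    """For each call site, return (struct_ea, length, raw obfuscated bytes)."""
    results = []
    for call_ea in call_sites:
        # Assumes 'push offset X' immediately precedes the call, so X is
        # stored in the 4 bytes just before call_ea.
        struct_ea = read_dword(call_ea - 4)
        length = read_byte(struct_ea + LENGTH_OFFSET)
        raw = read_bytes(struct_ea + DATA_OFFSET, length)
        results.append((struct_ea, length, raw))
    return results


try:
    import idautils
    import idc
except ImportError:  # not running inside IDA
    idautils = idc = None

if idc is not None:
    DEOBF_FUNC = 0x1000BE16  # sub_1000BE16 from the analysis above
    refs = idautils.CodeRefsTo(DEOBF_FUNC, 0)  # 0 = explicit xrefs only
    for ea, n, raw in collect_strings(refs, idc.get_wide_dword,
                                      idc.get_wide_byte, idc.get_bytes):
        print("0x%X: %d obfuscated bytes" % (ea, n))
```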

Next, we supply this data to the decrypt function we created earlier. Finally, we store the resulting de-obfuscated string as a comment in the IDB file. For this, we can use the MakeRptCmt() function, which creates a repeatable comment. Also, just to keep things clean, let's print a debug message telling the user which de-obfuscated strings were discovered and where they can be found.
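Here is a sketch of the annotation step under a few assumptions. The decoder and the comment writer are passed in, so `entries` and `decode` stand for whatever the earlier steps produced and are not defined here. Decoded bytes are treated as single-byte text with trailing NULs stripped; wide strings would need different handling. In IDA 7.x and later, the post's `MakeRptCmt(ea, cmt)` corresponds to `idc.set_cmt(ea, cmt, 1)`, where the final `1` marks the comment as repeatable. IDA also displays a repeatable comment at the instructions that reference the item, which is why it suits this use.

```python
from typing import Callable, Iterable, Tuple

try:
    import idc
except ImportError:  # not running inside IDA
    idc = None


def annotate(entries: Iterable[Tuple[int, bytes]],
             decode: Callable[[bytes], bytes],
             set_comment: Callable[[int, str], object],
             log: Callable[[str], None] = print) -> int:
    """Decode each (ea, raw) pair, attach it as a comment at ea, and log it."""
    count = 0
    for ea, raw in entries:
        # Assumption: single-byte text; drop trailing NULs for readability.
        text = decode(raw).rstrip(b"\x00").decode("latin-1")
        set_comment(ea, text)
        log("0x%08X: %s" % (ea, text))
        count += 1
    return count


def ida_repeatable_comment(ea: int, text: str) -> object:
    """Modern spelling of MakeRptCmt(ea, text); only usable inside IDA."""
    return idc.set_cmt(ea, text, 1)  # 1 = repeatable comment
```

Inside IDA, a call such as `annotate(entries, decode, ida_repeatable_comment)` would write the comments and print one line per string to the output window.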

The complete code can be found here.

An example of what it looks like when it is run can be seen below:

In addition to the above functionality, John Miller (@ethackal) provided this wonderful one-liner, which also renames each data block after its de-obfuscated string. The following code accomplishes this:
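Miller's one-liner isn't reproduced in this text. The following is a separate, hedged sketch of the same idea, assuming IDA 7.x or later (older versions used `MakeName()`). The sanitizing rules, the `str_` prefix and the 64-character cap are arbitrary choices made for this illustration. `name_from_string` and `rename_data` are hypothetical helper names.

```python
import re

try:
    import idc
except ImportError:  # not running inside IDA
    idc = None


def name_from_string(text: str, prefix: str = "str_", max_len: int = 64) -> str:
    """Turn a decoded string into a conservative IDA-style identifier."""
    # Keep only [A-Za-z0-9_]; collapse every other run into one underscore.
    body = re.sub(r"[^A-Za-z0-9_]+", "_", text).strip("_")
    return (prefix + body)[:max_len] if body else prefix + "empty"


def rename_data(ea: int, text: str) -> bool:
    """Rename the data item at ea after its decoded string (IDA only)."""
    # SN_NOWARN: fail quietly (e.g. on a name clash) instead of showing a dialog.
    return bool(idc.set_name(ea, name_from_string(text), idc.SN_NOWARN))
```

The prefix keeps names from starting with a digit. Duplicate strings simply fail to rename rather than overwriting an existing name.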

At this point we can continue our analysis of advnetcfg.ocx without worrying about those pesky obfuscated strings.