To Obfuscate, or not to Obfuscate

Introduction

Malware's goal is to bypass computer defenses, infect a target, and often remain on the system as long as possible. A variety of techniques are used to accomplish these goals. Deciding which of these techniques to use depends on a mix of the skill of the author and the defenses of the intended victim. One of the most widely used tactics in malware is obfuscation. Obfuscation comes in the form of packers, crypters, and string manipulation, and when used effectively, it can greatly increase the time and resources needed to analyze a sample.

Trustwave encounters a number of samples that vary in complexity. Typically, the easiest of these are written in scripting languages and compiled with the runtime into an executable. Many programs and scripts can retrieve the original source code from such a binary, which often leaves only reading the source to discover all the juicy bits. Recently, the SpiderLabs Malware Research team encountered a suspicious binary during an investigation that challenged the assumption that all compiled scripts are easy to analyze. The sample was an AutoIT script compiled into a binary that employs string obfuscation, anti-analysis, and anti-security techniques to decode shellcode and a binary obfuscated with ConfuserEx. The shellcode injects the final payload, which is a "Remote Administration Tool" called LuminosityLink. What is interesting about this case is the combination of several fairly simple techniques in a scripting language that greatly increases the difficulty of retrieving the final payload.

The "Quick" Win

It is fairly obvious from the first look at this sample that it is an AutoIT script compiled into a binary. Both the icon and the AutoIT version information identify the sample as such. After extracting the original script from the binary, it was clear that this script was much more sophisticated than what we typically encounter in scripted malware. The script turned out to be 1400+ lines of heavily obfuscated code and used three different obfuscation techniques. Here are the techniques that we will highlight in this blog:

  1. String obfuscation
  2. Flow of execution obfuscation
  3. Shellcode/Binary obfuscation

The final payload was also obfuscated with a commercial-grade product called ConfuserEx, which is a topic for another blog. We will take a look at the three techniques individually, but it is important to note that these techniques are used in tandem, each making the others more effective.

String Theory

This script uses a very simple method to obfuscate strings. Really, the obfuscation occurs when the script is written. A function is used to return the strings to their intended content during execution. The decoding function accepts an integer as an argument, subtracts 2 from it, and returns the value. The return value is passed to the Chr() function, which returns the character with that ASCII code. That's it! Here is what the decoding function looks like:

Func _2pdqycz2($string)
Return $string - 2
EndFunc

and here is a call to the function:

Chr(_2pdqycz2(67))

The 67 becomes a 65 after the decoding function, which the Chr function tells us is the ASCII character 'A'. Simple and easy… except there are over 6,000 calls to this function and many of the strings are built after looping through hundreds of lines of code. This won't do. What better way to circumvent their attempts to make analysis difficult than by writing our own script to deobfuscate the code into something more readable? The answer was this regex:

searchStr = r'Chr\(_2pdqycz2\(\d{1,3}\)\)[^\S\r\n]?\&*[^\S\r\n]?'

I can feel you judging me, but luckily this is a blog about getting answers and not writing solid regular expressions! Essentially, the Python script reads each line of the AutoIT script and looks for matches to the regex. The matches that are found are parsed, and the corresponding decoded character is written in place of the function calls. After running the decoding script, the result changes this code:

Global $_v32qyr038 = Chr(_2pdqycz2(104)) & Chr(_2pdqycz2(107)) & Chr(_2pdqycz2(110)) & Chr(_2pdqycz2(103)) & Chr(_2pdqycz2(112)) & Chr(_2pdqycz2(99)) & Chr(_2pdqycz2(111)) & Chr(_2pdqycz2(103)) & Chr(_2pdqycz2(48)) & Chr(_2pdqycz2(103)) & Chr(_2pdqycz2(122)) & Chr(_2pdqycz2(103))

Into this:

Global $_v32qyr038 = "filename.exe"
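For readers who want to try the idea, here is a minimal Python sketch of that replacement step. It is not the script we used during the investigation: the helper names (decode_run, deobfuscate_line) and the second pattern that matches a whole run of calls joined by '&' are assumptions made for illustration. Only the function name _2pdqycz2 and the subtract-2 rule come from the sample.

```python
import re

# One obfuscated character: Chr(_2pdqycz2(NNN)). The capture group grabs the
# encoded integer. The function name is taken from the sample.
CHR_CALL = re.compile(r'Chr\(_2pdqycz2\((\d{1,3})\)\)')

# A run of one or more such calls joined by AutoIT's '&' concatenation
# operator, allowing optional same-line whitespace around each '&'.
CHR_RUN = re.compile(
    r'Chr\(_2pdqycz2\(\d{1,3}\)\)'
    r'(?:[^\S\r\n]*&[^\S\r\n]*Chr\(_2pdqycz2\(\d{1,3}\)\))*'
)


def decode_run(match: re.Match) -> str:
    """Turn one run of Chr(...) calls into a quoted AutoIT string literal."""
    chars = [chr(int(n) - 2) for n in CHR_CALL.findall(match.group(0))]
    # AutoIT escapes a double quote inside a double-quoted string by doubling it.
    text = ''.join(chars).replace('"', '""')
    return f'"{text}"'


def deobfuscate_line(line: str) -> str:
    """Replace every run of obfuscated characters on a line with its literal."""
    return CHR_RUN.sub(decode_run, line)
```

Feeding each line of the extracted script through deobfuscate_line collapses straightforward concatenations like the one above. It does nothing for strings that are assembled piece by piece across loops, so those still need manual attention.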

Justin Case

The easiest thing about analyzing source code is you get to see exactly what is going on. To those who program, it is like reading a book. Choose your own adventures, anyone? The author of this script decided to use case statements wrapped in while loops to obfuscate the order of execution. A global variable is set with a large number and the case statement determines what code is executed before resetting the global variable to another large number. Inside some of these loops lives code that checks for a sandbox environment and security tools, and that sleeps for long periods of time. Some of the case statements don't contain code at all and instead simply contain the next jump in the case statement. The case statements contained several hundred lines of code, making analysis shortcuts such as adding print statements to the script or deleting code a dicey prospect at best.
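To make the pattern concrete, here is a toy Python rendering of that structure. It is only an illustration of the idea, not code from the sample: the state values, the step names, and the optional trace list are all made up for this example.

```python
def flattened(trace=None):
    """Toy flattened routine: the state variable, not source order, picks what runs."""
    state = 918273  # arbitrary large starting value (hypothetical)
    result = []
    while True:
        if trace is not None:
            trace.append(state)  # record each state as it is visited
        if state == 448812:
            result.append("step 2")
            state = 775001
        elif state == 918273:
            result.append("step 1")
            state = 302219
        elif state == 302219:
            # Empty case: holds nothing but the next jump
            state = 448812
        elif state == 775001:
            result.append("step 3")
            break
    return result
```

The branches appear in the source in a different order than they execute, which is the whole point. Recording the states as they are visited (the trace list here) recovers the real order of execution, and that is essentially what stepping through with a debugger provides.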

Luckily, AutoIT provides a debugger for their language.

[Image: AutoIT debugger]

The decoding script left the output with a lot of syntax errors that would have required too much manual fixing to be worthwhile. However, it did make it obvious where the anti-security and anti-analysis checks were. A couple of code deletions and a breakpoint later and we made it to the payload stage of the script.

Bit by Bit

The final stage of this script is really a combination of the first two obfuscation stages, but there is a reason it is listed as the third stage. If anything fails, or the malware detects that it is running in an analyst's environment, the purpose of the malware remains unknown. Even running the string decoding script doesn't give much more than a hint that some type of injection is occurring.

The script is now ready to decode the shellcode that will perform the final step. The same string decoding function is used here, except the characters that are deobfuscated are only hexadecimal digits. Case statements wrapped in while loops continue to obfuscate the control flow while kernel DLLs are loaded and system APIs are called. Like many other families of malware, the shellcode allocates memory and maps the payload into the space. A target executable is launched in a suspended state, its original memory is unmapped, the new malicious code is inserted, and execution begins. This technique is known as Process Hollowing. Once the new code is executing, the script terminates.
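The first part of that stage, turning the decoded hexadecimal characters back into raw bytes, can be sketched in a few lines of Python. This is a rough illustration under assumptions: the function name decode_hex_blob is hypothetical, and it assumes the decoded characters form ordinary two-digit hex pairs, which is the usual convention but is not something shown in detail here.

```python
from typing import Iterable


def decode_hex_blob(encoded: Iterable[int], offset: int = 2) -> bytes:
    """Undo the subtract-2 encoding, then read the resulting text as hex pairs."""
    # Each integer decodes to a single hexadecimal digit character.
    hex_text = ''.join(chr(value - offset) for value in encoded)
    # bytes.fromhex raises ValueError if the text is not valid hex.
    return bytes.fromhex(hex_text)
```

For example, the integers 59, 50, 59, 50, 69, 53 decode to the text "9090C3", which becomes the three bytes 0x90 0x90 0xC3. Writing the recovered bytes to a file lets them be inspected in a disassembler without ever running the script.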

Conclusion

This malware did not perform any groundbreaking new techniques that will forever change the face of security as we know it. It was produced by combining several effective yet simple techniques in an easy-to-use scripting language and compiling the result into an easy-to-deliver executable. The time and effort required to get the final payload of the malware was significantly higher than what is typically required when access to the source code is available. Malicious scripts compiled into binaries are often on the low end of the sophistication spectrum, but they can still make effective use of known techniques to bypass defenses and perform the task they set out to do.
