Every Tool in the Tool Box

Introduction

When I teach people about reverse engineering, I often hear the following statement: "I got the right answer, but I cheated to get it". They are typically talking about using dynamic analysis to get an answer instead of statically analyzing the binary, but my response to this statement is always the same: there is no such thing as cheating in reverse engineering as long as you get the correct answer. Dynamic analysis, static analysis, tools, scripts, Google, and everything in between are all valid paths to analyzing a binary. When it comes down to it, yes, static analysis gives you the highest confidence that all code paths are fully understood. However, the path to the correct answer is not always the straight and narrow one. New reverse engineers tend to approach a sample as in the old adage: "When you only have a hammer, every problem looks like a nail". Knowing the tools you have available and using the correct one can often save you considerable time and effort.

Background

Trustwave has been documenting the Alina malware family and its versions since 2013. In 2014, we posted a blog about the Spark variant along with a script for decoding the traffic produced by this sample. Shortly after the blog was released, a colleague who had acquired Alina's source code, which is now publicly available on GitHub, contacted me. At that point we already had a good understanding of Spark, but I accepted the source code and saved it in case it was ever needed in the future. Earlier this year, Trustwave encountered the Joker v1.8 and Eagle v1.1 families, which are based on Alina's source code.

The Joker variant added a significant number of anti-analysis checks as well as a simplistic rootkit for hiding the file and process. Cisco did a write-up that describes these two features in detail. The Eagle variant, which added features for stealing VNC credentials, was also well detailed in a write-up by Nuix. However, the malware had not changed dramatically across these related families, and reversing the samples was a straightforward process.

The Right Tool

I performed my initial triage of the samples as usual to look for details that could lead to quick answers. I googled the hash, ran strings, and looked at the PE info for anything that might stand out as unique before dropping the binary into IDA. Because of my previous knowledge of the malware families, I recognized that this was an Alina variant. Using that knowledge, I began looking for additional information on what these particular variants were, which led me to the two blogs listed above. Armed with my previous experience, the new information, and the work I currently had for the samples, I was able to get the vast majority of the details required for the forensic team in a short period of time. In this line of work, this is considered a quick win. Next, I verified the details of my samples against the existing information to ensure their accuracy and was able to write a detailed analysis of a sample. My last step was to ensure that my decoding script worked properly with the traffic produced by the newer variants. When I ran the script against traffic generated by the Joker family, I received the following error:

 

[Figure: Spark script error]

The actual decoding worked, but parsing the resulting header information caused an issue because of some of the assumptions I had made about how the header was formatted. At this point I remembered that I had received the source code for a related version of Alina and that it could be used to write a more robust decoding script. Looking at the code, I found exactly what I was looking for in one of the PHP files that handled incoming traffic:

 unpack("Sv/a16sv/a8hwid/C2n/a8act/a32pcn/Lsize/C4s", $data);

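The PHP unpack function uses the format codes to parse the header information for the C&C communication: S is an unsigned 16-bit integer, a followed by a count is a NUL-padded string of that many bytes, C followed by a count is that many unsigned bytes, and L is an unsigned 32-bit integer. Together they describe a 76-byte header, which is unpacked into 8 header fields: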

  • v -> version number
  • sv -> version name
  • hwid -> bot ID
  • n -> null pad
  • act -> action
  • pcn -> pcname
  • size -> size of data
  • s -> signature
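The same layout can be expressed with Python's struct module. The following is a minimal sketch rather than the actual decoding script: the names HEADER, FIELDS, and parse_header are hypothetical, and it assumes little-endian byte order, since PHP's S and L codes use machine byte order and the server would typically run on x86. It also strips the trailing NUL padding from the string fields for readability.

```python
import struct

# Python equivalent of the PHP format "Sv/a16sv/a8hwid/C2n/a8act/a32pcn/Lsize/C4s".
# Assumption: little-endian byte order ("<"), with no alignment padding.
HEADER = struct.Struct("<H16s8s2s8s32sI4s")
FIELDS = ("v", "sv", "hwid", "n", "act", "pcn", "size", "s")
TEXT_FIELDS = ("sv", "hwid", "act", "pcn")


def parse_header(data: bytes) -> dict:
    """Split the fixed-size header off the front of already-decoded traffic."""
    if len(data) < HEADER.size:
        raise ValueError(f"need {HEADER.size} header bytes, got {len(data)}")
    header = dict(zip(FIELDS, HEADER.unpack_from(data)))
    # Treat the string fields as opaque NUL-padded text instead of
    # comparing them against hard-coded values.
    for name in TEXT_FIELDS:
        header[name] = header[name].rstrip(b"\x00").decode("ascii", errors="replace")
    return header
```

Whatever follows the first HEADER.size bytes is the payload, whose length should match the size field. Because no field is compared against a fixed string, the same parser applies to any variant that keeps this header structure.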

Without source code, or other variants to compare to, I had assumed some of the header fields were hard-coded strings. This caused an error when attempting to parse the traffic of the related variants that used the same traffic structure. I created a new Python script (sorry Ruby people, Python is just better :] ) to properly parse the header information and work with all of the related families and variants of Alina.

I was bolstered by this much-improved script, so I decided to spend some time attempting to answer a nagging question that couldn't be answered with just the malware. The various versions contain formatted numbers that are obviously used to send information back to the C&C server. Here is an example of some of those strings:

 

[Figure: Eagle obfuscated C&C strings]

The Joker family used a variation on the format, replacing {[! !]} with ~j~ ~k~, but maintained the numbers and their corresponding meanings. These numbers showed up in the traffic, and I wanted my script to be able to properly decode this information. Luckily, the source code showed the statements that these numbers represented. As a result, the decoding script now properly displays the traffic as it is seen on the C&C panel:
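The substitution step can be sketched as follows. This is an illustration under assumptions, not the published script: it assumes each code appears as a run of digits between the opening and closing delimiters, and the names TOKEN and expand_codes are hypothetical. The code-to-statement table is deliberately left to the caller, since the real statements come from the Alina source code and are not reproduced here.

```python
import re

# Delimiter pairs named in the text: {[! ... !]} (Eagle) and ~j~ ... ~k~ (Joker).
# Assumption: the numeric code is the only content between the delimiters.
TOKEN = re.compile(r"(?:\{\[!|~j~)\s*(\d+)\s*(?:!\]\}|~k~)")


def expand_codes(line: str, statements: dict) -> str:
    """Replace each delimited numeric code with its statement, if known."""
    def substitute(match):
        code = int(match.group(1))
        # Unknown codes are left untouched so that nothing is silently lost.
        return statements.get(code, match.group(0))

    return TOKEN.sub(substitute, line)


# Hypothetical table for illustration only; real entries come from the source code.
example_statements = {1: "<statement for code 1>"}
print(expand_codes("~j~1~k~ and {[!99!]}", example_statements))
```

Handling both delimiter styles in one pattern lets a single table serve both families, which matches the observation that the numbers kept their meaning between variants.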

[Figure: Joker de-obfuscated traffic]

The server-side PHP code parses the bracketed prefix (e.g., [:112 <12>]) at the beginning of each line to apply coloring and highlighting.

Conclusion

There are many paths to understanding the capabilities of a malware sample. Sometimes the family is known and the available information can aid in verifying the sample being analyzed, preventing duplication of work and often saving a lot of time. It is still crucial to confirm the details, but it is much easier to confirm existing details than to perform the original analysis. Other times, information is gained that is not possible to get out of the binary itself, such as determining what the bracketed codes actually meant in Alina. It is important not to get tunnel vision during analysis, and to always be open to finding new information and expanding your resources. This is why Trustwave continues to publish write-ups for unknown malware we encounter, along with resources to assist the security community in understanding and protecting against ongoing threats. Remember, there is no such thing as "cheating" if you get the right answer.
