All Your Base64 Are Belong To Us – Dynamic vs. Static Analysis of Web Content

I recently encountered an interesting phishing scheme when reviewing telemetry of incidents blocked by Trustwave Secure Web Gateway (SWG). My investigation into the scheme uncovered some interesting points and led me here:

  Fig1

Figure 1: Phishing content only blocked by 1/67 URL scanning engines

As you can see in the image above, Trustwave SWG blocked this sample when other engines had a hard time detecting it. This blog will explain in detail why and how I ended up with the above.

Dynamic Analysis - More Than Just a Pretty Phrase

The short answer to the 1/67 detection above is real-time dynamic analysis. Despite a lot of talk about innovative next-gen technologies, the unfortunate reality is that a lot of web-based protection is still heavily reliant on classic technologies like URL blacklisting, URL reputation, and static analysis of web content.

The problem with these approaches is that web content is dynamic and constantly changing. The content of a URL you browse to now is not identical to the content you receive if you browse to the same URL in 5 minutes, or from another country, or, in today's world of embedded advertising, even if you refresh the page. This leaves a lot of room for false positives and false negatives when trying to use static technologies on such heavily dynamic content.

Our approach to this issue is dynamic analysis for dynamic content. But does it really matter which technology an engine employs as long as malicious content is blocked and benign content is not? Let's take a closer look.

This is the original malicious URL from the image above, which was uploaded to VirusTotal a while ago:

  Fig2

Figure 2: Original URL scan results on VirusTotal

So originally this URL was blocked by 5/67 engines. Not exactly the numbers you want to see for a phishing page, but it's something.

We can see that these scanners are using blacklisting here, because if I just play with the URL a little bit…

  Fig3

Figure 3: A scan of the same domain with a different path

As you can see, Trustwave SWG didn't block it, simply because the web content wasn't malicious this time. In fact, there was no content there at all except for a '404 Not Found' page, since the new URL is fictitious. Granted, that domain may not have the best of reputations, but if we were being nitpicky this would be considered a false positive.

Interestingly enough, this last figure is the exact inverse of the first figure in this blog. This is because in the first figure what I did was take the original content from this page and upload it to a new, clean domain. Trustwave SWG blocked it because regardless of the reputation of the domain, the content was malicious. And that's all that really matters when using dynamic analysis.
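The contrast between Figures 1 and 3 can be made concrete with a toy sketch. Everything in it is an assumption for illustration: the blacklist contents, the clean-host.example domain, the page snippets, and especially content_verdict, which is a deliberately naive stand-in and says nothing about how Trustwave SWG or any other engine actually works.

```python
from urllib.parse import urlsplit

# Illustrative blacklist; the domain is the one discussed in this post.
BLACKLISTED_DOMAINS = {"office0000.cf"}

def blacklist_verdict(url):
    """Block purely on the host name, whatever the page contains."""
    return urlsplit(url).hostname in BLACKLISTED_DOMAINS

def content_verdict(html):
    """Naive stand-in for content analysis: flag pages asking for a password."""
    return 'type="password"' in html.lower()

# Hypothetical page bodies standing in for the real responses.
NOT_FOUND_PAGE = "<html><body><h1>404 Not Found</h1></body></html>"
PHISHING_PAGE = '<html><body><form><input type="password"></form></body></html>'

cases = [
    ("hxxp://office0000.cf/some/made-up/path", NOT_FOUND_PAGE),  # like Figure 3
    ("hxxp://clean-host.example/login", PHISHING_PAGE),          # like Figure 1
]

for url, html in cases:
    print(url, "blacklist:", blacklist_verdict(url), "content:", content_verdict(html))
```

The first case is blocked by the blacklist even though the page is harmless, and the second slips past the blacklist even though the page asks for credentials.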

Now that we've established the strength of dynamic analysis, let's take a closer look at this nice phishing scheme.

  Fig4

Figure 4: Microsoft phishing page

  Fig5

Figure 5: Source code of the phishing page

So it all starts with a simple script that changes the window location to "data:text/html;base64…"

The concept of fileless base64 phishing pages isn't new. But base64 itself is neither malicious nor clean; it is simply encoded content, and only by analyzing what it decodes to can you tell which. Let's decode this page and take a closer look:

  Fig6

Figure 6: Decoded base64 code of the phishing page
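The decoding step behind Figure 6 can be reproduced in a few lines of Python. This is a minimal sketch: decode_data_uri is a hypothetical helper name, and because the real payload is elided in this post, the sample URI below is built from a harmless placeholder page rather than the actual phishing content.

```python
import base64

def decode_data_uri(uri):
    """Split a base64 data: URI into its media type and decoded text."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, _, payload = uri[len("data:"):].partition(",")
    if not header.endswith(";base64"):
        raise ValueError("payload is not base64-encoded")
    media_type = header[: -len(";base64")]
    html = base64.b64decode(payload).decode("utf-8", errors="replace")
    return media_type, html

# Placeholder page standing in for the real payload, which is not shown here.
sample_html = "<html><body><form>Sign in</form></body></html>"
sample_uri = "data:text/html;base64," + base64.b64encode(sample_html.encode()).decode()

media_type, html = decode_data_uri(sample_uri)
print(media_type)
print(html)
```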

Voila! A completely new HTML page with the phishing scam itself, but wait: The new page has "my" email address inside - "you-cant-hide-from-SWG@Trustwave.com".

How did the attacker do this? Base64 again! But this time in the URL itself.

If you look closely at Figure 2, you can see that the URL of this page was: hxxp://office0000.cf/sect/7nyyil7yn459fe1da6d9c2d/5a959dbac0a39/eW91LWNhbnQtaGlkZS1mcm9tLVNXR0BUcnVzdHdhdmUuY29t

If you base64 decode the last part of the URL:
eW91LWNhbnQtaGlkZS1mcm9tLVNXR0BUcnVzdHdhdmUuY29t

You will get:
you-cant-hide-from-SWG@Trustwave.com
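The same check is easy to script. The sketch below takes the defanged URL from this post, pulls out the last path segment, and decodes it; recipient_from_url is a hypothetical helper name, and the padding fix is an assumption in case a token arrives with its trailing "=" characters stripped.

```python
import base64
from urllib.parse import urlsplit

def recipient_from_url(url):
    """Decode the last path segment of the URL as base64 text."""
    token = urlsplit(url).path.rstrip("/").split("/")[-1]
    token += "=" * (-len(token) % 4)  # restore padding if it was stripped
    return base64.b64decode(token).decode("utf-8")

url = ("hxxp://office0000.cf/sect/7nyyil7yn459fe1da6d9c2d/5a959dbac0a39/"
       "eW91LWNhbnQtaGlkZS1mcm9tLVNXR0BUcnVzdHdhdmUuY29t")
recipient = recipient_from_url(url)
print(recipient)
```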

Amazing! The attacker managed to bypass almost all security products by utilizing both techniques I mentioned in the beginning: Changing the content of the page every time to avoid static analysis, and using a clean domain to avoid reputation-based detection.

For every address to which the phishing email is sent, there is a unique URL and unique content being generated using base64. This alone makes static analysis of the content insufficient.
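To see why exact matching struggles here, consider the rough sketch below. The URL prefix is the one from this post; the recipient addresses and the page template are made up for illustration, and url_for and page_hash are hypothetical helper names.

```python
import base64
import hashlib

# URL prefix as it appears in this post (defanged).
BASE = "hxxp://office0000.cf/sect/7nyyil7yn459fe1da6d9c2d/5a959dbac0a39/"

def url_for(email):
    """Append the base64-encoded address, matching the observed URL layout."""
    return BASE + base64.b64encode(email.encode()).decode()

def page_hash(email):
    """Hash a hypothetical page template that embeds the address."""
    page = f'<html><input type="email" value="{email}"></html>'
    return hashlib.sha256(page.encode()).hexdigest()

recipients = ["alice@example.com", "bob@example.com", "carol@example.com"]
urls = [url_for(r) for r in recipients]
hashes = [page_hash(r) for r in recipients]

# A list that only knows the first URL and hash misses the other variants.
known_urls, known_hashes = {urls[0]}, {hashes[0]}
missed = [u for u, h in zip(urls, hashes)
          if u not in known_urls and h not in known_hashes]
print(len(missed), "of", len(recipients), "variants not matched")
```

Every recipient produces a different URL and a different content hash, so a signature taken from one sample says nothing about the next.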

The main obstacle our attacker runs into with static analysis is the blacklisting of their domains. But as shown at the beginning of this blog, simply putting the same content on a new domain solves that problem. And since the real phishing page is just base64 code, it can easily be put anywhere, from a newly created clean domain to a compromised server with a perfectly good reputation.

Doing some google-fu revealed that this phishing scheme has been going on for a while, and the attacker has, in fact, changed the domains a few times already in order to avoid detection.

https://phishcheck.me/14811/details says a URL with a similar structure, on the IP 54.243.65.67, was seen on Feb. 6, more than a month ago.

Let's see our page:

  Fig7

Figure 7: Resolving the IP of our phishing domain

A perfect match.

  Fig8

Figure 8: PhishCheck shows that the URL is likely phishing, hosted on Amazon

So what is a "next-gen" blacklisting solution likely to do when an attacker keeps changing domains, but stays on the same IP? Block the IP address instead of domains…

Unfortunately, this IP belongs to Amazon. I won't get into the details of how Amazon assigns IP addresses and why it is a bad idea to block IPs on Amazon. Instead I will take a look at what else is potentially hosted on this IP address.

  Fig9

Figure 9: PassiveDNS results for the IP

Looks like this Amazon IP hosts a bunch of services running on Heroku, a legitimate cloud-based platform for application development. This is why blacklisting IP addresses altogether can trigger false positives. In a shared hosting environment like this, most of what is hosted on one IP can be completely legit.
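A back-of-the-envelope sketch of that collateral damage follows. Only the IP address and the two phishing domains come from this post; the other host names are invented placeholders, not the actual PassiveDNS results in Figure 9, and collateral is a hypothetical helper name.

```python
# Hypothetical passive DNS snapshot; only the IP and the two phishing
# domains come from this post, the other host names are invented.
PASSIVE_DNS = {
    "54.243.65.67": [
        "office0000.cf",
        "strnet24.gq",
        "shop.example",
        "blog.example",
        "api.example",
    ],
}
KNOWN_BAD = {"office0000.cf", "strnet24.gq"}

def collateral(ip):
    """Return hosts that an IP-level block would take down without cause."""
    return [h for h in PASSIVE_DNS.get(ip, []) if h not in KNOWN_BAD]

blocked_for_nothing = collateral("54.243.65.67")
print(blocked_for_nothing)
```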

All of this shows the limitations of static analysis and the strength of dynamic analysis. With dynamic analysis, we are able to block pages that contain real malicious content without false positives and without the need to manually add domains/IPs/etc. into blacklists.

Back to Base64

After all of this you may have noticed that the URL in PhishCheck (Figure 8) is a little longer than the one in our example. Let's break it down:

First we have the base domain with some random gibberish directories: hxxp://strnet24.gq/sect/iv6cwjbhfw59ae2de1846a0/5a79d3864d65f/

This is followed by the targeted email address in base64: cnN0dWNrZXlAdGhlY2FybHlsZWdyb3VwLmNvbQ==

Which is, in turn, followed by several parameters. The first is:
?forced=1
which appears to be a flag whose purpose is known only to the authors behind this campaign.
 
The next parameter is:
&tg=VVNBMzY1
This is base64 for "USA365". Why? If we could ask the authors, we would.

The final parameter is also in base64: &s=ZXlKcGRpSTZJbGg0V2tveFZFUnljME5FVlZWbmRFaGlLM0ZOZFdjOVBTSXNJblpoYkhWbElqb2lhV05qYkZVMU1GQXhWek0yU2t4UGRXRnZXVU5TY205SmFVUlBhSGM1VGpjck0xVnRaVE5pWjJScVZFbzNiRzljTDBSSFUwNVhhekZhVmpGaU5EUkNWa05CTnpSVGNsUnplbXBxU1hGbGFTdFVXV3RXTkdGVGNsQnRZbXRoUTNOYVFscHZRVWhHVjA5M09VdzVUa1F5T0RoVGVtVkdiRTAxV0ZWYWEyZEljazQxVWt3cmVrTjNUMHB3UzNSVE0wcFhRazF6WVRoQ1VUMDlJaXdpYldGaklqb2lNek13WmpCa01ESTVOVEZtWlRBeE1UTTJZak0yWXpNeFltUTFPR1ExWm1Zd056YzBNbU14TlRRME1qYzBNelJqTXpreU9XRmpZVGN5TXpVek4yTTJPQ0o5

However, decoding it leaves us with another base64-encoded string:

eyJpdiI6Ilh4WkoxVERyc0NEVVVndEhiK3FNdWc9PSIsInZhbHVlIjoiaWNjbFU1MFAxVzM2SkxPdWFvWUNScm9JaURPaHc5TjcrM1VtZTNiZ2RqVEo3bG9cL0RHU05XazFaVjFiNDRCVkNBNzRTclRzempqSXFlaStUWWtWNGFTclBtYmthQ3NaQlpvQUhGV093OUw5TkQyODhTemVGbE01WFVaa2dIck41UkwrekN3T0pwS3RTM0pXQk1zYThCUT09IiwibWFjIjoiMzMwZjBkMDI5NTFmZTAxMTM2YjM2YzMxYmQ1OGQ1ZmYwNzc0MmMxNTQ0Mjc0MzRjMzkyOWFjYTcyMzUzN2M2OCJ9

  Saltbase

Figure 10: More base64

Well, it's no problem to run another decode cycle, and we finally find the following JSON:

{"iv":"XxZJ1TDrsCDUUgtHb+qMug==","value":"icclU50P1W36JLOuaoYCRroIiDOhw9N7+3Ume3bgdjTJ7lo\/DGSNWk1ZV1b44BVCA74SrTszjjIqei+TYkV4aSrPmbkaCsZBZoAHFWOw9L9ND288SzeFlM5XUZkgHrN5RL+zCwOJpKtS3JWBMsa8BQ==","mac":"330f0d02951fe01136b36c31bd58d5ff07742c154427434c3929aca723537c68"}
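Peeling nested layers like this by hand gets tedious, so here is a small sketch that keeps decoding until the result no longer parses as base64. The INNER string is the intermediate value shown above; to keep the listing short, the outer layer is rebuilt by re-encoding it instead of pasting the full "s" parameter. peel_base64 is a hypothetical helper name, not part of any library.

```python
import base64
import binascii
import json

def peel_base64(text, max_layers=5):
    """Decode repeatedly until the result is no longer valid base64 text."""
    layers = 0
    while layers < max_layers:
        try:
            decoded = base64.b64decode(text, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            break  # not base64 (or not text) any more, so stop here
        text = decoded
        layers += 1
    return text, layers

# Intermediate string as shown in this post.
INNER = "eyJpdiI6Ilh4WkoxVERyc0NEVVVndEhiK3FNdWc9PSIsInZhbHVlIjoiaWNjbFU1MFAxVzM2SkxPdWFvWUNScm9JaURPaHc5TjcrM1VtZTNiZ2RqVEo3bG9cL0RHU05XazFaVjFiNDRCVkNBNzRTclRzempqSXFlaStUWWtWNGFTclBtYmthQ3NaQlpvQUhGV093OUw5TkQyODhTemVGbE01WFVaa2dIck41UkwrekN3T0pwS3RTM0pXQk1zYThCUT09IiwibWFjIjoiMzMwZjBkMDI5NTFmZTAxMTM2YjM2YzMxYmQ1OGQ1ZmYwNzc0MmMxNTQ0Mjc0MzRjMzkyOWFjYTcyMzUzN2M2OCJ9"

# Rebuild the outer layer (the value of the "s" parameter) by re-encoding.
outer = base64.b64encode(INNER.encode()).decode()

text, layers = peel_base64(outer)
payload = json.loads(text)
print(layers, sorted(payload))
```

One caveat with this approach: a short plain-text value can happen to be valid base64 too, so the loop can over-decode. The max_layers cap and a quick look at each layer are cheap safeguards.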
 
Again, we can only speculate as to what these values mean. Perhaps they provide the authors with information to uniquely identify their target.
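Stepping back, the simpler parts of the breakdown can be reproduced with the standard library. In this sketch the long "s" parameter is left out for brevity, and the variable names are illustrative only.

```python
import base64
from urllib.parse import parse_qs, urlsplit

# Defanged URL from this post, without the long "s" parameter.
url = (
    "hxxp://strnet24.gq/sect/iv6cwjbhfw59ae2de1846a0/5a79d3864d65f/"
    "cnN0dWNrZXlAdGhlY2FybHlsZWdyb3VwLmNvbQ=="
    "?forced=1&tg=VVNBMzY1"
)

parts = urlsplit(url)
email_token = parts.path.split("/")[-1]  # base64 of the targeted address
params = parse_qs(parts.query)

forced = params["forced"][0]
tg = base64.b64decode(params["tg"][0]).decode("utf-8")
print(forced, tg)
```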

Quick PSA folks: base64 is not an encryption algorithm, and Trustwave SWG customers are protected against such trickery.
