BOM Obfuscation in Spam

Spammers try all sorts of obfuscation tricks, including disguising URLs so they cannot be recognized by URL blacklists or other scanning services. We recently came across a trick we hadn't seen before.

Here is the original email:


As you can see, the underlying URL at the bottom uses a URL shortening service. In this case, the URL led to a site that was no longer accessible, but was likely some sort of get-rich-quick scheme. Nothing that unusual. But the HTML code reveals some odd obfuscation.


What is that? Let's look at the hex code:


So we have repeating sets of the bytes 0xEF 0xBB 0xBF. This is the UTF-8 encoding of the Byte Order Mark or BOM (U+FEFF), a magic number used to signify the beginning of a Unicode text stream. BOMs are only supposed to be used at the start of a text stream. If one does appear in the middle, the Unicode FAQ suggests it gets treated as a zero-width non-breaking space. In other words, the HTML renderer in the email client will essentially display the original URL without the BOM characters.
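To make the effect concrete, here is a minimal Python sketch of what the obfuscation does at the byte level. The URL `http://example.com/abc` and the helper names `insert_boms` and `strip_boms` are assumptions made up for illustration; the spam sample itself may have placed the BOMs differently.

```python
# UTF-8 encoding of U+FEFF, the Byte Order Mark.
BOM_UTF8 = b"\xef\xbb\xbf"


def insert_boms(url: str) -> bytes:
    """Place a UTF-8 BOM between every character of the URL (illustrative only)."""
    return BOM_UTF8.join(ch.encode("utf-8") for ch in url)


def strip_boms(raw: bytes) -> str:
    """Decode the bytes and drop every U+FEFF, roughly what a renderer shows."""
    return raw.decode("utf-8").replace("\ufeff", "")


# Hypothetical URL, not the one from the spam sample.
obfuscated = insert_boms("http://example.com/abc")

# A naive byte-level scan no longer sees the scheme prefix...
print(b"http://" in obfuscated)
# ...but removing the invisible characters recovers the original URL.
print(strip_boms(obfuscated))
```

A scanner that matches URLs against the raw bytes misses the link, while the recipient still sees and can click an ordinary-looking URL.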

So, use of the BOM is a novel way to obfuscate a URL and other parts of a message, and is no doubt meant to fool anti-spam devices. The other side of this equation is that it is also trivially easy to detect. Thanks, Mr Spammer. The Trustwave Secure Email Gateway now looks for and blocks emails with this obfuscation technique.
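As a rough illustration of why detection is easy, the sketch below counts BOM characters that appear anywhere other than the very start of a decoded message body. This is not the Secure Email Gateway's actual rule; the function names and the threshold of 3 are assumptions chosen for the example.

```python
BOM = "\ufeff"


def count_embedded_boms(text: str) -> int:
    """Count U+FEFF characters that are not at the very start of the text."""
    # A single leading BOM is legitimate, so skip it before counting.
    body = text[1:] if text.startswith(BOM) else text
    return body.count(BOM)


def looks_bom_obfuscated(text: str, threshold: int = 3) -> bool:
    """Flag text containing at least `threshold` embedded BOMs (assumed cutoff)."""
    return count_embedded_boms(text) >= threshold


# A legitimate leading BOM is ignored; BOMs scattered through a URL are flagged.
print(looks_bom_obfuscated("\ufeffHello, world"))
print(looks_bom_obfuscated("h\ufefft\ufefft\ufeffp\ufeff://example.com"))
```

Since legitimate mail has little reason to carry BOMs mid-stream, even a simple count like this is a strong signal, although a real filter would presumably combine it with other checks.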
