Stupid Spammer Tricks – Multi-Character Set Text

Looking to refinance your house? Install solar panels? Hey, this email about refinancing (or solar power) looks good. But is it really? Is it legitimate or just spam for a fly-by-night outfit? Spammers are constantly trying new tricks to make an email look legitimate to the recipient while still eluding spam filters. Filters that rely only (or mostly) on content matching can stop common phrases easily enough, but if those phrases are written using non-standard characters, it becomes much more difficult to account for every possible character that looks like a standard one. This is one of the latest spammer tricks. Take a look at this screenshot of a recent email about the government HARP mortgage refinancing program.


At first glance, it looks like a normal ad for the HARP program. But look a little closer at the "A" in HARP. It seems a little… off. Here's the actual HTML code for the phrase "HARP 2.0 refinance program":

HᎪRΡ 2.0 rеfіnаncе рrоgrаm

The 'H' and 'R' are normal ASCII letters, but the 'A' and 'P' are Unicode characters in different character sets (languages). The Unicode character '&#x13AA;' is actually the letter 'Go' in the Cherokee language, and the Unicode character '&#x03A1;' is really the Greek capital letter 'rho'. The other Unicode characters are Cyrillic letters that look equivalent to 'a', 'e', 'i', 'o' and 'p'.
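To check what a suspicious character really is, you can ask for its Unicode name. The snippet below is a minimal sketch in Python using only the standard library. The markup string is an assumption: it rebuilds the first word from the two code points named above, written as HTML numeric character references, and is not copied from the original message. The helper name `describe` is made up for illustration.

```python
import html
import unicodedata

# Assumed reconstruction of the first word's markup, built from the two
# code points discussed above (not copied from the original message).
markup = "H&#x13AA;R&#x03A1;"

# Decode the numeric character references into real characters.
word = html.unescape(markup)

def describe(text):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

for code_point, name in describe(word):
    print(code_point, name)
# U+0048 LATIN CAPITAL LETTER H
# U+13AA CHEROKEE LETTER GO
# U+0052 LATIN CAPITAL LETTER R
# U+03A1 GREEK CAPITAL LETTER RHO
```

Only two of the four characters are Latin, even though the word reads as "HARP" on screen.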

Here's another example for solar panels.


This looks mostly reasonable, except for that funky 'Y' in "MONEY". What is it really? Here's the code for "SAVE MONEY":

ѕаvе mоnеу

These Unicode characters are all Cyrillic letters. The 'Y' is a small Cyrillic 'u', which looks like a 'y'. The text is inside a hyperlink tag (<a href…>), which carries a style="text-transform:uppercase;" attribute so that the small Cyrillic letters are displayed in uppercase.
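The odd-looking 'Y' follows from how uppercasing works: the capital form of the Cyrillic 'u' is a different letter from the Latin 'Y'. Here is a small sketch in Python. It uses Python's own `str.upper()`, which is only an approximation of what a mail client does when it applies the CSS rule, but both rely on Unicode case mappings.

```python
import unicodedata

small_u = "\u0443"            # the Cyrillic letter that resembles a Latin 'y'
capital_u = small_u.upper()   # Unicode uppercase mapping

print(unicodedata.name(small_u))    # CYRILLIC SMALL LETTER U
print(unicodedata.name(capital_u))  # CYRILLIC CAPITAL LETTER U
print(capital_u == "Y")             # False: still not a Latin letter
```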

Here's one more example, a much more obvious one.

The first character is supposed to be a capital 'A', but the Unicode character they use doesn't quite work as well as the other attempts. This one is the Cherokee letter 'Go' again, same as in the first example. It's more obvious this time because it's subject to different HTML formatting.

If you use spam filtering, you can see why e-mails such as those above may slip through. Content filters can stop phrases like "HARP 2.0 refinancing" just fine, but when those phrases are written using non-standard characters, allowing for all possible look-alike characters in other languages makes it much harder. Fortunately, many anti-spam solutions, including Trustwave's Secure Email Gateway, do not rely solely on content filters, but use a multi-pronged approach that also includes measures such as IP and URL reputation and message structure. For such solutions, tricks like these are interesting in that they can help you identify spam on sight, based on the curious text, but are insufficient by themselves to get a message through to your inbox.
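To make the content-filter problem concrete, here is a rough sketch in Python of why a plain phrase match misses the spoofed text, along with one heuristic that can flag it: words that mix letters from more than one script. This is an illustration only and not a description of how Secure Email Gateway or any other product works. The function names are made up, and taking the script from the first word of a character's Unicode name is a shortcut that happens to work for the letters in these examples.

```python
import unicodedata

def scripts_in(word):
    """Approximate the set of scripts used by the letters in a word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # For the letters seen here, the Unicode name starts with the
            # script: "LATIN", "CYRILLIC", "GREEK", "CHEROKEE".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def mixed_script_words(text):
    """Return the words that combine letters from more than one script."""
    return [w for w in text.split() if len(scripts_in(w)) > 1]

spoofed = "H\u13aaR\u03a1 2.0"   # Cherokee 'Go' and Greek 'rho' stand in for A and P

print("HARP" in spoofed)                       # False: the plain match misses it
print(len(mixed_script_words(spoofed)))        # 1: the look-alike word is flagged
print(len(mixed_script_words("HARP 2.0")))     # 0: the genuine phrase is not
```

A check like this would also flag some legitimate multilingual text, which is one reason it makes sense only as one signal among several.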
