SpiderLabs Blog

Unicode Visual Spoofing for Good: Confusable CAPTCHAs

In this blog post, I will show a proof of concept method of leveraging Unicode Visual Spoofing/Lookalikes for use in a CAPTCHA to help prevent automated bots from scraping pages and autosubmitting data.

Unicode Visual Spoofing/Lookalikes

An in-depth discussion of Unicode and the security challenges it poses is beyond the scope of this post; however, there are a few salient points to mention. The first is the issue of Visual Spoofing. Chris Weber of Casaba Security has an outstanding presentation entitled "Exploiting Unicode-enabled Software" in which he outlines this issue. Here are two applicable points:

Visual Spoofing

  • Over 100,000 assigned characters
  • Many lookalikes within and across scripts


Example IDN Homograph Attack is not www.gooɡ

g = Latin U+0067
ɡ = Latin U+0261

The main issue for security is that, unless data is properly canonicalized before security checks, it is possible for attackers to evade detection. Unicode Visual Spoofing can easily be used by criminals in phishing attacks. Even savvy Internet users may be tricked into clicking on links, as these Unicode code points are oftentimes visually indistinguishable from one another.
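
To make the difference concrete, here is a minimal Python sketch that compares an all-ASCII string with a lookalike and lists any non-ASCII characters it finds. The function name `describe_non_ascii` and the two sample strings are illustrative assumptions, not part of the original presentation.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for every non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


genuine = "google"        # all ASCII letters (hypothetical sample)
spoofed = "goo\u0261le"   # second "g" replaced by U+0261 (hypothetical sample)

# The two strings look alike on screen but are different to a program.
print(genuine == spoofed)
print(describe_non_ascii(genuine))
print(describe_non_ascii(spoofed))
```

A check like this only flags the presence of non-ASCII characters; it is not a full confusable-detection scheme.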


The underlying issue outlined above is that computer programs and humans may interpret Unicode characters differently. We can leverage this issue in our favor if we implement the same concept in a different context - CAPTCHAs.

A CAPTCHA (pronounced /ˈkæptʃə/) is a type of challenge-response test used in computing as an attempt to ensure that the response is not generated by a computer. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are supposedly unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. Thus, it is sometimes described as a reverse Turing test, because it is administered by a machine and targeted to a human, in contrast to the standard Turing test that is typically administered by a human and targeted to a machine. A common type of CAPTCHA requires the user to type letters or digits from a distorted image that appears on the screen.

Here is an example of typical CAPTCHA usage where a graphic is used with obscured text characters displayed:

The user must visually decipher the text and input it into the text box.

Turning the Tables: Visual Spoofing in CAPTCHAs

Rather than using an image file with obscured text in it, the concept presented here is to use Unicode Visual Spoofing/Lookalikes to essentially "trick" the user into entering the text that you desire.

Here is an example comment form CAPTCHA that implements this concept by adding an additional field to the end of the form:

    <form method="post" action="" name="comments_form" id="comments-form" onsubmit="if (this.bakecookie.checked) rememberMe(this)">
        <input type="hidden" name="static" value="1" />
        <input type="hidden" name="entry_id" value="43271" />
        <input type="hidden" name="__lang" value="en" />
        <input type="hidden" name="parent_id" value="" id="comment-parent-id" />
        <div id="comments-open-data">
            <div id="comment-form-name">
                <label for="comment-author">Name</label>
                <input id="comment-author" name="author" size="30" value="" />
            </div>
            <div id="comment-form-email">
                <label for="comment-email">Email Address</label>
                <input id="comment-email" name="email" size="30" value="" />
            </div>
            <div id="comment-form-remember-me">
                <label for="comment-bake-cookie"><input type="checkbox" id="comment-bake-cookie" name="bakecookie" onclick="if (!this.checked) forgetMe(document.comments_form)" value="1" />
                    Remember personal info?</label>
            </div>
        </div>
        <div id="comments-open-text">
            <label for="comment-text">Comments (You may use HTML tags for style)</label>
            <textarea id="comment-text" name="text" rows="15" cols="50"></textarea>
        </div>
        <div id="comments-open-footer">
            <!--input type="submit" accesskey="v" name="preview" id="comment-preview" value="Preview" /-->
            <br><label for="challenge_answer">Type the word &#1072;pple below. <strong>(required)</strong>:</label><br />
            <input type="text" id="challenge_answer" name="challenge_answer" /><br>
            <input type="submit" accesskey="s" name="post" id="comment-submit" value="Submit" />
        </div>
    </form>

This HTML adds a new text field called "challenge_answer", whose value is sent along with the standard POST arguments when the form is submitted to the web app. Notice the label at the end of the form: it uses an encoded Cyrillic small letter "а" (&#1072;) instead of a Latin small letter "a" to display the word "apple".

Here is how the form would look to a user in a web browser:

[Screenshot: the comment form as rendered in a web browser]

So the concept is that a malicious SPAM bot would most likely scrape the raw HTML above and insert either the raw &#1072; entity or the decoded Cyrillic "а" into the text field, while a human would type a normal Latin small letter "a" when spelling the word "apple".
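
The difference between those answers can be sketched in a few lines of Python. This is only an illustration of the idea under the assumption that a naive bot copies the label word either as-is or after decoding HTML entities; the names `RAW_LABEL_WORD`, `EXPECTED_ANSWER`, and `bot_answers` are hypothetical.

```python
import html

# The challenge word exactly as it appears in the raw HTML of the form.
RAW_LABEL_WORD = "&#1072;pple"
# What a human reading the rendered page would type: plain ASCII.
EXPECTED_ANSWER = "apple"


def bot_answers(raw_word):
    """Two answers a naive scraper might submit: the raw entity or the decoded text."""
    return [raw_word, html.unescape(raw_word)]


# Neither scraped variant equals the ASCII word, although the decoded one looks the same.
for answer in bot_answers(RAW_LABEL_WORD):
    print(repr(answer), answer == EXPECTED_ANSWER)

# A human typing on a Latin keyboard produces the ASCII word.
print(repr("apple"), "apple" == EXPECTED_ANSWER)
```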

Implementation/Validation of Confusable CAPTCHA using ModSecurity

We can implement this Confusable CAPTCHA concept dynamically into forms by using new ModSecurity v2.6 capabilities such as Content Modification.

Enabling Content Modification

In order to dynamically modify outbound response bodies in ModSecurity, you must enable two directives: SecContentInjection On and SecStreamOutBodyInspection On.

Modifying Outbound Forms

In order to modify the existing HTML form data, you can use the following example ModSecurity rule, which uses the new @rsub operator for data substitution:

SecRule STREAM_OUTPUT_BODY "@rsub s/<input type=\"submit\"/<br><label for=\"challenge_answer\">Type the word &#1072;pple below. <strong>(required)<\/strong>:<\/label><br \/><input type=\"text\" id=\"challenge_answer\" name=\"challenge_answer\" \/><br><input type=\"submit\"/" \
    "phase:4,t:none,nolog,pass"

This rule matches any existing form "Submit" button element and inserts our Confusable CAPTCHA field immediately before it.
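
For readers who want to experiment with the substitution outside of ModSecurity, the following is a rough Python analogue using `re.sub`. It is a sketch of the same search-and-replace idea, not a description of how @rsub is implemented; `CAPTCHA_HTML`, `inject_captcha`, and the sample form are assumptions made for illustration.

```python
import re

# The CAPTCHA markup from the rule above, written without the rule's escaping.
CAPTCHA_HTML = (
    '<br><label for="challenge_answer">Type the word &#1072;pple below. '
    '<strong>(required)</strong>:</label><br />'
    '<input type="text" id="challenge_answer" name="challenge_answer" /><br>'
)


def inject_captcha(body):
    """Insert the CAPTCHA field before every submit button in an HTML body."""
    # A function replacement keeps re.sub from interpreting the inserted HTML
    # as a replacement template.
    return re.sub(
        r'<input type="submit"',
        lambda match: CAPTCHA_HTML + match.group(0),
        body,
    )


sample = '<form><input type="submit" name="post" value="Submit" /></form>'
print(inject_captcha(sample))
```

Like the rule, this matches on the literal string `<input type="submit"`, so a submit button whose attributes appear in a different order would not be matched.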

Validating CAPTCHA Data

We now implement two SecRules to validate the CAPTCHA data.

SecRule REQUEST_FILENAME "@streq /cgi-bin/mt/mt-c.cgi" "chain,phase:2,t:none,block,msg:'Comment Post Error: CAPTCHA Challenge Missing.'"
        SecRule &ARGS:CHALLENGE_ANSWER "@eq 0"

SecRule REQUEST_FILENAME "@streq /cgi-bin/mt/mt-c.cgi" "chain,phase:2,t:none,block,msg:'Comment Post Error: Invalid CAPTCHA Challenge Answer.',logdata:'%{args.challenge_answer}'"
        SecRule ARGS:CHALLENGE_ANSWER "!@streq apple"

These rules check requests to the comment form's receiving page (/cgi-bin/mt/mt-c.cgi) and ensure that the challenge_answer parameter is present and that it contains exactly the word "apple" with a Latin lowercase "a". If either check fails, the request is blocked and an alert is generated.
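
The same two checks can be sketched as application-side logic in Python. This is a simplified illustration, not a replacement for the rules: it assumes a URL-encoded POST body, matches the parameter name in lowercase only, and the names `PROTECTED_PATH` and `check_comment_post` are hypothetical.

```python
from urllib.parse import parse_qs

PROTECTED_PATH = "/cgi-bin/mt/mt-c.cgi"


def check_comment_post(path, body):
    """Return None if the post passes, else an error message matching the rules' msg text."""
    if path != PROTECTED_PATH:
        return None  # only the comment-posting script is checked

    args = parse_qs(body, keep_blank_values=True)
    answers = args.get("challenge_answer", [])

    # First rule: the parameter must be present at all.
    if not answers:
        return "Comment Post Error: CAPTCHA Challenge Missing."

    # Second rule: every submitted value must be exactly the ASCII word "apple".
    if any(answer != "apple" for answer in answers):
        return "Comment Post Error: Invalid CAPTCHA Challenge Answer."

    return None


print(check_comment_post(PROTECTED_PATH, "author=bob&challenge_answer=apple"))
print(check_comment_post(PROTECTED_PATH, "author=bob"))
print(check_comment_post(PROTECTED_PATH, "challenge_answer=%D0%B0pple"))
```

The third call submits the Cyrillic lookalike (percent-encoded as UTF-8), which is rejected even though it would render identically to the expected word.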

Example alert:

[Tue May 10 08:42:30 2011] [error] [client] ModSecurity: Warning. Match of "streq apple" against "ARGS:challenge_answer" required. [file "/usr/local/apache/conf/crs/base_rules/modsecurity_crs_14_customrules.conf"] [line "9"] [msg "Comment Post Error: Invalid CAPTCHA Challenge Answer."] [data "&#1072;pple"] [hostname ""] [uri "/cgi-bin/mt/mt-c.cgi"] [unique_id "TckytsCoAW0AAB9vOWoAAAAD"]

Confusable CAPTCHA Effectiveness

Keep in mind that this is simply a proof of concept at this point and it has not yet been field tested. This implementation is not meant as a replacement for programs such as ReCAPTCHA. The idea is that this implementation would stop automated programs from scraping your comment form data and auto-submitting SPAM posts. This concept would obviously be circumvented by CAPTCHA answering services as well.

If you decide to field test this concept, we would love to hear from you.
