ModSecurity Advanced Topic of the Week: Passive Vulnerability Scanning Part 2 - Watcher Checks

In a previous blog post entitled "ModSecurity Advanced Topic of the Week: Passive Vulnerability Scanning Part 1: OSVDB Checks" I presented the concept of using ModSecurity as a vulnerability detection system rather than a traditional attack identification system. In that blog post, I highlighted how you could use ModSecurity's Lua API to integrate the OSVDB CSV vulnerability data as a means to potentially identify vulnerable public applications that you might be using on your site. Remember, the goal of PVS is not to identify attacks, but rather to alert on old software or misconfigurations that should be reviewed.

In this week's blog post, we continue with the concept of PVS, but this time we implement security checks performed by a community tool called Watcher, which is developed by Chris Weber of Casaba Security. Watcher is an add-on to the Fiddler browser proxy app. It sits passively and monitors as you, the user, interact with a target website, and it generates alerts based on the misconfigurations that it identifies.

Charset Declaration and Why You Should Care

Why is specifying proper Charset declarations so important? The main issue has to do with attacks targeting browser interpreters in which attackers may encode their payloads in specific ways to evade server-side input validation filters. If the attacker can control how the browser will decode their payloads, it gives them a decided advantage for filter evasion while still having executable code for the browser.

The Browser Security Handbook by Michal Zalewski (Google) has a great section on character set handling and detection:

The pitfalls of specific, mutually negotiated encodings aside, any case where server's understanding of the current character set might be not in sync with that of the browser is a disaster waiting to happen. Since the same string might have very different meanings depending on which decoding procedure is applied to it, the document might be perceived very differently by the generator, and by the renderer. Browsers tend to auto-detect a wide variety of character sets (see: Internet Explorer list, Firefox list) based on very liberal and undocumented heuristics, historically including even parent character set inheritance (advisory); just as frighteningly, Microsoft Internet Explorer applies character set auto-detection prior to content sniffing.
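To make the filter evasion risk described above concrete, here is a minimal sketch using UTF-7, a well-known example of an encoding that browsers have historically auto-detected. The payload string and the naive substring filter are illustrative assumptions, not something taken from Watcher or the CRS.

```python
# Hypothetical payload: "<" and ">" are written as UTF-7 escape sequences,
# so the raw bytes contain no angle brackets at all.
raw = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

def naive_filter_blocks(value):
    """A deliberately simplistic server-side filter (illustration only)."""
    return "<script" in value.lower()

# What a client would see if it decided to decode the page as UTF-7.
decoded = raw.encode("ascii").decode("utf-7")

print(naive_filter_blocks(raw))      # the filter sees nothing suspicious
print(decoded)                       # the decoded form is executable markup
```

If the response declares its charset explicitly, the browser has no reason to guess, and the raw string stays inert text.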

Implementing Charset Checks in ModSecurity

Watcher has a number of checks for proper Charset declarations in response data. The first Watcher checks that we have implemented are the Charset checks, which are now part of the OWASP ModSecurity Core Rule Set in the modsecurity_crs_55_application_defect.conf file. We are essentially monitoring outbound text/html data and looking for the following Charset issues.

Missing Charset Declaration

This check fires if a site does not specify a charset declaration at all, either in the Content-Type response header or within an HTML meta http-equiv statement. HtmlPurifier has a good discussion of why missing Charset declarations are a problem.

Example CRS ruleset:

SecRule &GLOBAL:MISSING_CHARSET "@eq 0" "phase:4,t:none,nolog,pass,id:'981219',setvar:global.missing_charset=0"
SecRule GLOBAL:MISSING_CHARSET "@le 10" "chain,phase:4,t:none,pass,id:'981220',log,msg:'Character Set (Charset) Not Specified for Response Content.',logdata:'%{response_content_type}',tag:'WASCTC/WASC-15',tag:'MISCONFIGURATION',tag:'http://code.google.com/p/browsersec/wiki/Part1#Hypertext_Markup_Language'"
        SecRule RESPONSE_STATUS "@rx ^2" "chain"
                SecRule RESPONSE_HEADERS:Content-Length "!@streq 0" "chain"
                        SecRule RESPONSE_BODY "!@rx <meta.*?content=\"text/html; charset=" "chain,t:lowercase"
                                SecRule RESPONSE_CONTENT_TYPE "(?i:^text/html;?$)" "setvar:global.missing_charset=+1,expirevar:global.missing_charset=86400"

The rule logic checks whether a Charset is declared in either the response header or the HTML body. If not, an alert is triggered and a counter is incremented in the Global persistent collection. We alert only on the first 10 events per day so as not to flood the user.

Example alert message:

[Tue May 03 14:26:59 2011] [error] [client ::1] ModSecurity: Warning. Pattern match "(?i:^text/html;?$)" at RESPONSE_CONTENT_TYPE. [file "/usr/local/apache/conf/crs/base_rules/modsecurity_crs_15_customrules.conf"] [line "5"] [id "981220"] [msg "Character Set (Charset) Not Specified for Response Content."] [data "text/html"] [tag "WASCTC/WASC-15"] [tag "MISCONFIGURATION"] [tag "http://code.google.com/p/browsersec/wiki/Part1#Hypertext_Markup_Language"] [hostname "www.example.com"] [uri "http://www.example.com/"] [unique_id "TcBI8sCoqAEAAKXhF7kAAAAC"]
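For readers who find chained rules hard to follow, the decision made by rule 981220 can be sketched in Python. This is only an approximation of the chain for illustration: the function name and arguments are hypothetical, and the daily alert counter kept in the Global collection is left out.

```python
import re

# Same body pattern the rule uses: a meta http-equiv style charset declaration.
META_CHARSET = re.compile(r'<meta.*?content="text/html; charset=')
# Content-Type is exactly "text/html" or "text/html;" with no charset parameter.
BARE_HTML = re.compile(r"(?i)^text/html;?$")

def missing_charset(status, content_type, body):
    """Return True when a 2xx text/html response declares no charset at all."""
    if not str(status).startswith("2"):
        return False          # only successful responses are inspected
    if not body:
        return False          # stands in for the Content-Length != 0 test
    if not BARE_HTML.match(content_type):
        return False          # header is not text/html, or it carries a charset
    # The rule lowercases the body (t:lowercase) before matching.
    return META_CHARSET.search(body.lower()) is None
```

Calling `missing_charset(200, "text/html", "<html><body>hi</body></html>")` flags the response, which corresponds to the alert shown above.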

Charset Not Explicitly Set to UTF-8

This check fires if a site does specify a charset in one or even both locations, but the value is not UTF-8. The Watcher documentation provides the following rationale for this check:

This check identifies HTTP headers, meta tags, and XML documents that don't explicitly set a charset value to UTF-8. UTF-8 is supported in all major Web browsers today, and from a security perspective it is the preferred charset for most Web-applications. When a charset is not explicitly declared, Web browsers are forced into an undesirable content-sniffing mode to determine the content's character set.

Example CRS ruleset:

SecRule &GLOBAL:CHARSET_NOT_UTF8 "@eq 0" "phase:4,t:none,nolog,pass,id:'981221',setvar:global.charset_not_utf8=0"
SecRule GLOBAL:CHARSET_NOT_UTF8 "@le 10" "chain,phase:4,t:none,pass,id:'981222',log,msg:'Charset not explicitly set to UTF-8 in Content-Type or HTML/XML content',logdata:'Content-Type Header: %{response_content_type}',tag:'WASCTC/WASC-15',tag:'MISCONFIGURATION',tag:'http://websecuritytool.codeplex.com/wikipage?title=Checks#charset-not-utf8'"
        SecRule RESPONSE_STATUS "@rx ^2" "chain"
                SecRule RESPONSE_CONTENT_TYPE "(?i:^text/html)" "chain"
                        SecRule RESPONSE_CONTENT_TYPE "!@contains charset=utf-8" "chain,t:none,t:lowercase"
                                SecRule RESPONSE_HEADERS:Content-Length "!@streq 0" "chain"
                                        SecRule RESPONSE_BODY "!@rx <meta.*?content=\"text/html; charset=utf-8" "t:none,t:lowercase,setvar:global.charset_not_utf8=+1,expirevar:global.charset_not_utf8=86400"

As in the previous example, this ruleset inspects both the Content-Type response header and the HTML body, but here it verifies that the Charset declaration is specifically set to UTF-8.

Example alert message:

[Tue May 03 14:26:59 2011] [error] [client ::1] ModSecurity: Warning. Match of "rx <meta.*?content=\\"text/html; charset=utf-8" against "RESPONSE_BODY" required. [file "/usr/local/apache/conf/crs/base_rules/modsecurity_crs_15_customrules.conf"] [line "12"] [id "981222"] [msg "Charset not explicitly set to UTF-8 in Content-Type or HTML/XML content"] [data "Content-Type Header: text/html"] [tag "WASCTC/WASC-15"] [tag "MISCONFIGURATION"] [tag "http://websecuritytool.codeplex.com/wikipage?title=Checks#charset-not-utf8"] [hostname "www.example.com"] [uri "http://www.example.com/"] [unique_id "TcBI8sCoqAEAAKXhF7kAAAAC"]
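The chain in rule 981222 can be approximated the same way. The sketch below is an assumption-laden illustration (hypothetical function name, no alert counter), not the CRS implementation itself.

```python
import re

# Body pattern from the rule: a meta declaration that names utf-8 specifically.
META_UTF8 = re.compile(r'<meta.*?content="text/html; charset=utf-8')

def charset_not_utf8(status, content_type, body):
    """Return True when a 2xx text/html response never declares UTF-8."""
    if not str(status).startswith("2") or not body:
        return False
    header = content_type.lower()       # mirrors t:lowercase on the header
    if not header.startswith("text/html"):
        return False                    # only text/html responses are checked
    if "charset=utf-8" in header:
        return False                    # header already declares UTF-8
    # Neither the header nor a meta tag declares UTF-8.
    return META_UTF8.search(body.lower()) is None
```

Note that, like the rule, this also returns True for a response with no charset at all, which matches the sample alert above where the logged header is just `text/html`.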

Charset Mismatches Between Header and Body Declarations

If there is a mismatch between the charset specified within the Content-Type response header and any HTML meta http-equiv statement, it can cause parsing/rendering problems. The Watcher documentation provides the following rationale for this check:

This check identifies responses where the HTTP Content-Type header declares a charset different from the charset defined by the body of the HTML or XML. When there's a charset mismatch between the HTTP header and content body Web browsers can be forced into an undesirable content-sniffing mode to determine the content's correct character set.

Example CRS ruleset:

SecRule &GLOBAL:CHARSET_MISMATCH "@eq 0" "phase:4,t:none,nolog,pass,id:'981223',setvar:global.charset_mismatch=0"
SecRule GLOBAL:CHARSET_MISMATCH "@le 10" "chain,phase:4,t:none,pass,id:'981224',log,msg:'Detect charset mismatches between HTTP header and HTML/XML bodies',logdata:'Content-Type Response Header Charset is: %{tx.charset_header} and HTTP Equiv Charset is: %{tx.charset_body}',tag:'WASCTC/WASC-15',tag:'MISCONFIGURATION',tag:'http://websecuritytool.codeplex.com/wikipage?title=Checks#charset-mismatch'"
        SecRule RESPONSE_STATUS "@rx ^2" "chain"
                SecRule RESPONSE_CONTENT_TYPE "(?i:^text/html;\s?charset=(.*))" "chain,t:none,t:lowercase,capture,setvar:tx.charset_header=%{tx.1}"
                        SecRule RESPONSE_BODY "(?i:<meta.*?content=\"text/html; charset=(.*?)\")" "chain,t:none,t:lowercase,capture,setvar:tx.charset_body=%{tx.1}"
                                SecRule RESPONSE_HEADERS:Content-Length "!@streq 0" "chain"
                                        SecRule TX:CHARSET_HEADER "!@streq %{tx.charset_body}" "t:none,setvar:global.charset_mismatch=+1,expirevar:global.charset_mismatch=86400"

Example alert message:

[Tue May 03 14:22:03 2011] [error] [client ::1] ModSecurity: Warning. Match of "streq %{tx.charset_body}" against "TX:charset_header" required. [file "/usr/local/apache/conf/crs/base_rules/modsecurity_crs_15_customrules.conf"] [line "20"] [id "981224"] [msg "Detect charset mismatches between HTTP header and HTML/XML bodies"] [data "Content-Type Response Header Charset is: iso-8859-1 and HTTP Equiv Charset is: windows-1252"] [tag "WASCTC/WASC-15"] [tag "MISCONFIGURATION"] [tag "http://websecuritytool.codeplex.com/wikipage?title=Checks#charset-mismatch"] [hostname "www.example.com"] [uri "http://www.example.com/"] [unique_id "TcBHysCoqAEAAKXfF2gAAAAA"]
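The capture-and-compare logic of rule 981224 can also be sketched in Python. As with any illustration of this kind, the function name is hypothetical and the Global collection counter is omitted; the two patterns are copied from the rule.

```python
import re

# Capture the charset named in the Content-Type header and in the meta tag.
HEADER_CHARSET = re.compile(r"^text/html;\s?charset=(.*)")
META_CHARSET = re.compile(r'<meta.*?content="text/html; charset=(.*?)"')

def charset_mismatch(status, content_type, body):
    """Return (header_charset, body_charset) on a mismatch, otherwise None."""
    if not str(status).startswith("2") or not body:
        return None
    # Both inputs are lowercased first, as t:lowercase does in the rule.
    header = HEADER_CHARSET.search(content_type.lower())
    meta = META_CHARSET.search(body.lower())
    if not header or not meta:
        return None                      # both declarations must be present
    if header.group(1) != meta.group(1):
        return header.group(1), meta.group(1)
    return None
```

Feeding it a header of `text/html; charset=ISO-8859-1` and a body whose meta tag says `charset=windows-1252` returns the same pair of values that appears in the alert data above.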

Testing the Alexa Top 500 Sites

As part of the initial testing of these rules, I decided to use my local Apache+ModSecurity server as a forward proxy. I then downloaded the Alexa Top 1,000,000 Sites CSV file and set up a quick curl script to connect to the top 500 sites through my server. This allowed me to see whether any of these sites have any of the Charset issues mentioned above.
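The original test used a curl script that is not reproduced here. A rough Python equivalent might look like the sketch below. The proxy address, the file name, and the assumption that each CSV row has the form `rank,domain` are all placeholders to adjust for your own environment.

```python
import csv
import io
import urllib.request

def top_sites(csv_file, limit=500):
    """Yield the first `limit` domains from rows assumed to be 'rank,domain'."""
    for index, row in enumerate(csv.reader(csv_file)):
        if index >= limit:
            break
        yield row[1]

def fetch_via_proxy(domain, proxy="http://127.0.0.1:8080", timeout=10):
    """Request the site's home page through the forward proxy (assumed address)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    try:
        with opener.open(f"http://{domain}/", timeout=timeout) as response:
            return response.status
    except OSError as exc:               # URLError and HTTPError derive from OSError
        return exc

# Example usage (hypothetical file name):
#   with open("top-1m.csv", newline="") as handle:
#       for site in top_sites(handle):
#           print(site, fetch_via_proxy(site))
```

The script does not need to inspect the responses itself, because the ModSecurity rules on the proxy do the checking and write any findings to the Apache error log.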

Testing Results

Here are the results of the test:

  • Sites that do not specify Charset at all (either in response header or meta http-equiv) = 17
  • Sites that specify a Charset but it is not UTF-8 = 60
  • Sites that have mismatches between Response Header Charset vs. Meta HTTP-Equiv Charset = 8

This data indicates that 17% of the top 500 Alexa sites have some type of Charset issue that should be looked into by security staff. How about your site?
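The 17% figure follows from adding the three counts, assuming the categories are mutually exclusive (a site counted once only). A quick check of the arithmetic:

```python
# Counts reported in the test results above.
counts = {
    "no_charset": 17,
    "charset_not_utf8": 60,
    "header_meta_mismatch": 8,
}
total_sites = 500

flagged = sum(counts.values())
percent = 100 * flagged / total_sites

print(f"{flagged} of {total_sites} sites = {percent:.0f}%")
```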
