Advanced Topic of the Week: (Updated) Real-time Blacklist Lookups

Updated: the information in this post reflects the RBL enhancements added in the recently released ModSecurity v2.6.0 and in 2.7 (SVN trunk).

This week's feature is the effective use of Real-time Blacklist lookups (@rbl).

Why use an RBL?

Why use Real-time Blacklist Lookups anyway? How can an RBL be used to help protect a web site? What we are talking about here is IP Reputation. Has this client been identified as bad by other web sites? It is sort of like the "No Fly" lists that the Department of Homeland Security makes available to airlines. It is a method of sharing information about clients so that you can decide whether to allow a client access to your site at all, or perhaps to treat it differently (such as with increased logging). Real-time block lists are community-based, central repositories for IP Reputation. RBLs are most commonly used to identify web-based comment spam. If you run a blog or user-forum site, wouldn't you like to know if the current client has already been identified as a spammer?

Reference Manual


Description: Looks up the input value in the RBL (real-time block list) given as parameter. The parameter can be an IPv4 address or a hostname.


SecRule REMOTE_ADDR "@rbl" "phase:1,t:none,pass,nolog,auditlog,msg:'RBL Match for SPAM Source',tag:'AUTOMATION/MALICIOUS',severity:'2',setvar:'tx.msg=%{rule.msg}',setvar:tx.automation_score=+%{tx.warning_anomaly_score},setvar:tx.anomaly_score=+%{tx.warning_anomaly_score}, \

OWASP ModSecurity CRS

The OWASP ModSecurity CRS includes limited use of the @rbl operator within the optional_rules/modsecurity_crs_42_comment_spam.conf file:

# Comment spam is an attack against blogs, guestbooks, wikis and other types of
#   interactive web sites that accept and display hyperlinks submitted by
#   visitors. The spammers automatically post specially crafted random comments
#   which include links that point to the spammer's web site. The links
#   artificially increase the site's search engine ranking and may make the site
#   more noticeable in search results.

SecRule IP:PREVIOUS_RBL_CHECK "@eq 1" "phase:1,id:'981137',t:none,pass,nolog,skipAfter:END_RBL_LOOKUP"
  SecRule REMOTE_ADDR "@rbl" "phase:1,id:'981138',t:none,pass,nolog,auditlog,msg:'RBL Match for SPAM Source',tag:'AUTOMATION/MALICIOUS',severity:'2',setvar:'tx.msg=%{rule.msg}',setvar:tx.automation_score=+%{tx.warning_anomaly_score},setvar:tx.anomaly_score=+%{tx.warning_anomaly_score},setvar:tx.%{}-AUTOMATION/MALICIOUS-%{matched_var_name}=%{matched_var},setvar:ip.spammer=1,expirevar:ip.spammer=86400,setvar:ip.previous_rbl_check=1,expirevar:ip.previous_rbl_check=86400,skipAfter:END_RBL_CHECK"

  SecAction "phase:1,id:'981139',t:none,nolog,pass,setvar:ip.previous_rbl_check=1,expirevar:ip.previous_rbl_check=86400"

SecRule IP:SPAMMER "@eq 1" "phase:1,id:'981140',t:none,pass,nolog,auditlog,msg:'Request from Known SPAM Source (Previous RBL Match)',tag:'AUTOMATION/MALICIOUS',severity:'2',setvar:'tx.msg=%{rule.msg}',setvar:tx.automation_score=+%{tx.warning_anomaly_score},setvar:tx.anomaly_score=+%{tx.warning_anomaly_score},setvar:tx.%{}-AUTOMATION/MALICIOUS-%{matched_var_name}=%{matched_var}"


The goal of this ruleset is to run an @rbl check once for each IP address and then save the result in the IP persistent collection (ip.spammer and ip.previous_rbl_check) for 1 day. This limits the number of @rbl lookups that the web server needs to do, as there is a latency hit for executing the DNS queries.
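The "look up once, remember for a day" idea behind these rules can be sketched outside of ModSecurity. The following Python snippet is only an illustration of the caching logic, not how ModSecurity implements persistent collections; the `RblCache` class and the `lookup` callable are hypothetical names introduced here.

```python
import time

ONE_DAY = 86400  # seconds, the same value used by expirevar in the rules above


class RblCache:
    """Remembers one RBL verdict per client IP for a fixed time (illustrative only)."""

    def __init__(self, lookup, ttl=ONE_DAY, clock=time.time):
        self._lookup = lookup    # hypothetical callable(ip) -> True if the IP is listed
        self._ttl = ttl
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}       # ip -> (is_spammer, expires_at)

    def is_spammer(self, ip):
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and entry[1] > now:
            return entry[0]                  # cached verdict, no DNS query needed
        verdict = bool(self._lookup(ip))     # the slow, network-bound RBL lookup
        self._entries[ip] = (verdict, now + self._ttl)
        return verdict
```

As in the CRS rules, both positive and negative results are remembered, so a clean client does not trigger a new DNS query on every request either.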

RBLs for Comment SPAM Domains

There are various RBLs which track different things. One interesting RBL for web site owners is URIBL, which tracks domains that appear in the body of SPAM messages rather than the IP addresses of the clients that are posting the data. This is useful data to correlate with the client REMOTE_ADDR data in ModSecurity in order to better identify comment SPAM postings.

In the latest version of ModSecurity (v2.6.1) we have enhanced the RBL operator to support URIBL's response codes. The multi list contains all of the list data, and is the list recommended for queries instead of making separate requests to each list. If a domain is found on multi, the lookup returns an IP address of 127.0.0.X, where X is the value for the list it is on. See the following reference:

X   Binary    On List
2   00000010  black
4   00000100  grey
8   00001000  red
14  00001110  black,grey,red (for testpoints)
255 11111111  your DNS is blocked from querying URIBL

With this enhancement to the @rbl operator, it is now possible to inspect which list (black, grey, red, etc.) a positive lookup was found on. This lets users execute a single RBL lookup against a combined list and then make a local decision about response actions based on the list the entry is on.
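The response-code table above is a bitmask, which is why 14 means black, grey and red at the same time. As a rough illustration (this is a standalone Python sketch, not ModSecurity code, and `decode_uribl` is a name made up for this example), the returned address can be decoded like this:

```python
# Bit values taken from the response-code table above.
URIBL_LISTS = {2: "black", 4: "grey", 8: "red"}
BLOCKED = 255  # "your DNS is blocked from querying URIBL"


def decode_uribl(address):
    """Return the list names encoded in a 127.0.0.X answer."""
    octets = address.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError("not a 127.0.0.X response: %r" % address)
    x = int(octets[3])
    if x == BLOCKED:
        return ["blocked"]
    # Test each bit in ascending order so the output order is stable.
    return [name for bit, name in sorted(URIBL_LISTS.items()) if x & bit]
```

A policy could then, for example, block on "black" but only raise the anomaly score on "grey".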

Here is an example rule to test the functionality of inspecting request data for possible SPAM URIs:

SecRule ARGS "https?\:\/\/(.*?)\/" \
    "chain,phase:2,id:'1',t:none,block,capture,msg:'URIBL Match of Submitted Link Domain',logdata:'%{tx.domain}',setvar:tx.domain=%{tx.1}"
        SecRule TX:1 "@rbl" "chain,capture"
                SecRule TX:0 "(BLACK)"

Here is how the processing looks in the debug.log file output:

Recipe: Invoking rule 1009de8d0; [file "/usr/local/apache/conf/crs/activated_rules/modsecurity_crs_15_customrules.conf"] [line "4"] [id "1"].
Rule 1009de8d0: SecRule "ARGS" "@rx https?\\:\\/\\/(.*?)\\/" "phase:2,log,chain,id:1,t:none,block,capture,msg:'URIBL Match of Submitted Link Domain',logdata:%{tx.domain},setvar:tx.domain=%{tx.1}"
Transformation completed in 0 usec.
Executing operator "rx" with param "https?\\:\\/\\/(.*?)\\/" against ARGS:note.
Target value: ""
Added regex subexpression to TX.0:
Added regex subexpression to TX.1:
Operator completed in 20 usec.
Setting variable: tx.domain=%{tx.1}
Resolved macro %{tx.1} to:
Set variable "tx.domain" to "".
Rule returned 1.
Match -> mode NEXT_RULE.
Recipe: Invoking rule 1009df8f0; [file "/usr/local/apache/conf/crs/activated_rules/modsecurity_crs_15_customrules.conf"] [line "5"].
Rule 1009df8f0: SecRule "TX:1" "@rbl" "chain,capture"
Transformation completed in 1 usec.
Executing operator "rbl" with param "" against TX:1.
Target value: ""
Added phrase match to TX.0: RBL lookup of succeeded at TX:1 (BLACK).
Operator completed in 1441 usec.
Rule returned 1.
Match -> mode NEXT_RULE.
Recipe: Invoking rule 1009dfeb0; [file "/usr/local/apache/conf/crs/activated_rules/modsecurity_crs_15_customrules.conf"] [line "6"].
Rule 1009dfeb0: SecRule "TX:0" "@rx (BLACK)"
Transformation completed in 0 usec.
Executing operator "rx" with param "(BLACK)" against TX:0.
Target value: "RBL lookup of succeeded at TX:1 (BLACK)."
Ignoring regex captures since "capture" action is not enabled.
Operator completed in 12 usec.
Resolved macro %{tx.domain} to:
Warning. Pattern match "(BLACK)" at TX:0. [file "/usr/local/apache/conf/crs/activated_rules/modsecurity_crs_15_customrules.conf"] [line "4"] [id "1"] [msg "URIBL Match of Submitted Link Domain"] [data ""]

Caution: the @rx operator will only extract the first match from the payload. This means that if there are multiple URIs in a Comment Spam request, it will not check all URIs. It is for this reason that you might want to utilize a Lua script to more accurately parse URIs and then create TX variables that can then be inspected by @rbl.
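To show what "parse all URIs" means in practice, here is a minimal sketch of the extraction logic. It is written in Python purely for illustration (a ModSecurity deployment would do this in a Lua script as suggested above), the `extract_hosts` function is a hypothetical name, and the pattern is deliberately simple: it does not handle userinfo or IPv6 literals.

```python
import re

# Capture the host part of every http/https URL, stopping at the first
# path, port, query, fragment or whitespace character.
URL_HOST = re.compile(r"https?://([^/\s:?#]+)", re.IGNORECASE)


def extract_hosts(payload):
    """Return every distinct URL host in the payload, in order of appearance."""
    hosts = []
    for host in URL_HOST.findall(payload):
        host = host.lower()
        if host not in hosts:    # look up each domain only once
            hosts.append(host)
    return hosts
```

Each returned host would then be checked separately, instead of only the first one captured by the single @rx match.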

HTTP Blacklist

Another very useful RBL for web site defense is the HTTP Blacklist run by Project Honey Pot. They describe the HTTP BL as follows:

The HTTP Blacklist, or "http:BL", is a system that allows website administrators to take advantage of the data generated by Project Honey Pot in order to keep suspicious and malicious web robots off their sites. Project Honey Pot tracks harvesters, comment spammers, and other suspicious visitors to websites. Http:BL makes this data available to any member of Project Honey Pot in an easy and efficient way.

Http:BL provides data back about the IP addresses of visitors to your website. Data is exchanged over the DNS system. You may query your local DNS server and receive a response back that indicates the type of visitor to your site, how threatening that visitor is, and how long it has been since the visitor has last been seen within the Project Honey Pot trap network.

This is useful data because it tracks the IP addresses of clients that have been flagged as malicious by Project Honey Pot's trap network, which means that there is a very low chance of false positives.
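The DNS exchange described above can be sketched as follows. This is an illustrative Python snippet, not ModSecurity code. The zone name `dnsbl.httpbl.org`, the query layout (access key, then the reversed IP, then the zone) and the meaning of the answer octets are taken from Project Honey Pot's published http:BL API description rather than from this post, so treat them as assumptions and verify them against that documentation. The function names are made up for this example.

```python
# Visitor type bits in the fourth octet (assumed from the http:BL API description).
VISITOR_TYPES = {1: "suspicious", 2: "harvester", 4: "comment spammer"}


def httpbl_query_name(api_key, ip, zone="dnsbl.httpbl.org"):
    """Build the DNS name to query: <key>.<reversed IPv4 octets>.<zone>."""
    reversed_ip = ".".join(reversed(ip.split(".")))
    return "%s.%s.%s" % (api_key, reversed_ip, zone)


def parse_httpbl_answer(answer):
    """Split a 127.<days>.<threat>.<type> answer into its three fields."""
    first, days, threat, type_bits = (int(part) for part in answer.split("."))
    if first != 127:
        raise ValueError("unexpected http:BL answer: %r" % answer)
    if type_bits == 0:
        # Type 0 marks a search engine; the third octet is then not a threat score.
        kinds = ["search engine"]
    else:
        kinds = [name for bit, name in sorted(VISITOR_TYPES.items()) if type_bits & bit]
    return {
        "days_since_last_activity": days,
        "threat_score": threat,
        "visitor_types": kinds,
    }
```

For example, an answer of 127.1.63.5 decodes to a suspicious comment spammer seen 1 day ago with a threat score of 63.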

In the latest ModSecurity version in SVN trunk (2.7), one of our ModSecurity Developers (Brian Bebeau) added the capability to use the Http:BL API by allowing the ModSecurity user to specify their registered API key with the new SecHttpBlKey directive.


Description: Configures the user's registered Honeypot Project HTTP BL API Key to use with @rbl.

Syntax: SecHttpBlKey [12 char access key]

Example Usage: SecHttpBlKey whdkfieyhtnf

Scope: Main

Version: 2.7.0

If the @rbl operator uses the HTTP BL RBL, you must provide an API key. This key is registered to individual users and is included within the RBL DNS requests.

You can then use rules similar to the following to check the client IP address against the HTTP BL:

SecHttpBlKey whdkfieyhtnf
SecRule TX:REAL_IP|REMOTE_ADDR "@rbl" "chain,phase:1,t:none,capture,block,msg:'HTTPBL Match of Client IP.',logdata:'%{tx.httpbl_msg}',setvar:tx.httpbl_msg=%{tx.0}"
        SecRule TX:0 "threat score (\d+)" "chain,capture"
                SecRule TX:1 "@gt 20"

If a malicious client connects to your web server, this rule inspects the "threat score" data returned by the HTTP BL and triggers an alert if the score is above the defined threshold (20 here). An alert like the following is generated and the client is blocked (depending on your configuration):

[Wed Jul 20 15:53:26 2011] [error] [client] ModSecurity: Warning. Operator GT matched 20 at TX:1. [file "/usr/local/apache/conf/crs/activated_rules/modsecurity_crs_15_customrules.conf"] [line "2"] 
[msg "HTTPBL Match of Client IP."] [data "RBL lookup of succeeded at REMOTE_ADDR. Suspicious comment spammer IP: 1 days since last activity, threat score 63"] [hostname "localhost"] [uri "/cgi-bin/printenv"] [unique_id "TicyNsCoAWsAAKslGIAAAAAB"]

RBL Usage Considerations and Tips

While @rbl is a useful feature, use it with caution: it carries a significant performance cost and can increase latency for clients. Whereas the @geoLookup operator accesses a local DB, @rbl checks occur in real time over the network and rely on the DNS infrastructure. For the same reason that most web admins disable real-time client hostname resolution in logging, running a DNS lookup on each client request can cause severe delays.
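To see why every check costs a DNS round trip, it helps to look at how a DNS-based blacklist query is formed. By the usual DNSBL convention, an IPv4 address is queried with its octets reversed and the list's zone appended, while a hostname is simply prepended to the zone. The Python sketch below illustrates that convention only; `rbl_query_name` is a hypothetical helper and `rbl.example.org` is a placeholder zone, not a real list.

```python
import ipaddress


def rbl_query_name(value, zone):
    """Build the DNS name that a blacklist lookup for `value` would resolve."""
    try:
        addr = ipaddress.IPv4Address(value)
    except ipaddress.AddressValueError:
        # Not an IPv4 address: treat it as a hostname and prepend it as-is.
        return "%s.%s" % (value, zone)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return "%s.%s" % (reversed_octets, zone)


# Example: rbl_query_name("192.0.2.10", "rbl.example.org")
# builds the name "10.2.0.192.rbl.example.org".
```

Resolving that name (for instance with `socket.gethostbyname`) blocks until the resolver answers or gives up, and that wait is the latency discussed above.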

Choose your RBL carefully

Make sure that you choose your RBL carefully. You not only want to ensure that the RBL category is appropriate for your site but also that the accuracy of the list is good.

DNS Caching

Implement a local caching DNS server like rbldnsd or Christian Bockermann's jwall-rbld so that your @rbl checks issue DNS queries to the local system first.

Use ModSecurity Persistent Storage

Alternatively, you can use ModSecurity to save RBL responses in the IP persistent storage collection. This is what the CRS modsecurity_crs_42_comment_spam.conf file does; the persistent data is cached for 1 day.
