SpiderLabs Blog

Set-based Pattern Matching Example

Written by SpiderLabs Anterior | Jan 2, 2008 7:46:00 AM

Large Wordlist Example

You will find the greatest benefit from the set-based matching operators when you need to look for an extremely large word list in the variable data. A perfect example is searching request content for SPAM keywords or references to known SPAM hosting locations. The GotRoot rule set includes a rule file called blacklist.conf that contains approximately 7600 individual rules similar to the following:

SecRule HTTP_Referer|ARGS "best-deals-blackjack\.info"
SecRule HTTP_Referer|ARGS "best-deals-casino\.info"
SecRule HTTP_Referer|ARGS "best-deals-cheap-airline-tickets\.info"
SecRule HTTP_Referer|ARGS "best-deals-diet\.info"
SecRule HTTP_Referer|ARGS "best-deals-flowers\.info"
SecRule HTTP_Referer|ARGS "best-deals-hotels\.info"
SecRule HTTP_Referer|ARGS "best-deals-online-gambling\.info"
SecRule HTTP_Referer|ARGS "best-deals-online-poker\.info"
SecRule HTTP_Referer|ARGS "best-deals-poker\.info"
SecRule HTTP_Referer|ARGS "best-deals-roulette\.info"
SecRule HTTP_Referer|ARGS "best-deals-weight-loss\.info"
SecRule HTTP_Referer|ARGS "bestdims\.com"
SecRule HTTP_Referer|ARGS "bestdvdclubs\.com"
SecRule HTTP_Referer|ARGS "best-e-site\.com"
SecRule HTTP_Referer|ARGS "best-gambling\.biz"
SecRule HTTP_Referer|ARGS "bestgamblinghouseonline\.com"

Let's see the average time that it takes ModSecurity to run through all of these individual rules in phase:2.

# head -3 /usr/local/apache/logs/modsec_debug.log
[20/Jan/2008:02:45:49 --0500] [www.example.com/sid#903df48][rid#9f9dab8][/cgi-bin/foo.cgi][1] Phase 1: 18 usec
[20/Jan/2008:02:45:49 --0500] [www.example.com/sid#903df48][rid#9f9dab8][/cgi-bin/foo.cgi][1] Rule 918e140 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_10_config.conf"][line "86"]: 10 usec
[20/Jan/2008:02:59:47 --0500] [www.example.com/sid#903df48][rid#9f9dab8][/cgi-bin/foo.cgi][1] Phase 2: 83751 usec

So, it took 83751 usec to process the ~7600 individual rules. Now, let's run a similar test. This time, however, we will use the @pmFromFile operator, and the input file will have approximately the same number of text lines. Instead of having thousands of individual SecRule lines, I will use this one line:

SecRule REQUEST_HEADERS:Referer|ARGS "@pmFromFile spam_domains.txt"

The spam_domains.txt file contains approximately 6900 lines such as these:

01-beltonen.com
01-klingeltoene.at
01-klingeltoene.de
01-loghi.com
01-logo.com
01-logot.com
01-logotyper.com
01-melodia.com
01-melodias.com
01-ringetone.com
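
If you already maintain regex rules like the blacklist.conf ones shown earlier, a phrase file of this kind could be generated from them rather than typed by hand. The following is only a rough Python sketch under a few assumptions: the rules have the simple SecRule VARIABLES "pattern" shape shown above, the helper names (to_literal, rules_to_phrases) are made up for illustration, and any pattern that is a real regular expression rather than an escaped literal is skipped, because phrase matching treats each line as literal text.

```python
import re

# Matches only the simple rule shape shown earlier: SecRule <variables> "<pattern>"
RULE_RE = re.compile(r'^\s*SecRule\s+\S+\s+"((?:[^"\\]|\\.)+)"\s*$')

# Unescaped, any of these characters means the pattern is a real regex.
REGEX_META = set('.^$*+?()[]{}|')


def to_literal(pattern):
    """Return the literal text a regex stands for, or None if it is not a plain literal."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\':
            # A trailing backslash, or an escape such as \d or \w, is not a literal.
            if i + 1 >= len(pattern) or pattern[i + 1].isalnum():
                return None
            out.append(pattern[i + 1])  # "\." becomes "."
            i += 2
        elif ch in REGEX_META:
            return None
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


def rules_to_phrases(rule_lines):
    """Collect literal phrases from simple SecRule lines (hypothetical helper)."""
    phrases = []
    for line in rule_lines:
        match = RULE_RE.match(line)
        if not match:
            continue  # comments, blank lines, rules with actions, etc.
        literal = to_literal(match.group(1))
        if literal is not None:
            phrases.append(literal)
    return phrases


if __name__ == "__main__":
    sample = [
        r'SecRule HTTP_Referer|ARGS "best-deals-blackjack\.info"',
        r'SecRule HTTP_Referer|ARGS "bestdims\.com"',
    ]
    # One phrase per line, the layout shown for spam_domains.txt above.
    print("\n".join(rules_to_phrases(sample)))
```

Rules that the sketch skips would still need to stay as ordinary regex rules, so it is worth reviewing what was dropped before retiring the original file.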

When I run the same test with this new rule that uses the @pmFromFile operator, you can see the dramatic difference in processing time:

# head -4 /usr/local/apache/logs/modsec_debug.log
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Phase 1: 20 usec
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Rule 9202980 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_10_config.conf"][line "86"]: 11 usec
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Phase 2: 10 usec
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Rule 9203890 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_15_customrules.conf"][line "1"]: 6 usec

As you can see, it took only 6 usec to complete the @pmFromFile set-based matching operator check, compared with 83751 usec for the ~7600 individual rules! That is a gigantic improvement for overall performance.
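
To get a feel for why a set-based check can be so much cheaper, it helps to look at the kind of algorithm involved. The ModSecurity documentation describes the phrase-match operators as using the Aho-Corasick algorithm. The block below is a toy Python sketch of that technique, not ModSecurity's actual C implementation; the function names build_automaton and find_phrases are invented for this illustration, and matching is made case-insensitive by lowercasing both the phrases and the input.

```python
from collections import deque


def build_automaton(phrases):
    """Build a small Aho-Corasick automaton: trie edges, failure links, outputs."""
    goto = [{}]   # goto[node][char] -> next node
    fail = [0]    # fail[node] -> node to fall back to on a mismatch
    out = [[]]    # out[node] -> phrases that end at this node
    for phrase in phrases:
        if not phrase:
            continue  # ignore empty lines
        node = 0
        for ch in phrase.lower():
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(phrase)

    # Breadth-first pass: each node's failure link points to the longest
    # proper suffix of its path that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_phrases(text, automaton):
    """Scan text once and return every phrase found, in order of discovery."""
    goto, fail, out = automaton
    node = 0
    found = []
    for ch in text.lower():
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        found.extend(out[node])
    return found


if __name__ == "__main__":
    automaton = build_automaton(["01-logo.com", "01-logot.com", "bestdims.com"])
    print(find_phrases("http://01-LOGOT.com/x", automaton))  # prints ['01-logot.com']
```

The point of the sketch is that the input is walked a single time no matter how many phrases were loaded, so the scan cost depends mainly on the length of the data being inspected. Thousands of separate regex rules, by contrast, each examine the same data again.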

Conclusion

Set-based pattern matching can increase the overall performance of your ModSecurity rules when used in the proper circumstances. In any situation where you need to inspect a large word list, you should try to leverage these new operators.