I was recently asked to investigate the performance of a ModSecurity installation to see if there was room for improvement. This particular installation is used to defend against blog comment spam. It has a large number of simple rules, all regular expressions, that are applied to the Referer header and the request parameters. I used the opportunity to answer a question that has been bugging me for years: is a configuration with many simple regular expressions faster than a configuration with a single complex regular expression? I'll spare you the details of my tests. Here is the verdict:
A single rule with a complex regular expression is significantly faster than multiple rules with simple regular expressions.
So instead of:
SecFilterSelective VAR KEYWORD1
SecFilterSelective VAR KEYWORD2
SecFilterSelective VAR KEYWORD3
from now on you want to write:
SecFilterSelective VAR "(KEYWORD1|KEYWORD2|KEYWORD3)"
This is true for ModSecurity 1.9.x and for the latest released version of ModSecurity 2.x. I have yet to investigate whether there's any room for optimisation in the ModSecurity code, or whether the difference comes from overhead in the regular expression library.
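In ModSecurity 2.x, rules are written with SecRule rather than SecFilterSelective, so the consolidated rule would look roughly like the following. This is a sketch, not the configuration from the installation discussed here: the keywords are placeholders, the variables (the Referer header plus the request parameters) mirror the setup described above, and the actions are an illustrative assumption.

```apache
# Sketch only: KEYWORD1..KEYWORD3 are placeholders, actions are illustrative.
# One rule, one regular expression covering all keywords (ModSecurity 2.x syntax).
SecRule REQUEST_HEADERS:Referer|ARGS "@rx (KEYWORD1|KEYWORD2|KEYWORD3)" \
    "phase:2,deny,log,msg:'Comment spam keyword'"
```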
(Update 18 August 2006) My post to the mod-security-users mailing list sparked an interesting discussion. Ryan rightly pointed out that it is not a good idea to consolidate unrelated rules, as doing so increases maintenance costs and prevents you from assigning unique IDs to individual rules. We then went on to discuss ways of making consolidated regular expressions more readable and came up with the idea of splitting them across multiple lines and annotating them with PCRE-style comments (this only works with Apache 2.x):
SecFilterSelective VAR "(\
KEYWORD1(?# a comment)|\
KEYWORD2(?# a comment)|\
KEYWORD3(?# a comment))"
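If you follow Ryan's advice and keep unrelated keywords in separate rules, ModSecurity 2.x lets you attach a unique id to each one. A minimal sketch, in which the keywords, ID numbers and messages are all hypothetical:

```apache
# Sketch only: keywords, IDs and messages are hypothetical placeholders.
# One rule per keyword, so each match can be traced to its own ID in the logs.
SecRule REQUEST_HEADERS:Referer|ARGS "@rx KEYWORD1" "phase:2,deny,log,id:1001,msg:'Spam keyword 1'"
SecRule REQUEST_HEADERS:Referer|ARGS "@rx KEYWORD2" "phase:2,deny,log,id:1002,msg:'Spam keyword 2'"
SecRule REQUEST_HEADERS:Referer|ARGS "@rx KEYWORD3" "phase:2,deny,log,id:1003,msg:'Spam keyword 3'"
```

This layout keeps each keyword individually identifiable and easy to maintain, at the cost of the per-rule overhead discussed above.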