ModSecurity 2.5 Phrase Match Operator Performance

Quite a few people have asked about the performance differences between using the regular expression (@rx) operator and using the phrase match (@pm or @pmFromFile) operator. Lately, I have been working on better methods of gathering performance statistics and want to share my findings.

The phrase match operator was added to improve performance when matching a large number of static phrases. For instance, you may want to look for a list of spam phrases in ARGS:

With the @rx Operator and Simple OR Expression:

SecRule ARGS \
"@rx ambien|cyclen|cyclobenzaprine|paxil|phendimetrazine|phentamine|phentermine|viagra|viagara" \
"phase:2,deny,status:403,log,auditlog,t:lowercase,msg:'Medical Spam Detected'"

With the @rx Operator and an Enhanced Expression (see Optimizing Regular Expressions):

SecRule ARGS \
"@rx (?:p(?:hen(?:t(?:er|a)m|dimetraz)ine|axil)|cycl(?:obenzaprine|en)|viaga?ra|ambien)" \
"phase:2,deny,status:403,log,auditlog,t:lowercase,msg:'Medical Spam Detected'"

With the @pm Operator:

SecRule ARGS \
"@pm ambien cyclen cyclobenzaprine paxil phendimetrazine phentamine phentermine viagra viagara" \
"phase:2,deny,status:403,log,auditlog,t:lowercase,msg:'Medical Spam Detected'"

To compare the performance of each of these, I used a utility I built for unit testing to execute each operator 10,000 times and took the average execution time. I generated a randomized list of 1,000 phrases, each between 2 and 8 characters long. The following chart compares the processing time in milliseconds of each of the above operator types as the number of phrases taken from the randomized list grows from 0 to 200. Each operator uses the same set of phrases. Note that this is the processing time of the operator only (no overhead from transformations, alerts or other aspects of executing the rule).


As you can see, there is quite a difference in performance. The processing time of the basic @rx operator increases linearly as more phrases are added. While the processing time of the optimized @rx operator does come close to leveling out, it is still slower than the @pm operator, and the rule itself is quite difficult to read and maintain. In contrast, the @pm operator uses a constant, extremely low amount of processing time regardless of the number of phrases, and the rule stays easy to read and maintain.
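
When the phrase list grows beyond what fits comfortably on one line, the same phrases can be loaded from a file with the @pmFromFile operator mentioned at the start of this post. The following is only a sketch: the file path is hypothetical, the file is assumed to hold one phrase per line as described in the ModSecurity reference manual, and this variant was not measured separately in the tests above.

```
# Hypothetical phrase file: /etc/modsecurity/spam-phrases.txt
# Assumed contents, one phrase per line:
#   ambien
#   cyclen
#   cyclobenzaprine
#   paxil
#   phendimetrazine
#   phentamine
#   phentermine
#   viagra
#   viagara
SecRule ARGS \
"@pmFromFile /etc/modsecurity/spam-phrases.txt" \
"phase:2,deny,status:403,log,auditlog,t:lowercase,msg:'Medical Spam Detected'"
```

Keeping the phrases in a file means the list can be maintained without editing the rule itself. As far as I know, the phrases are read when the configuration is loaded, so a change to the file still needs a configuration reload to take effect.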

While the @pm operator performs well, it is not very flexible and thus has limited use cases. The operator can only use static phrases (no patterns) and cannot currently be anchored, meaning a phrase will match anywhere in the target, including in the middle of a longer word. If you need patterns, or need to anchor matches to the beginning and/or end of the target or to word boundaries, then you still must use the @rx operator. In this case an optimized regular expression is the way to go if you need the rule to perform well, which is why these are used in the Core Rules.
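
One possible way to combine the two operators, offered here as an untested assumption rather than something measured in this post, is to use @pm as a cheap pre-filter and chain an anchored @rx that is only evaluated when one of the phrases was found. The phrase list is shortened for illustration.

```
# Sketch only: @pm pre-filters, the chained @rx enforces word boundaries.
# The chained rule is evaluated only when the @pm rule matches.
SecRule ARGS \
"@pm ambien paxil viagra" \
"phase:2,deny,status:403,log,auditlog,t:lowercase,msg:'Medical Spam Detected',chain"
SecRule ARGS \
"@rx \b(?:ambien|paxil|viagra)\b" \
"t:lowercase"
```

Note that the chained @rx inspects all of ARGS again, not just the argument that matched @pm, so the request is denied only if some argument contains one of the phrases as a whole word. Whether this is faster than a single optimized @rx would need to be measured for your own phrase list and traffic.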
