Finding XSS Vulnerabilities More Quickly with Dynamic Contextual Analysis

Cross-Site Scripting (XSS) has been around since the 1990s, and countless scanners have been built to find this vulnerability class. Each scanner has its own set of payloads, some more extensive than others. However, almost all of them deliver those payloads the same way: by going down the list. This is essentially a brute-force approach, which is simple yet effective… sometimes.

When it is not effective, it can be a downright waste of time. Take, for instance, a simple page where the basic payload <script>alert('XSS')</script> is reflected inside a DIV element, like so:

<div> &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;&#x2F;script&gt; </div>

Many tools will verify that no JavaScript executed and move on to the next payload until the entire list is exhausted. In this particular case there is no obvious vulnerability, yet it took numerous payloads to reach that conclusion. A penetration tester, on the other hand, would reach the same conclusion after only a few payloads: inside an element's content, there is no way to bypass HTML entity encoding without resorting to other tricks such as the UTF-7 charset.

A Better Way

Instead of blindly flinging payloads at the wall to see what sticks, we can analyze the response. For each payload, we examine the context and encoding of its reflection and then decide what to send next: a feedback loop of sorts. Take the following payloads, for example:

1. Payload:    Alphanumeric123
   Reflection: Alphanumeric123
2. Payload:    Alert(456)
   Reflection: Alert(456)
3. Payload:    Alert(456)<a>
   Reflection: Alert(456) &lt;a&gt;

We see that the angle brackets are encoded, which means the next payload in the series (<script>alert(1)</script>) would most likely not trigger any script execution. It also means an entire category of payloads that rely on angle brackets can be skipped. If the reflection were instead inside an element's attribute, payloads such as "javascript:alert(1)" or " onmousemove=alert(1)" would still be possibilities.
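To make the pruning idea concrete, here is a minimal Python sketch. It assumes a hypothetical mapping from payload categories to the characters each one needs to survive unencoded (the category names and character sets are illustrative, not any scanner's real list), and it drops a category only when one of its required characters was actually probed and came back altered. Applied to the probe from row 3 above, the angle-bracket categories are skipped while the attribute-based ones remain candidates, pending context analysis.

```python
# Hypothetical payload categories and the characters each needs to survive unencoded.
PAYLOAD_CATEGORIES = {
    "script-tag": {"<", ">", "/"},
    "tag-with-event-handler": {"<", ">", "="},
    "attribute-breakout-event-handler": {'"', "="},
    "javascript-uri": {":", "(", ")"},
}


def blocked_chars(probe: str, reflection: str) -> set:
    """Special characters that were sent in the probe but did not come back verbatim."""
    sent = {c for c in probe if not c.isalnum()}
    return {c for c in sent if c not in reflection}


def viable_categories(probe: str, reflection: str) -> list:
    """Keep categories that do not depend on any character known to be blocked.

    Characters that were never probed are treated as unknown, so categories that
    need them stay on the list until a later probe says otherwise.
    """
    blocked = blocked_chars(probe, reflection)
    return sorted(name for name, needed in PAYLOAD_CATEGORIES.items() if not needed & blocked)


# Row 3 of the table: the angle brackets came back HTML-encoded.
print(viable_categories("Alert(456)<a>", "Alert(456) &lt;a&gt;"))
# ['attribute-breakout-event-handler', 'javascript-uri']
```

Treating unprobed characters as "unknown" rather than "blocked" keeps the sketch conservative: it never discards a category on evidence it has not gathered yet.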

This more closely approximates what a penetration tester would do. A scanner using this approach can finish in much less time, because fewer payloads are needed to reach the same level of test coverage. So what's the big deal, when an automated scan takes effectively none of the tester's time since one can simply start it and walk away?

The Advantages of Contextual Analysis

Although there is an obvious improvement in speed simply because fewer payloads are sent, there are other advantages as well. Contextual analysis also produces data that can be used to determine more accurately whether a vulnerability exists. Questions such as "was the payload encoded?" and "was the payload reflected at all?" can now be answered. When investigating false positives or false negatives, this data is invaluable.
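Those two questions can be answered mechanically. The Python sketch below is a simplification that assumes the only encoding of interest is HTML entity encoding: it looks for a verbatim reflection first, then for a reflection that reappears once the response is entity-decoded with the standard library's html.unescape. Anything else is reported as not reflected, which covers both filtering and pages that never echo the input. The function name and result labels are illustrative.

```python
import html


def classify_reflection(payload: str, body: str) -> str:
    """Report whether a payload came back verbatim, HTML-encoded, or not at all."""
    if payload in body:
        return "reflected-raw"
    # html.unescape decodes named and numeric references such as &lt; and &#x27;.
    if payload in html.unescape(body):
        return "reflected-encoded"
    return "not-reflected"  # filtered, transformed, or never echoed


payload = "<script>alert('XSS')</script>"
body = "<div> &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;&#x2F;script&gt; </div>"
print(classify_reflection(payload, body))         # reflected-encoded
print(classify_reflection("Alert(456)", "456"))   # not-reflected
```

A real scanner would need to handle partial encoding (some characters encoded, others not) and other schemes such as URL or JavaScript string escaping; this sketch only distinguishes the three coarse outcomes.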

Most web applications, especially custom-built ones, have their own unique behaviors, and many custom filters have been bypassed in the past with clever obfuscation and other techniques. In these cases, the ability to examine the resulting reflection provides a substantial advantage over a traditional payload list. In the following example, the application applies two different filters in tandem to a parameter that is reflected inside a double-quoted attribute.

1. Payload:    Alphanumeric123
   Reflection: Alphanumeric123
2. Payload:    Alert(456)
   Reflection: 456
3. Payload:    Confirm(456)
   Reflection: 456
4. Payload:    Prompt(456)
   Reflection: 456
5. Payload:    window['ale'+(!![]+[])[-~[]]+(!![]+[])[+[]]](456)
   Reflection: window['ale'+(!![]+[])[-~[]]+(!![]+[])[+[]]](456)
6. Payload:    window['ale'+(!![]+[])[-~[]]+(!![]+[])[+[]]](456)<a>
   Reflection: window['ale'+(!![]+[])[-~[]]+(!![]+[])[+[]]](456)&lt;a&gt;
7. Payload:    x" onmousemove="window['ale'+(!![]+[])[-~[]]+(!![]+[])[+[]]](456)
   Reflection: x" onmousemove="window['ale'+(!![]+[])[-~[]]+(!![]+[])[+[]]](456)

In steps 2-4 we see that 'alert', 'confirm', and 'prompt' are being filtered. In step 5, however, the obfuscated call is reflected intact (it assembles 'alert' at runtime from 'ale' plus the characters 'r' and 't' taken from the string "true"), so it is retained and used in subsequent payloads. In step 6, '<' and '>' are encoded, which rules out escaping the attribute with those characters. So in step 7 we close the attribute with a double quote and add an event handler instead, and this succeeds. The final payload looks like this:

x" onmousemove="window['ale'+(!![]+[])[-~[]]+(!![]+[])[+[]]](456)

It combines two different evasion techniques in order to bypass the two filters. It is not difficult to imagine the vast number of payloads that can be generated from combining multiple evasion techniques. The coverage of a static payload list is but a fraction of what is possible.
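The two evasions in the final payload can be expressed as small, composable transforms. In this Python sketch, obfuscated_alert rebuilds the step 5 expression (in JavaScript, (!![]+[]) evaluates to the string "true", +[] to 0 and -~[] to 1, so the call resolves to window['alert']), and attribute_breakout adds the step 7 double-quote escape. toy_filter is a hypothetical stand-in that mimics the two filters only as far as the table shows them; it is not the application's real code.

```python
import re

# JavaScript fragments that avoid quotes and digits:
#   (!![]+[]) -> "true",  +[] -> 0,  -~[] -> 1
TRUE_STR = "(!![]+[])"
DIGIT_FREE_INDEX = {0: "+[]", 1: "-~[]"}


def obfuscated_alert(arg: str) -> str:
    """Evasion 1: spell 'alert' as 'ale' + "true"[1] + "true"[0] so keyword filters miss it."""
    name = f"'ale'+{TRUE_STR}[{DIGIT_FREE_INDEX[1]}]+{TRUE_STR}[{DIGIT_FREE_INDEX[0]}]"
    return f"window[{name}]({arg})"


def attribute_breakout(js: str, handler: str = "onmousemove") -> str:
    """Evasion 2: close a double-quoted attribute and open an event handler; no '<' or '>' needed."""
    return f'x" {handler}="{js}'


def toy_filter(value: str) -> str:
    """Hypothetical filter pair: drops alert/confirm/prompt calls, then encodes angle brackets."""
    value = re.sub(r"(alert|confirm|prompt)\((\d+)\)", r"\2", value, flags=re.IGNORECASE)
    return value.replace("<", "&lt;").replace(">", "&gt;")


payload = attribute_breakout(obfuscated_alert("456"))
print(payload)
print(toy_filter(payload) == payload)  # True: the combined payload passes both toy filters
```

Because each transform is independent, a generator can chain them in different combinations, which is one way the number of candidate payloads can grow well beyond a static list.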

Why Is This Uncommon?

One reason this method is uncommon in scanners is its complexity. Many more steps are required once contextual analysis, dynamic payload generation, and dynamic taint propagation are introduced. Here are some of the steps required in one injection cycle:

- Inject payload (generated from previous injection)

- Find the reflection in the response (part of taint propagation)

- Analyze the surrounding context (HTML/JavaScript, element content, tag, or attribute)

- Analyze the reflection (is it encoded/filtered?)

- See if it is worth rendering in the browser by examining the execution context

- Determine what the next payload should be, based on the analyzed reflection and context

- Generate the new payload
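The cycle above can be sketched as a loop in Python. This is a toy under stated assumptions: send is any callable that injects a value and returns the response body, the reflection is located either verbatim or in the form produced by html.escape (a real application may encode differently), the context check only distinguishes element content from the inside of a tag, and toy_decide is a hypothetical stand-in for the payload-selection logic. Rendering in a browser is left out; a raw reflection is where a real scanner would hand off to one.

```python
import html
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Observation:
    payload: str
    reflected: bool
    encoded: bool
    context: str  # "element-content", "inside-tag", or "unknown"


def context_at(body: str, index: int) -> str:
    """Crude context check: inside a tag if the last '<' before the reflection is still open."""
    if index < 0:
        return "unknown"
    before = body[:index]
    return "inside-tag" if before.rfind("<") > before.rfind(">") else "element-content"


def observe(payload: str, body: str) -> Observation:
    raw_index = body.find(payload)  # find the reflection verbatim...
    index = raw_index if raw_index >= 0 else body.find(html.escape(payload))  # ...or encoded
    return Observation(
        payload=payload,
        reflected=index >= 0,
        encoded=raw_index < 0 and index >= 0,  # was it encoded?
        context=context_at(body, index),       # surrounding context
    )


def run_cycle(send: Callable[[str], str], first: str,
              decide: Callable[[Observation], Optional[str]],
              max_rounds: int = 10) -> List[Observation]:
    history: List[Observation] = []
    payload: Optional[str] = first
    while payload is not None and len(history) < max_rounds:
        body = send(payload)           # inject
        obs = observe(payload, body)   # locate, analyze context and encoding
        history.append(obs)
        payload = decide(obs)          # choose and generate the next payload, or stop
    return history


def toy_app(value: str) -> str:
    """Hypothetical target: reflects input into a DIV with full HTML encoding."""
    return f"<div>{html.escape(value)}</div>"


def toy_decide(obs: Observation) -> Optional[str]:
    """Hypothetical decision logic for the DIV example."""
    if obs.encoded and obs.context == "element-content":
        return None  # encoded inside element content: no obvious way out, so stop early
    if obs.payload == "Alphanumeric123":
        return "Alert(456)<a>"  # next probe: do angle brackets survive?
    return None


for obs in run_cycle(toy_app, "Alphanumeric123", toy_decide):
    print(obs)
```

Against the fully encoding DIV target, the loop stops after two requests instead of working through an entire payload list, which mirrors the penetration-tester reasoning described earlier.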

So how is the next payload determined from the analyzed results? Wouldn't that require loads of conditionals and constraints? It would, unless a rules engine is used.

The Rules

For readers unfamiliar with rules engines: a rules engine is primarily used for problem spaces that contain many different kinds of constraints of the form "If A and B and not C and not D, then X". In programming, such problems are usually solved with constraint or logic programming paradigms. Luckily for us, generating new payloads is exactly this kind of problem, so a rules engine is a natural fit. Here is an example of what three rules could look like:

  1. If the reflection is inside a textarea element, add the textarea closing tag to the payload.
  2. If <, > are not reflected, HTML encode them.
  3. If 'alert' is not reflected, try 'confirm'.

Here is how the rules combine for the payload <script>alert(456)</script>, depending on what was observed:

#  Modifications to try                        Alert reflected  <, > reflected  In textarea element
1  </textarea>                                 Yes              Yes             Yes
2  none                                        Yes              Yes             No
3  encode <, > with </textarea>                Yes              No              Yes
4  encode <, >                                 Yes              No              No
5  'confirm' with </textarea>                  No               Yes             Yes
6  'confirm'                                   No               Yes             No
7  'confirm' and encode <, > with </textarea>  No               No              Yes
8  'confirm' and encode <, >                   No               No              No

So, for row 1, we would get:

</textarea><script>alert(456)</script>

For row 7, it might look like this:

&lt;/textarea&gt;&lt;script&gt;confirm(456)&lt;/script&gt;
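A minimal Python sketch of the three rules, assuming they are applied as a single ordered pass over condition/action pairs rather than by a full rules engine (which would typically add forward chaining and conflict resolution). The textarea rule here fires whenever the reflection is inside a textarea, matching rows 3, 5, and 7 of the table, and the bracket-encoding rule runs last so that the inserted closing tag is encoded too, as in the row 7 output. Facts, RULES, and next_payload are illustrative names.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Facts:
    alert_reflected: bool
    brackets_reflected: bool
    in_textarea: bool


# Each rule: (name, condition on the observed facts, transformation of the payload).
Rule = Tuple[str, Callable[[Facts], bool], Callable[[str], str]]

RULES: List[Rule] = [
    ("try confirm", lambda f: not f.alert_reflected, lambda p: p.replace("alert", "confirm")),
    ("close textarea", lambda f: f.in_textarea, lambda p: "</textarea>" + p),
    # Runs last so the closing tag added above is encoded as well.
    ("encode brackets", lambda f: not f.brackets_reflected,
     lambda p: p.replace("<", "&lt;").replace(">", "&gt;")),
]


def next_payload(payload: str, facts: Facts) -> str:
    """Apply every rule whose condition holds, in list order."""
    for _name, condition, action in RULES:
        if condition(facts):
            payload = action(payload)
    return payload


base = "<script>alert(456)</script>"
for row, flags in enumerate(product([True, False], repeat=3), start=1):
    print(row, next_payload(base, Facts(*flags)))
```

Iterating itertools.product over True and False in that order enumerates the observations in the same sequence as rows 1 to 8 of the table, so each printed line is the payload for the corresponding row.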

Conclusion

By using a more data-driven approach, we work smarter, not harder. The additional data and knowledge gained expand our coverage, improve accuracy, and increase efficiency. The same trend is visible in other areas of security and beyond. So the question is: where else can we apply this methodology?

We've recently updated Trustwave App Scanner Enterprise's XSS detection engine to make full use of the dynamic XSS payloads described above. Existing users will not need to make any configuration changes to take advantage of this new feature. For those users already taking advantage of custom injection strings, a new format has been added to make it easier than before to add your own payloads and extend your coverage capabilities.
