We all agree that cross-site scripting is a serious problem, but what continues to amaze me is the lack of good documentation on the subject. It is easy to find instructions on how to execute attacks against applications vulnerable to XSS, but finding something adequate that covers defence is a real challenge. No wonder programmers keep making the same errors over and over again. I am sure that a single page describing the problems and the solutions is out there somewhere, but I have been unable to find it. All I get is page after page of half-truths and partial information, and even people saying that XSS is impossible to defend against.
Without any planning (so please forgive any omissions), I am now going to describe how to produce web applications that are safe against XSS and other injection attacks.
This is what you need to do:
- Identify all system components other than the application itself. In a typical web application you will have at least the following:
  - Browser output, which further consists of:
    - Response headers (e.g. redirection, cookies, etc)
- Adopt one character encoding (use UTF-8 unless you have a good reason not to) and make sure all components are configured to use it:
  - Databases typically need to be created with a character encoding in mind
  - In the HTML pages you create, set the character encoding explicitly
- Then, for every component:
  - Identify safe characters
  - Identify how to make unsafe characters safe by converting them into something else
  - Write a function that looks at characters one by one to determine if they are safe, and converts those that are not (whitelisting, not blacklisting!)
  - Every such function must be aware of the character encoding used in the application
- Then, for every piece of code that sends data from one component into another, make sure you use the correct function to encode data to make it safe
- Check that every piece of data you receive is in the correct character encoding and that its format matches that of the type you are expecting (input validation). You must use whitelisting, as blacklisting does not work. This is especially important for user-supplied Internet addresses (see below for details). Before you do anything with the input data, make sure to canonicalise it (as suggested by Jim Manico in one of the comments), which reduces the possibility of evasion through multiple representations of the same character.
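To make the steps above more concrete, here is a minimal Python sketch of how they might fit together for one component, the HTML page. Everything in it is my own assumption for illustration rather than a definitive implementation: the function names (`decode_input`, `encode_for_html`, `is_allowed_url`, `render_link`) are hypothetical, the set of safe characters is deliberately tiny, NFKC is just one possible choice of canonical form, and allowing only `http` and `https` links is an example policy.

```python
import string
import unicodedata
from urllib.parse import urlsplit

# Assumed whitelist for HTML text and quoted attribute values:
# ASCII letters, digits, and a handful of harmless punctuation characters.
HTML_SAFE = frozenset(string.ascii_letters + string.digits + " .,-_")

# Assumed policy: only these URL schemes may appear in links we generate.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def decode_input(raw: bytes) -> str:
    """Check the character encoding, then canonicalise.

    Invalid UTF-8 raises UnicodeDecodeError instead of being guessed at.
    NFKC folds compatibility forms (e.g. fullwidth '<') into plain ones.
    """
    text = raw.decode("utf-8", errors="strict")
    return unicodedata.normalize("NFKC", text)


def encode_for_html(text: str) -> str:
    """Look at characters one by one: keep whitelisted ones and convert
    everything else into a numeric character reference."""
    return "".join(
        ch if ch in HTML_SAFE else f"&#x{ord(ch):X};" for ch in text
    )


def is_allowed_url(url: str) -> bool:
    """Whitelist validation for user-supplied addresses."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def render_link(url: str, label: str) -> str:
    """Validate the address first, then encode both values for HTML."""
    if not is_allowed_url(url):
        raise ValueError("address rejected by whitelist")
    return f'<a href="{encode_for_html(url)}">{encode_for_html(label)}</a>'


if __name__ == "__main__":
    print(encode_for_html("<script>"))
    print(render_link("http://www.example.com", "Example"))
    print(is_allowed_url("javascript:alert(1)"))
```

The order matters in this sketch: input is decoded and canonicalised first, validated second, and encoded only at the point where it is sent to another component. A real application would need one such encoding function per component (HTML, JavaScript, SQL, response headers and so on), each with its own set of safe characters.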
http://www.example.com, which you use to create a link
<a href="http://www.example.com">Example</a>, you get
- Google Doctype, which is a reference library for web developers, is by far the best resource on XSS, but it too fails when it comes to defence, advising people to use blacklisting instead of whitelisting.
- The OWASP Encoding project should be your starting point if you don't want to write all the encoding functions yourself.