A data model is the foundation of web application monitoring and, thus, key to successful utilisation of web application firewalls. We don't get to design the model; we can only deduct it from the information provided to us from the underlying technology. What we can do is build on it, and, for that reason, it is very important to understand what we have to work with.
An ideal model is one that helps structure the information available to us, allows us to enrich it with additional pieces of data and generally helps us raise events based on the information it contains.
The major parts of a web application monitoring data model are as follows:
- Connection - corresponds to one TCP connection.
- Request - corresponds to one HTTP request.
- Response - corresponds to one HTTP response.
- IP Address - the IP address of the client, retrieved from the TCP connection.
- Session - application session.
- User - authenticated user; in most cases this translates to the application user, but some sites still use HTTP authentication, and some might use both.
- Site - perhaps more accurately called Protection domain, or Application. None of these terms is perfect, but I generally prefer to use Site. In our model, Site refers either to the functionality behind an entire domain name (e.g. www.example.com), or only a subset of one (www.example.com/forums/).
- Country - the country where the request originates.
- City - the city where the request originates.
- Custom - any number of custom attributes. For example, you might want to have different policies for different departments within your organisation. To achieve this, you will map client IP addresses to department names, which you will then use to determine policies.
Most of the components are easy to construct, mapping from the structures used in programming, but there are a few places where the technology does not support the view, or where what we are given is not what we want to see:
- Some work is needed to be able to distinguish sessions. There are different session identifier techniques to consider (e.g. in the URI, in a parameter, in a cookie). While there is a number of platforms that have standardised session management, there is also a large number of applications using their own schemes, so in general some custom work will be needed.
- More so in the case of user identification. Building on session identification one needs to identify a successful login event in the traffic in order to determine the session username.
- The IP address may not be accurate. It may be that of an intermediary, and not of the client itself. Such cases can sometimes be identified (as is the case with HTTP proxies) , but not always (e.g. if a transparent HTTP proxy is used). The problem is that, unless you control the proxy, you can only rely on the IP address you got from the TCP stack; the information extracted from HTTP requests headers is not to be trusted.
Note: This post is part of the Web Application Firewall Concepts series.