Trustwave SpiderLabs Uncovers Unique Cybersecurity Risks in Today's Tech Landscape. Learn More

Trustwave SpiderLabs Uncovers Unique Cybersecurity Risks in Today's Tech Landscape. Learn More

Services
Capture
Managed Detection & Response

Eliminate active threats with 24/7 threat detection, investigation, and response.

twi-managed-portal-color
Co-Managed SOC (SIEM)

Maximize your SIEM investment, stop alert fatigue, and enhance your team with hybrid security operations support.

twi-briefcase-color-svg
Advisory & Diagnostics

Advance your cybersecurity program and get expert guidance where you need it most.

tw-laptop-data
Penetration Testing

Test your physical locations and IT infrastructure to shore up weaknesses before exploitation.

twi-database-color-svg
Database Security

Prevent unauthorized access and exceed compliance requirements.

twi-email-color-svg
Email Security

Stop email threats others miss and secure your organization against the #1 ransomware attack vector.

tw-officer
Digital Forensics & Incident Response

Prepare for the inevitable with 24/7 global breach response in-region and available on-site.

tw-network
Firewall & Technology Management

Mitigate risk of a cyberattack with 24/7 incident and health monitoring and the latest threat intelligence.

Solutions
BY TOPIC
Microsoft Exchange Server Attacks
Stay protected against emerging threats
Rapidly Secure New Environments
Security for rapid response situations
Securing the Cloud
Safely navigate and stay protected
Securing the IoT Landscape
Test, monitor and secure network objects
Why Trustwave
About Us
Awards and Accolades
Trustwave SpiderLabs Team
Trustwave Fusion Security Operations Platform
Trustwave Security Colony
Partners
Technology Alliance Partners
Key alliances who align and support our ecosystem of security offerings
Trustwave PartnerOne Program
Join forces with Trustwave to protect against the most advance cybersecurity threats
SpiderLabs Blog

Base64 versus Plaintext Observations

Recently we have been working on the libmodsecurity project. As part of the project we no longer use the Apache Portable Runtime (APR) as part of the core ModSecurity. While this change has allowed us to increase performance, portability, and reliability it means that some aspects of the code that we previously relied on APR for had to be recreated. One such aspect was base64 encoding/decoding support. While developing our base64 implementation I came to ponder an interesting security question. Given an unknown string, such as a password, is it quicker to bruteforce the base64-encoded version of the password or the plaintext version?

This is not an all-together an unreasonable question as one of the most common forms of authentication on the web is still Basic Authentication. There are many reasons for this; chief among them is the simplicity of configuring it. However, generally, Basic Authentication uses Base64 to encode the sensitive information (username, password) as it goes across the wire. Of course base64 is a two way function and as a result our VAT product Trustwave Vulnerability Scanner will detect this as a vulnerability if there is no additional form of communication security, such as SSL.

There is of course some logic behind this question as well. A quick glance at the base64-encoding standard will tell you that it has a 64 character alphabet (hence the name). Meanwhile a quick glance at ASCII will tell you there are 95 printable characters (https://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters). This is really a minimum subset because the astute reader will mention that technically base64 can encode non-printable characters and Unicode; but for now let us only consider printable ASCII.

From these naïve assumptions one might assume that it is significantly easier to brute force characters within the base64-encoded charset. The first assumption, mostly for simplicity, is that any given password is made up of the 95 printable ASCII characters. This assumption means that in order to fully search through a one character password in plaintext, it would take 95^1 (95) iterations. Meanwhile, knowing that Base64 encoding has an alphabet of 64 characters, one could assume that it would take 65^1 iterations to completely try all 1 character password combinations.

Given the above information you may be inclined to try this at home, but these assumptions fail to take into account the nature Base64. Consider Base64 as if it was a hashing algorithm (it's not), the above assumption would be indicative of many, many collisions within the space. But we know that Base64 can be both encoded and decoded quite easily, so how does Base64 encode variable sized input within a smaller character set? The answer is that it uses variable sized - that is to say longer – output. A little bit of background is needed in order to fully understand the previous statement.

Base64 encoding is actually quite simple. Take a simple word like 'Mana'. We first take the ASCII value of each letter 77, 97, 110, and 97 respectively. We then convert these each into their binary representations: 01001101, 01100001, 01101111, and 01100001. When finished we combine these into a single 'bitstring': 01001101011000010110111101100001. Next we take this bit string and break it into 6 bit chunks: 010011 010110 000101 101111 011000 01. We'll notice that 32 is not divisible by 6 and so we pad our last number with zeros until we get a multiple of 6, it then becomes 01000. Our final set appears as follows: 010011 010110 000101 101111 011000 01000. We now do the relatively simple task of converting these back to decimal and looking them up in the table below (19->T, 22->W, 5->F, 46->u, 24->y, and 16->q). We end up with the letters TWFuYQ.

11697_c5f6e315-2082-4979-9732-eecd098b98d2

It is important to note that technically according to the standard if our output is not a multiple of 4 then we will pad with equal signs until we achieve this feature (TWFuYQ==). However, since the decoding process is simply the encoding process in reverse, this isn't strictly needed. Additionally, given an encoding we can very easily determine how much padding is need.

Using the example above we can extrapolate that given an input of length n as input our base64 output length (without padding) would be (ceil(n/3.0)) + n. This now allows us to calculate our given search space more accurately as 64^(ceil(n/3.0)) + n). This pattern means that for every three plaintext additional characters we add one additional base64 bit character. Of course, when we when used as the exponent this gives us a really undesirable feature that after every three characters of input our search space will quadruple (since increasing an exponent by one doubles it).

So we got through a little bit of math. But what does this mean for our original question? It turns out that while when using plaintext we have a larger base (96), the exponent involved with searching a Base64 encoded space grows at a much faster rate. In layman's terms we would never want to search using Base64 versus ASCII.

Below I've visualized what the difference between these search spaces look like for a text of a particular length. When we get to longer passwords these sizes really, for instance searching 32 character text using Base64 encoding you'd have to search ~463167999999998062890000000000000000000000000000000000000000000000000000000000 more combinations. As a result of this I've expressed the graph below on a logarithmic scale. Even here you can see that the differences are marked:

9685_66dd038e-84cb-4ee0-b4b9-d855e564168b

In summary, it makes sense on the surface to assume that Base64 is easier to search than ASCII due to the smaller keyspace. However, as we have shown, keyspace doesn't tell the whole story. Due to the nature of how Base64 encodes strings it is decidedly harder to bruteforce an unknown string encoded in Base64 than it is in ASCII.

Latest SpiderLabs Blogs

Zero Trust Essentials

This is Part 5 in my ongoing project to cover 30 cybersecurity topics in 30 weekly blog posts. The full series can be found here.

Read More

Why We Should Probably Stop Visually Verifying Checksums

Hello there! Thanks for stopping by. Let me get straight into it and start things off with what a checksum is to be inclusive of all audiences here, from Wikipedia [1]:

Read More

Agent Tesla's New Ride: The Rise of a Novel Loader

Malware loaders, critical for deploying malware, enable threat actors to deliver and execute malicious payloads, facilitating criminal activities like data theft and ransomware. Utilizing advanced...

Read More