Trustwave SpiderLabs Exposes Unique Cybersecurity Threats in the Public Sector. Learn More

Trustwave SpiderLabs Exposes Unique Cybersecurity Threats in the Public Sector. Learn More

Services
Capture
Managed Detection & Response

Eliminate active threats with 24/7 threat detection, investigation, and response.

twi-managed-portal-color
Co-Managed SOC (SIEM)

Maximize your SIEM investment, stop alert fatigue, and enhance your team with hybrid security operations support.

twi-briefcase-color-svg
Advisory & Diagnostics

Advance your cybersecurity program and get expert guidance where you need it most.

tw-laptop-data
Penetration Testing

Test your physical locations and IT infrastructure to shore up weaknesses before exploitation.

twi-database-color-svg
Database Security

Prevent unauthorized access and exceed compliance requirements.

twi-email-color-svg
Email Security

Stop email threats others miss and secure your organization against the #1 ransomware attack vector.

tw-officer
Digital Forensics & Incident Response

Prepare for the inevitable with 24/7 global breach response in-region and available on-site.

tw-network
Firewall & Technology Management

Mitigate risk of a cyberattack with 24/7 incident and health monitoring and the latest threat intelligence.

Solutions
BY TOPIC
Offensive Security
Solutions to maximize your security ROI
Microsoft Exchange Server Attacks
Stay protected against emerging threats
Rapidly Secure New Environments
Security for rapid response situations
Securing the Cloud
Safely navigate and stay protected
Securing the IoT Landscape
Test, monitor and secure network objects
Why Trustwave
About Us
Awards and Accolades
Trustwave SpiderLabs Team
Trustwave Fusion Security Operations Platform
Trustwave Security Colony
Partners
Technology Alliance Partners
Key alliances who align and support our ecosystem of security offerings
Trustwave PartnerOne Program
Join forces with Trustwave to protect against the most advance cybersecurity threats
SpiderLabs Blog

ChatGPT: The Right Tool for the Job?

Since it was first released to the public late last year, ChatGPT has successfully captured the attention of many. OpenAI’s large language model chatbot is intriguing for a variety of reasons, not the least of which is the manner in which it responds to human users. ChatGPT’s language usage resembles that of an experienced professional. But while its responses are delivered with unshakeable confidence, its content is not always as impressive.

Before proceeding to the research results, it is important to understand that artificial intelligence systems encompass a variety of different techniques and technologies to make decisions and predictions. Without a full understanding of the technologies behind OpenAI’s chatbot, it is impossible to make a truly accurate assessment about the scope of its capabilities. In lieu of that, the best that we can do is treat it as a black box and assess its responses to various prompts.

For this blog, we tested ChatGPT’s ability to perform basic static code analysis on some vulnerable code snippets. At first glance, the responses it delivered were astounding, but as with any good research, it was necessary to scratch the surface to see its true value.

Figure 1. The vulnerable code refactored using dynamic memory allocation.

Buffer Overflow in C

The first piece of code tested was a simple buffer overflow example. ChatGPT quickly determined that it contained a vulnerability due to the length of the string printed exceeding the size of the fixed-length buffer. When asked to categorize, it assigned a CVSSv2 vector of AV:L/AC:L/Au:N/C:P/I:P/A:P and labeled it as CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer. When asked how to make the code more secure, ChatGPT suggested increasing the size of the buffer, using a more secure function (snprintf in place of sprintf), or dynamically allocating memory for the buffer based on the length of the string. It recognized the limitations of assigning these labels without further context. More impressively, ChatGPT could refactor the code based on any of the fixes that it suggested – for example, using dynamic memory allocation.

Going further, we asked ChatGPT to show how an attacker could exploit this vulnerability, and it did not disappoint. Since we did not give it any constraints, it updated the code snippet with a variable the size of the buffer and a shellcode to echo ‘Hello World.’ First, the code would have to be compiled in order to execute it. Although this safeguard can be easily disabled, modern compilers like GNU Compiler Collection (GCC) can prevent code from writing explicitly declared variables into smaller buffers acting as a primary safeguard to avoid such issues. When asked how it would exploit the vulnerability after the code was compiled, ChatGPT generated a python script with the same shellcode.

19748_image002

Figure 2. Python script with the same shellcode.

Most interestingly, when asked how to exploit the code from a Linux shell, ChatGPT provided detailed step-by-step instructions on how to use objdump, gdb and msfvenom to generate the payload.

19749_image004

Figure 3. Step-by-step instructions on how to use objdump, gdb and msfvenom to generate the payload.

It could have used a simpler method, but knowing the offset of the buffer, the AI took a much more aggressive approach by spawning /bin/sh, declaring:

‘This effectively gives the attacker a command prompt with root privileges.’

This would be true if the vulnerable code were running with root privileges already, which is not always the case. When asked why it is assumed that the user is root, ChatGPT explained that the user will be inherited from the parent process running the code.

‘I apologize for any confusion.  […] in a realistic scenario, the user who executes the shellcode may not have root privileges.’

Cross-Site Scripting in JavaScript/PHP

The second code we tested was an example of DOM-based Cross-Site Scripting (XSS). This time, ChatGPT would not commit to a particular category as it found multiple issues, including XSS, if the value of ‘name’ is not properly sanitized. When asked specifically about the XSS, it gave a CVSSv2 vector of AV:N/AC:M/Au:N/C:P/I:P/A:N and labeled it as CWE-79: Cross-Site Scripting (XSS) or CWE-494: Download of Code Without Integrity Check. Additionally, ChatGPT suggested multiple ways to improve the security of this code, including sanitizing user input using certain PHP functions, running the code in strict mode, and using a content security policy (CSP). Once again, it was able to quickly refactor the code to make it more secure, describing the changes that it made.

19750_image008

Figure 4. ChatGPT found multiple issues with very little context.

What was interesting about the refactored code is that ChatGPT wrote new functions to implement these changes rather than just expanding the existing code block. This suggests that ChatGPT is at least aware of some basic software development practices and when to apply them.

Code Execution in Ruby

The last code we tested was a potential code execution in Discourse’s AWS notification webhook handler. ChatGPT first described the code as shown below:

‘This code appears to be a Ruby script that defines a Sidekiq job for confirming an Amazon SNS subscription. The purpose of this job is to verify that the SNS message received is authentic and then confirm the subscription by visiting the SubscribeURL.’

Although this description is a bit vague, it is a good enough starting point to describe the code. ChatGPT also provided a line-by-line breakdown of the code. The descriptions of each line describe the methods being used and how they relate to their parameters, and while they can be helpful, ChatGPT could not infer any more context. Interestingly, when supplied with the same code with the variables and identifiers changed, it provided an almost identical response, suggesting that it was doing more than just looking up definitions based on naming conventions.

Initially, ChatGPT did not find any issues with this code:

‘Overall, this code appears to be secure and well-written, using appropriate error checking and an AWS SDK to verify the authenticity of the SNS message.’

ChatGPT finding no issues with this code is not surprising since exploitation of this code relies on creating a custom endpoint and injecting a crafted X509 certificate. While the code itself is not obviously problematic, ChatGPT is a little too trusting with user-supplied input that could result in code execution when calling the ‘open’ method under the right conditions. The complexity of this vulnerability is much higher than that of the previous examples and requires an understanding of the AWS SDK of which to take advantage. However, when specifically asked about security issues, ChatGPT was able to identify the root causes of the vulnerability, even if it did not know the exact method of exploitation.

19751_image010

Figure 5. Descriptions of the changes that ChatGPT made when asked to improve its security.

Though these insights are basic, they can be helpful when finding vulnerabilities. In this instance, the code block was quite small – only 16 lines without whitespace – but that is not often the case. When trying to find and/or exploit a vulnerability, the starting point is often much broader in scope. Additionally, since any block of code often contains multiple issues, an elegant description of problematic elements can help filter out the noise when searching for the cause of a particular behavior.

When starting with a large block of code, an AI-powered static analysis tool could be valuable in helping researchers reduce the amount of time and effort required to narrow the search.

The Right Tool for the Job?

Although only a few tests are highlighted here, we provided ChatGPT with a lot of code to see how it would respond. It often responded with mixed results. With the three examples above, it did quite well finding potential issues. These examples were chosen because they are relatively unambiguous, so ChatGPT would not have to infer much context beyond the code that it was given.

To get the most out of ChatGPT, it is important to be as clear and specific as possible. When supplying it with larger code blocks or less straightforward issues, it did not do very well at spotting them, but that is no less true about humans trying to do the same job.

Although static analysis tools have been used for years to identify vulnerabilities in code, they have limitations in terms of their ability to assess broader security aspects – sometimes reporting vulnerabilities that are impossible to exploit. ChatGPT demonstrates greater contextual awareness and is able to generate exploits that cover a more comprehensive analysis of security risks. The biggest flaw when using ChatGPT for this type of analysis is that it is incapable of interpreting the human thought-process behind the code.

For the best results, ChatGPT will need more user input to elicit a contextualized response detailing what is required to illustrate the code’s purpose.

The responses that ChatGPT delivers are not always accurate. In OpenAI’s defense, ChatGPT’s purpose is to simulate human conversation, and, in that regard, it is wildly successful. As a tool specifically designed to generate chat, it should be no surprise that ChatGPT is particularly good at writing clear and concise responses. Additionally, ChatGPT could be particularly useful for generating skeleton code and unit tests since those require a minimal amount of context and are more concerned with the parameters being passed - another thing at which ChatGPT excelled in these tests.

It is flexible enough to be able to respond to many different requests, but it is not necessarily suited to every job that is asked of it. That said, ChatGPT provides a real source of intrigue for more specialized AI-powered tools in the future. Still, as with any exciting new tool, it is necessary to push the limits to find the most suitable use cases. There is already a variety of ChatGPT-powered plugins for different software from Integrated Development Environments (IDEs) to disassemblers, so it is only a matter of time until some applications rise to the top.

 

Latest SpiderLabs Blogs

2024 Public Sector Threat Landscape: Trustwave Threat Intelligence Briefing and Mitigation Strategies

Trustwave SpiderLabs’ 2024 Public Sector Threat Landscape: Trustwave Threat Intelligence Briefing and Mitigation Strategies report details the security issues facing public sector security teams as...

Read More

How to Create the Asset Inventory You Probably Don't Have

This is Part 12 in my ongoing project to cover 30 cybersecurity topics in 30 weekly blog posts. The full series can be found here.

Read More

Guardians of the Gateway: Identity and Access Management Best Practices

This is Part 10 in my ongoing project to cover 30 cybersecurity topics in 30 weekly blog posts. The full series can be found here.

Read More