An Introduction to x86 Assembly Language

Basic Instructions and Shellcode

Most desktop and laptop computers in the world run a processor from the x86 family, so x86 is the ISA you will encounter most often in computer security. Knowing x86 is necessary both to reverse engineer binaries and to exploit them.

Why Assembly?

x86 ISA Overview

Or, why x86 sucks

A Note on Syntax

Brief Note on Segments

Deprecated stuff you can (mostly) ignore

Sections of a Process Image

  • .data
    • Initialized Data
  • .bss
    • Uninitialized Data (set to 0)
  • .text
    • Code
    • Entry Point (_start)
  • The Stack
    • Local variables
  • The Heap
    • Dynamically allocated memory (malloc/new)

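section .data                   ; initialized data
    message: db 'Hello World!'  ; 12 bytes, no terminator
    bufsz:   dd 1024            ; one 4-byte integer
section .bss                    ; uninitialized data, zero-filled at load time
    fname:   resb 255           ; reserve 255 bytes
    num:     resd 1             ; reserve one dword (4 bytes)
section .text                   ; code
global _start
_start:
    (...)
    call main
    (...)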

Memory Layout

+=============+
|    Stack    | ~0xff8e0000
+-------------+
|   Lots of   |
|    Empty    |
|    Space    |
+-------------+
|    Heap     | ~0x993a0000
+-------------+
|   Lots of   |
|    Empty    |
|    Space    |
+-------------+
|    .bss     |
+-------------+
|   .data     |
+-------------+
|   .text     | ~0x08040000
+-------------+

Registers

Register naming example:

         +----------------+--------+--------+
     eax |                |   ah   |   al   |
         +----------------+--------+--------+
                          |<----- ax ------>|

Standard Instructions

The Basics

Note that at most one operand of an instruction may be in memory (mov [eax], [ebx] is not valid), so the other operand is usually a register or an immediate value. A few instructions, such as the string instructions, are exceptions.

Instruction       C equivalent
mov eax, ebx      eax = ebx;
add eax, ebx      eax += ebx;
sub eax, ebx      eax -= ebx;
inc eax           ++eax;
dec eax           --eax;
call foo          foo();
ret               return eax;
push 10h          *--esp = 0x10;
pop eax           eax = *esp++;

Memory Addressing

Syntax


[Base + Index*ElemSize ± Displacement]

Memory Addressing

Examples

The LEA instruction

Load Effective Address

The Stack

Overview

  • The stack grows DOWNWARD
    • Top of the stack: lowest memory address
  • The esp register points to the top of the stack
    • Adding to esp removes items from the stack
    • Subtracting from esp adds items to the stack

Stack Frames and Calling Conventions


  • Caller pushes args onto the stack, right to left (so the first arg ends up at the lowest address)
  • Caller executes call instruction
    • call instruction pushes the return address onto the stack
  • Callee pushes ebp onto the stack, then sets ebp to esp
  • Callee then allocates space for local variables (by subtracting from esp)
  • Return value is in eax
  • eax, ecx, edx are caller-saved (all others callee-saved)
  • After the call returns, the caller is responsible for removing the arguments from the stack

Function Example


int identity(int x) {
    return x;
}

global identity
identity:
    push ebp            ; prologue
    mov ebp, esp        ;
    mov eax, [ebp+8]    ; eax = x (first arg, just above the return address)
    mov esp, ebp        ; epilogue
    pop ebp             ;
    ret                 ; return

Function Call Example


ebx = identity(ebx);

push ebx         ; push arguments on the stack
call identity    ; call function
add esp, 4       ; clean up passed arguments
mov ebx, eax     ; put return value where we want it

A quick note on ebp

What's the frame pointer for

Tips for Success

A complete program: Hello World


[BITS 32]

section .data
    msg:    db `Hello, World!\n\0`  ; use backticks for the string
                                    ; note that we need to manually add the \0

section .text
    extern printf           ; have to declare what functions we use
    global main             ; main is a global symbol (accessible from other files)

main:
    push ebp                ; standard prologue
    mov ebp, esp            ;
    push msg                ; push msg onto the stack (to use as an arg)
    call printf             ; printf(msg)
    add esp, 4              ; clean up the arg we pushed
    mov eax, 0              ; put return code in eax
    mov esp, ebp            ; standard epilogue
    pop ebp                 ;
    ret                     ;

Another Function Example


void vulnerable() {
    char buf[256];
    gets(buf);
}

global vulnerable
vulnerable:
    push ebp            ; prologue
    mov ebp, esp        ;
    sub esp, 256        ; allocate space on stack for buf
    lea eax, [ebp-256]  ; load address of buf
    push eax            ; push &buf as the only arg to gets
    call gets           ; perform function call
    mov esp, ebp        ; epilogue
    pop ebp             ;
    ret                 ; return

Exploit Techniques

Branching

Multiplication/Division (with bigger numbers)

If you actually care...

Another Function Example


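global foo
foo:
    push ebp
    mov ebp, esp
    mov eax, [ebp+8]    ; eax = x
    test eax, eax       ; ZF = 1 iff x == 0
    jnz bar             ; x != 0: recursive case
    inc eax             ; x == 0: eax = 0 + 1 = 1
    jmp baz
bar:
    dec eax             ; eax = x - 1
    push eax            ; push the argument
    call foo            ; eax = foo(x - 1)
    pop ecx             ; remove the argument, reloading it: ecx = x - 1
    inc ecx             ; ecx = x
    mul ecx             ; edx:eax = eax * ecx = foo(x - 1) * x
baz:
    mov esp, ebp
    pop ebp
    ret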

int fact(int x) {
    if (x == 0) return 1;
    return x * fact(x - 1);
}

Is assembly faster than C?

System Calls

Hello World, with System Calls

Look Mom, no C library!


[BITS 32]

section .data
    hello:      db `Hello, World!\n`  ; this time, don't need \0
    helloLen:   dd $-hello            ; string length

section .text
    global _start
    
_start:                     ; not using C, use _start instead of main
    mov eax, 4              ; write() syscall number
    mov ebx, 1              ; fd (STDOUT_FILENO)
    mov ecx, hello          ; data (pointer) to write
    mov edx, [helloLen]     ; number of bytes to write
    int 0x80                ; call kernel
    mov eax, 1              ; exit() syscall number
    mov ebx, 0              ; return code (0)
    int 0x80                ; call kernel
                            ; NOTE: we cannot return from _start, must exit()

Shellcode Example


[BITS 32]

; Note that we MUST have a valid stack for this to work!

xor ecx, ecx       ; zero ecx
mul ecx            ; edx:eax = eax*ecx, i.e. zeros edx and eax
mov al, 0xb        ; eax = 0xb, syscall number for execve (eax is already 0; mov al avoids null bytes)
push ecx           ; pushes a zero onto the stack (stack is \0\0\0\0)
push '//sh'        ; push '//sh' onto stack (stack is //sh\0\0\0\0); the extra / pads it to 4 bytes
push '/bin'        ; push '/bin' onto stack (stack is /bin//sh\0\0\0\0)
mov ebx, esp       ; set ebx (arg1: path) to stack pointer ('/bin//sh')
push ecx           ; push another zero (execve needs a NULL at the end)
push ebx           ; push addr of "/bin//sh"
mov ecx, esp       ; set ecx (arg2: argv) to ["/bin//sh", 0]
                   ; edx (arg3: envp) is already NULL from `mul ecx`
int 80h            ; perform system call