NetBSD Assembly Programming Tutorial

A sparc64 version is also being prepared and will be added when done.

This post describes how to write a simple hello world program in pure assembly on NetBSD/amd64. We will not use (nor link against) libc, nor use gcc to compile it. I will be using GNU as (gas), and therefore the AT&T syntax instead of Intel.

Why assembly?

Why not? Because it's fun to program in assembly directly. Contrary to popular belief, assembly programs aren't always faster than what optimizing compilers produce. Nevertheless, it's good to be able to read assembly, especially when debugging C programs.

NetBSD syscalls

In order for the program to do anything, it needs to communicate with the kernel. This is done by the syscall interface. NetBSD syscall numbers are specified in src/sys/sys/syscall.h. Syscall numbers are defined by macros and the comments describe the return value and parameters. For example:

/* syscall: "close" ret: "int" args: "int" */
#define SYS_close 6

informs us that:

The syscall number for close is 6
The syscall returns an integer (4 bytes)
The syscall takes 1 argument which is an int
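If you want to pull this information out of the header mechanically, the comment format is regular enough to parse. Below is a rough Python sketch (not part of the assembly program); the helper name and the regular expression are my own, and they only assume the two-line comment/#define layout shown above.

```python
import re

# Hypothetical helper: matches one comment + #define pair in the
# style shown above. The pattern is an assumption based on that layout.
ENTRY_RE = re.compile(
    r'/\*\s*syscall:\s*"(?P<name>[^"]+)"\s*'
    r'ret:\s*"(?P<ret>[^"]+)"\s*'
    r'args:\s*(?P<args>.*?)\s*\*/\s*'
    r'#define\s+SYS_\w+\s+(?P<number>\d+)'
)

def parse_syscall_entry(text):
    """Return name, number, return type and argument types of one entry."""
    m = ENTRY_RE.search(text)
    if m is None:
        raise ValueError("not a syscall.h style entry")
    return {
        "name": m.group("name"),
        "number": int(m.group("number")),
        "ret": m.group("ret"),
        # Each argument type is a separate quoted string.
        "args": re.findall(r'"([^"]*)"', m.group("args")),
    }

entry = parse_syscall_entry(
    '/* syscall: "close" ret: "int" args: "int" */\n#define SYS_close 6'
)
print(entry["name"], entry["number"], entry["ret"], entry["args"])
```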

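Syscalls take their arguments almost the same way as functions do. The one difference is the fourth argument, which goes in r10 rather than rcx, because the syscall instruction overwrites rcx. In NetBSD/amd64 syscall args are passed in the registers in this order: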

rdi, rsi, rdx, r10, r8, r9

(This order is listed in a comment in src/sys/arch/x86/x86/syscall.c.) The syscall number is passed in rax (I couldn't find where that is defined).

Syscall return values come back in rax (eax is its lower 32 bits, which is all an int return value needs).

For example if you were to write out "Hello world!" (and we will write a program like this in a moment), you would use the write syscall to ask the kernel to write these bytes to the standard output.

DOS services worked in a similar way to syscalls.

NetBSD ELF secret sauce

NetBSD ELF executables carry a special note section that identifies them as NetBSD binaries. If you try to run an ELF file that does not have this section, it will fail with Exec format error. The as code for the section is below:

.section ".note.netbsd.ident", "a"
.long   2f-1f
.long   4f-3f
.long   1
1:      .asciz "NetBSD"
2:      .p2align 2
3:      .long   199905
4:      .p2align 2

The .s file for the magic section. Note that you can also link against an object file located at /usr/lib/sysident.o, which exists for that purpose.
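To see what those directives actually emit, here is a rough Python sketch that builds the same bytes by hand. It assumes a little-endian target such as amd64, 4-byte .long values, and zero padding up to a 4-byte boundary as .p2align 2 does. The function name is made up for illustration.

```python
import struct

def netbsd_ident_note(name=b"NetBSD\0", desc_value=199905, note_type=1):
    """Build the bytes the .note.netbsd.ident directives above describe."""
    def pad4(data):
        # Same effect as ".p2align 2": zero-pad to a multiple of 4 bytes.
        return data + b"\0" * (-len(data) % 4)

    desc = struct.pack("<I", desc_value)        # 3: .long 199905
    header = struct.pack(
        "<III",
        len(name),   # .long 2f-1f  (length of "NetBSD" plus its NUL)
        len(desc),   # .long 4f-3f  (length of the descriptor)
        note_type,   # .long 1
    )
    return header + pad4(name) + pad4(desc)

note = netbsd_ident_note()
print(len(note), note.hex())
```

The two length fields are measured before padding, which is why the name length comes out as 7 even though the name occupies 8 bytes in the section.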

Makefile

The makefile for our program is very simple:

all:
	as prog.s -o prog.o
	ld prog.o /usr/lib/sysident.o -o prog	
clean:
	rm prog.o prog

Makefile

INT3 program

Intel engineers designed a one-byte opcode, 0xcc, that invokes interrupt 3 - the breakpoint interrupt. We can use that to see if our Makefile works and our file executes. In order to do that, let's assemble the following file:

        .section ".note.netbsd.ident", "a"
        .long   2f-1f
        .long   4f-3f
        .long   1
1:      .asciz "NetBSD"
2:      .p2align 2
3:      .long   199905
4:      .p2align 2
.global _start
.section .text
_start:
int3

This file has a .text section where the code resides, the magic ident section that tells the kernel this is a NetBSD executable, and the _start symbol that marks where execution starts. Save it as prog.s and build it with make.

You should get the following output:

$ ./prog
Trace/BPT trap (core dumped)

This means the processor successfully executed your program. You can run it in GDB as well:

(gdb) run
Starting program: /home/beastie/prog/int3

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004000c9 in ?? ()
(gdb)

Calling syscalls

The simplest syscall you can call is the exit syscall. It will cause your program to exit with the exit code specified as its argument.

Let's recall: the kernel expects the syscall number in rax and the first parameter in rdi. We need to put the values in those registers and invoke the syscall. We can invoke it in two ways. The old i386 way is via int $0x80. amd64 CPUs also have a dedicated syscall instruction that does the same job. From our perspective they both work the same. The latter is recommended.

The exit syscall number is 1.

# include that if you aren't linking against sysident.o
.include "magic.s"
.text
.global _start
_start:
    andq $-16, %rsp
    mov $1, %rax
    mov $123, %rdi
    syscall

exit.s

Don't forget to include the directives that create the magic section if you are not linking against sysident.o. I have them saved in magic.s and use the .include assembler directive to pull them in.

Assemble and link the executable:

$ make
$ ./prog
$ echo $?
123
$

As you can see, the program exited with a value of 123. But what is that -16 constant doing there? On some operating systems and CPUs you should make sure the stack is aligned to a certain memory boundary. In this case we should align the stack to a 16-byte boundary. Still, why a negative constant? Consider what -16 looks like in binary (two's complement). To find a two's complement value, invert all the bits and add 1, like this:

16 dec = 0x0010, therefore -16 = 0xffef + 1 = 0xfff0

Then if we sign extend that to 64 bits, we get 0xffff ffff ffff fff0. Still, what is the constant for?

If you recall the truth table for the AND function, you will notice that ANDing any bit with a 1 bit leaves it unaltered and ANDing any bit with a 0 bit clears the bit. So in our case, the 4 least significant bits of the stack pointer (rsp register) will be cleared. Since stacks grow downwards (towards smaller addresses), the new rsp value will be the same or smaller - therefore never inside the part of the stack already in use - and on a 16-byte boundary, which the amd64 ABI expects and which also helps performance. Also, don't forget the $ to tell the assembler that you mean a constant (more on that later).
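The same arithmetic can be checked outside the assembler. This is a small Python sketch, not part of the program; the function names and the sample stack pointer value are made up for illustration.

```python
def twos_complement(value, bits=64):
    """Bit pattern of a (possibly negative) integer in the given width."""
    return value & ((1 << bits) - 1)

def align_down_16(rsp):
    """What 'andq $-16, %rsp' does: clear the 4 lowest bits."""
    return rsp & twos_complement(-16)

print(hex(twos_complement(-16, 16)))        # 0xfff0
print(hex(twos_complement(-16)))            # 0xfffffffffffffff0
print(hex(align_down_16(0x7f7fffffe4b8)))   # 0x7f7fffffe4b0
```

An address that is already a multiple of 16 passes through unchanged, so the instruction is safe to run regardless of what rsp held at entry.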

Hello World

OK, now that we have a working syscall, let's write a Hello, World program.

To print Hello, World! we need to:

Have the "Hello, World!" string somewhere in memory
Invoke the write syscall to write it out to stdout (file descriptor 1)
Invoke the exit syscall to finish the program (otherwise, it will keep running and crash sooner or later)

Let's go ahead and write the program.

# SYSCALL ARGS
# rdi rsi rdx r10 r8 r9

# include unless linking with sysident.o
.include "magic.s"
.global _start
.section .text
_start:

andq $-16, %rsp
mov $4, %rax
mov $1, %rdi
mov $hello, %rsi
mov hello_len, %dl # Note: does not clear the upper bytes of rdx. Use movzbq (move with zero-extend) for that
syscall

mov $1, %rax
xor %rdi, %rdi
syscall

.section .data
hello:
.ascii "Hello, world!\n"
hello_len:
.byte .-hello

prog.s

The beginning is just like in the previous program. But we have a new section: .data

.data, as the name suggests, is the section used for program data. In the section there are two labels. We use them to make the assembler and linker calculate the addresses for us. The hello label is defined as the beginning of a "Hello, world!\n" string. In this case, I have used the .ascii directive, which does not null-terminate the string. If I wanted to use libc functions like printf, I would have to use the .asciz directive or manually terminate that string (for example with a .byte 0 after it).

Then there is a hello_len label, which marks the address of a byte value. That byte contains the value of .-hello, which the assembler will calculate as the difference between the current address (the dot) and the hello label address. It's easy to work out that this equals the length of the string.
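To make the dot arithmetic concrete, here is a toy model of the assembler's location counter in Python. It is only a sketch: the function name is hypothetical, and the base address is simply the one that shows up for hello in the GDB session later in this post.

```python
def assemble_data(base, items):
    """Lay out (label, bytes) pairs back to back.

    Returns the label addresses and the final location counter (the dot).
    """
    labels = {}
    dot = base
    for label, data in items:
        labels[label] = dot
        dot += len(data)
    return labels, dot

BASE = 0x600128                      # assumed address of the hello label
hello = b"Hello, world!\n"           # .ascii adds no terminating NUL
labels, dot = assemble_data(BASE, [("hello", hello)])

hello_len_value = dot - labels["hello"]   # what ".-hello" evaluates to
print(hello_len_value)                    # 14
```

Since the value is stored with .byte, this approach only works for strings of up to 255 bytes.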

And there's a good reason we store the length as well, since the write syscall takes the following arguments:

/* syscall: "write" ret: "ssize_t" args: "int" "const void *" "size_t" */

that is

the file descriptor (in rdi),
the pointer to the data to write (in rsi),
and the number of bytes to write (in rdx).

In our case, the last argument is the length of the string. Once we load all these to the registers, we can invoke the syscall.

A note about syntax

Notice that hello in the mov instruction is prefixed with a $ sign. That is to inform the assembler we want to load the address behind the label instead of the value in memory at that address.

Suppose we actually wrote mov hello, %rsi instead. Let's see what the register contents would become...

   0x000000000040010e <+14>:    mov    0x600128,%rsi
(gdb) print/x $rsi
$1 = 0x77202c6f6c6c6548

Hmm, that is not a value I wanted in the register... Let's examine the memory at 0x600128...

(gdb) x/8xb 0x600128
0x600128:       0x48    0x65    0x6c    0x6c    0x6f    0x2c    0x20    0x77

Do the bytes look familiar? If you take an ASCII table you will notice it spells out "Hello, w" which is the first 8 bytes of our string. And the register value spells out "w ,olleH", because x86_64 is little endian.
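The byte order is easy to reproduce away from the debugger. This Python snippet is just a sanity check of the values above: it reads the first 8 bytes of the string as a little-endian 64-bit integer, the way the 8-byte mov from memory does.

```python
data = b"Hello, world!\n"

# An 8-byte load from the address of hello, interpreted as little endian.
rsi = int.from_bytes(data[:8], "little")
print(hex(rsi))                  # 0x77202c6f6c6c6548

# Reading the register from most to least significant byte reverses the text.
print(rsi.to_bytes(8, "big"))    # b'w ,olleH'
```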

This is just a sidenote that you should pay attention to syntax quirks like that.

And while we are here, it's worth mentioning that x86_64 has a special instruction for address manipulation, lea (Load Effective Address). We could as well write:

lea hello, %rsi

and it would do exactly what we expect. In fact, if we tried to do a lea with a constant (lea $hello, %rsi), as would produce an error.

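Note that when loading hello_len into rdx there is no $, since we are loading the value at that address. I'm also using dl (the lowest byte of rdx) instead of rdx, so that only the one length byte is read and not the 7 bytes that happen to follow it in memory. The catch is that writing dl leaves the upper 56 bits of rdx untouched, so this only works if they are already zero; movzbq hello_len, %rdx loads the byte and zero-extends it in one go.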
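The difference between writing dl and zero-extending into rdx can be modelled in a few lines of Python. This is only an illustration of the register semantics; the function names and the stale register value are invented for the example.

```python
MASK64 = (1 << 64) - 1

def mov_to_dl(rdx, byte):
    """Model of 'mov mem8, %dl': replace the low byte, keep the other 56 bits."""
    return (rdx & ~0xFF & MASK64) | (byte & 0xFF)

def movzbq_to_rdx(byte):
    """Model of 'movzbq mem8, %rdx': low byte loaded, everything else zeroed."""
    return byte & 0xFF

stale = 0xDEADBEEF00000000           # pretend rdx held this before the load
print(hex(mov_to_dl(stale, 14)))     # 0xdeadbeef0000000e
print(hex(mov_to_dl(0, 14)))         # 0xe
print(hex(movzbq_to_rdx(14)))        # 0xe
```

With stale upper bits, write would be asked for an enormous byte count instead of 14.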

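Then comes the next syscall, exit. Notice that instead of mov $0, %rdi I have used xor %rdi, %rdi. It's one of those tricks that make more sense in assembly than anywhere else: XORing a register with itself is a common way of zeroing it, and the xor takes fewer bytes to encode than the mov. (xor %edi, %edi is shorter still and has the same effect, because writing a 32-bit register clears the upper half of the 64-bit one.)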
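For a feel of the size difference, the table below lists the machine code I would expect GNU as to emit for each variant. Treat the byte strings as an assumption and check them against objdump -d on your own binary; the Python around them only counts the bytes.

```python
# Expected encodings (assumption: verify with objdump -d).
ENCODINGS = {
    "mov $0, %rdi":   bytes.fromhex("48c7c700000000"),
    "mov $0, %edi":   bytes.fromhex("bf00000000"),
    "xor %rdi, %rdi": bytes.fromhex("4831ff"),
    "xor %edi, %edi": bytes.fromhex("31ff"),
}

for insn, code in ENCODINGS.items():
    print(f"{insn:<16} {len(code)} bytes  {code.hex()}")
```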

Finally, we can run the program:

$ ./prog
Hello, world!
$
