This post describes how to write a simple hello world program in pure assembly on NetBSD/amd64. We will not use (nor link against) libc, nor use gcc to compile it. I will be using GNU as (gas), and therefore the AT&T syntax instead of Intel.
Why not? Because it's fun to program in assembly directly. Contrary to popular belief, assembly programs aren't always faster than what optimizing compilers produce. Nevertheless, it's good to be able to read assembly, especially when debugging C programs.
In order for the program to do anything, it needs to communicate with the kernel. This is done by the syscall interface. NetBSD syscall numbers are specified in src/sys/sys/syscall.h. Syscall numbers are defined by macros and the comments describe the return value and parameters. For example:
/* syscall: "close" ret: "int" args: "int" */
#define SYS_close 6
informs us that the close syscall has number 6, returns an int and takes a single int argument.
Syscalls take their arguments in much the same way as functions do. On NetBSD/amd64, syscall arguments are passed in registers, in this order:
rdi, rsi, rdx, r10, r8, r9
(this is listed in a comment in src/sys/arch/x86/x86/syscall.c). Note that r10 takes the place of rcx, which ordinary function calls use for the fourth argument, because the syscall instruction overwrites rcx. The syscall number is passed in rax (I couldn't find where that is defined).
Syscall return values are in rax.
For example if you were to write out "Hello world!" (and we will write a program like this in a moment), you would use the write syscall to ask the kernel to write these bytes to the standard output.
DOS services worked in a similar way to syscalls.
NetBSD ELF executables have a special section identifying them as such. If you try to run an ELF file that does not have this section, it will fail with "Exec format error". The as code for the section is below:
.section ".note.netbsd.ident", "a"
    .long 2f-1f
    .long 4f-3f
    .long 1
1:  .asciz "NetBSD"
2:  .p2align 2
3:  .long 199905
4:  .p2align 2
The .s file for the magic section. Note that you can also link against the object file located at /usr/lib/sysident.o, which exists for that purpose.
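To make the layout of that section concrete, here is a small Python sketch (not part of the assembly program) that builds the same bytes by hand. It assumes the usual ELF note layout of three 32-bit words (name size, descriptor size, type) followed by the padded name and the padded descriptor. The function name netbsd_ident_note is made up for illustration.

```python
import struct

def netbsd_ident_note() -> bytes:
    """Build the bytes that the .note.netbsd.ident directives emit."""
    name = b"NetBSD\0"                # .asciz "NetBSD" (7 bytes with the NUL)
    desc = struct.pack("<I", 199905)  # 3: .long 199905
    # .long 2f-1f, .long 4f-3f, .long 1
    header = struct.pack("<III", len(name), len(desc), 1)

    def pad4(data: bytes) -> bytes:
        # .p2align 2 pads with zeros up to the next 4-byte boundary
        return data + b"\0" * (-len(data) % 4)

    return header + pad4(name) + pad4(desc)

note = netbsd_ident_note()
print(len(note), note.hex())
```

The first two words come out as 7 and 4, which is what the 2f-1f and 4f-3f expressions evaluate to, and the whole section is 24 bytes long.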
The makefile for our program is very simple:
all:
	as prog.s -o prog.o
	ld prog.o /usr/lib/sysident.o -o prog
clean:
	rm prog.o prog
Intel engineers designed a one-byte opcode, 0xcc, that invokes interrupt 3, the debug interrupt. We can use it to check that our Makefile works and our file executes. To do that, let's assemble the following file:
.section ".note.netbsd.ident", "a"
    .long 2f-1f
    .long 4f-3f
    .long 1
1:  .asciz "NetBSD"
2:  .p2align 2
3:  .long 199905
4:  .p2align 2
.global _start
.section .text
_start:
    int3
This file has a .text section where the code resides, the magic ident section that tells the kernel this is a NetBSD executable, and the _start symbol that marks where the executable code starts. Save it as prog.s and assemble it with make.
You should get the following output:
$ ./prog
Trace/BPT trap (core dumped)
This means the processor successfully executed your program. You can run it in GDB as well:
(gdb) run
Starting program: /home/beastie/prog/int3
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004000c9 in ?? ()
(gdb)
The simplest syscall you can call is the exit syscall. It will cause your program to exit with the exit code specified as argument.
Let's recall: The kernel expects the syscall number in rax and the first parameter in rdi. We need to put the values in those registers, and invoke the syscall. We can invoke it in two ways. The old i386 way is via int $0x80. Newer amd64 CPUs have a syscall opcode that does the same thing. From our perspective they both work the same. The latter is recommended.
The exit syscall number is 1.
# include that if you aren't linking against sysident.o
.include "magic.s"
.text
.global _start
_start:
    andq $-16, %rsp
    mov $1, %rax
    mov $123, %rdi
    syscall
exit.s
Don't forget to include the directives that create the magic section if you are not linking against sysident.o. I have them saved in magic.s and use the assembler directive to include them.
Assemble and link the executable:
$ make
$ ./prog
$ echo $?
123
$
As you can see, the program exited with a value of 123. But what is that -16 constant doing there? On some operating systems and CPUs you should make sure the stack is aligned to a certain memory boundary. In this case we should align the stack to a 16-byte boundary. Still, why are we using a negative constant? Consider what -16 looks like in binary (two's complement). To find a two's complement value, invert all the bits and add 1, like this:
16 dec = 0x0010, therefore -16 = 0xffef + 1 = 0xfff0
Then, if we sign-extend that to 64 bits, we get 0xffff ffff ffff fff0. Still, what is the constant for?
If you recall the truth table for the AND function, you will notice that ANDing any bit with a 1 bit leaves it unaltered and ANDing any bit with a 0 bit clears it. So in our case, the 4 least significant bits of the stack pointer (the rsp register) will be cleared. Since stacks grow downwards (towards smaller addresses), the new rsp value will be the same or smaller, so nothing already on the stack is overwritten, and it will sit on a 16-byte boundary, as needed for performance reasons. Also, don't forget the $ to tell the assembler that you mean a constant (more on that later).
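The arithmetic is easy to check outside the assembler. The following Python sketch is only an illustration and not part of the program; the helper names and the sample stack pointer values are made up.

```python
def twos_complement(value: int, bits: int) -> int:
    """Encode -value in `bits` bits: invert all the bits, then add 1."""
    mask = (1 << bits) - 1
    return ((~value & mask) + 1) & mask

def align_stack(rsp: int) -> int:
    """Model `andq $-16, %rsp` on a 64-bit register."""
    return rsp & twos_complement(16, 64)

print(hex(twos_complement(16, 16)))  # 16-bit view of -16
print(hex(twos_complement(16, 64)))  # sign-extended to 64 bits

# Hypothetical stack pointer values, chosen only for illustration.
for rsp in (0x7F7FFFFFE3C8, 0x7F7FFFFFE3C0, 0x7F7FFFFFE3CF):
    print(hex(rsp), "->", hex(align_stack(rsp)))
```

All three sample values end up at 0x7f7fffffe3c0: an already aligned pointer is left alone, and the others move down by fewer than 16 bytes.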
Ok, now that we have a working syscall let's write a Hello, World program.
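To print Hello, World! we need to put the write syscall number (4) in rax, the file descriptor of the standard output (1) in rdi, the address of the string in rsi and its length in rdx, and then invoke the syscall: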
# SYSCALL ARGS
# rdi rsi rdx r10 r8 r9
# include unless linking with sysident.o
.include "magic.s"
.global _start
.section .text
_start:
    andq $-16, %rsp
    mov $4, %rax        # write syscall number
    mov $1, %rdi        # file descriptor 1: standard output
    mov $hello, %rsi    # address of the string
    mov hello_len, %dl  # Note: does not clear upper bytes. Use movzbq (move with zero extend) for that
    syscall
    mov $1, %rax        # exit syscall number
    xor %rdi, %rdi      # exit code 0
    syscall
.section .data
hello:
    .ascii "Hello, world!\n"
hello_len:
    .byte .-hello
prog.s
The beginning is just like in the previous program. But we have a new section: .data.
.data, as the name suggests, is the section used for program data. In the section there are two labels. We use them to make the assembler and linker calculate the addresses for us. The hello label is defined as the beginning of a "Hello, world!\n" string. In this case, I have used the .ascii directive, which does not null-terminate the string. If I wanted to use libc functions like printf, I would have to use the .asciz directive or manually terminate that string (for example with a .byte 0 after it).
Then there is the hello_len label, which marks the address of a byte value. That byte contains the value of .-hello, which the assembler calculates as the difference between the current address (the dot) and the address of the hello label. It's easy to work out that this equals the length of the string.
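As a quick sanity check of that length, here is a Python sketch (an illustration only, with variable names borrowed from the labels) that counts the bytes of the string and confirms the result fits in the single byte reserved for it.

```python
hello = b"Hello, world!\n"        # .ascii adds no terminating NUL
hello_len = len(hello)            # what .-hello evaluates to
asciz_len = len(hello + b"\0")    # what it would be with .asciz

assert hello_len <= 0xFF          # must fit in the single .byte
print(hello_len, asciz_len)
```

The string is 14 bytes long, or 15 with a terminating NUL, so a .byte is wide enough. A string longer than 255 bytes would need a wider directive.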
And there's a good reason we store the length as well, since the write syscall takes the following arguments:
/* syscall: "write" ret: "ssize_t" args: "int" "const void *" "size_t" */
that is, a file descriptor, a pointer to the data and the number of bytes to write.
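For comparison, the same write call can be made from a high-level language. This Python sketch goes through the os.write wrapper rather than the raw syscall, so it only illustrates the arguments: a file descriptor and a buffer, with the length taken from the bytes object instead of being passed separately. The function name say_hello is made up for illustration.

```python
import os

def say_hello(fd: int = 1) -> int:
    """Write the greeting to fd and return the number of bytes written."""
    message = b"Hello, world!\n"
    return os.write(fd, message)

written = say_hello()  # file descriptor 1 is the standard output
print(written)
```

The return value corresponds to the ssize_t in the syscall comment: the number of bytes actually written.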
Notice that hello in the mov instruction is prefixed with a $ sign. That is to inform the assembler we want to load the address behind the label instead of the value in memory at that address.
Suppose we actually wrote mov hello, %rsi instead. Let's see what the register contents would become...
0x0000000000040010e <+14>: mov 0x600128,%rsi
(gdb) print/x $rsi
$1 = 0x77202c6f6c6c6548
Hmm, that is not a value I wanted in the register... Let's examine the memory at 0x600128...
(gdb) x/8xb 0x600128
0x600128: 0x48 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x77
Do the bytes look familiar? If you take an ASCII table you will notice it spells out "Hello, w" which is the first 8 bytes of our string. And the register value spells out "w ,olleH", because x86_64 is little endian.
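The byte order can be reproduced with a few lines of Python. This is only an illustration using the eight bytes from the gdb dump; the variable names are made up.

```python
# The eight bytes shown by x/8xb, in memory order.
raw = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x2C, 0x20, 0x77])

# An 8-byte little-endian load puts the first byte in the lowest bits.
as_register = int.from_bytes(raw, "little")

print(raw.decode("ascii"))                             # memory order
print(hex(as_register))                                # register value
print(as_register.to_bytes(8, "big").decode("ascii"))  # register read left to right
```

The middle line prints 0x77202c6f6c6c6548, the same value gdb showed in rsi.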
This is just a side note: you should pay attention to syntax quirks like that.
And while we are here, it's worth mentioning that x86_64 has a special instruction for address manipulation, lea (Load Effective Address). We could just as well write:
lea hello, %rsi
and it would do exactly what we expect. In fact, if we tried to do a lea with a constant (lea $hello, %rsi), as would produce an error.
Note that when loading the hello_len to rdx there is no $, since we are loading the value at that address. I'm also using dl (lower byte) instead of rdx, to avoid loading any garbage data that could happen to be in the next 7 bytes.
Then there is the next syscall, exit. Notice that instead of mov $0, %rdi I have used xor %rdi, %rdi. It's one of the tricks that make more sense in assembly than anywhere else. It's a common way of zeroing a register, and the xor operation takes fewer bytes to encode.
Finally, we can run the program:
$ ./prog
Hello, world!
$