Assembly "hello, world" for OS X
Lately I've been spending time reading non-web programming books. Assembly language is something I've wanted to know for a long time primarily because I'm interested in compilers and operating systems.
Jeff Duntemann's Assembly Language Step-by-Step, 2nd ed. seems to be a great introduction. It reads more like a light, story-like introduction to programming, machine architecture and assembly language than a dense reference manual. The book takes such a leisurely pace that the first program (a "hello, world" equivalent for DOS) doesn't come until page 228, after long, interesting discussions of hardware and memory models. The Linux version of "hello, world" doesn't come until page 469, almost the end of the book!
Duntemann's book covers the NASM assembler for both DOS and Linux programming. I use 32-bit OS X locally, which is closer to BSD than to Linux, so the book's examples need some conversion before I can experiment with them. The main difference between OS X (and BSD) and Linux is how system calls are made: OS X and BSD take the system call arguments on the stack, whereas 32-bit Linux takes them in registers.
Straight Line "hello, world"
This first version of "hello, world" for OS X is a simple straight line program (i.e. no procedures or libraries). If you need a fast "hello, world" this is the one for you.
; hello.asm - a "hello, world" program using NASM
section .text
global mystart ; make the main function externally visible
mystart:
; 1 print "hello, world"
; 1a prepare the arguments for the system call to write
push dword mylen ; message length
push dword mymsg ; message to write
push dword 1 ; file descriptor value
; 1b make the system call to write
mov eax, 0x4 ; system call number for write
sub esp, 4 ; OS X (and BSD) system calls need "extra space" on stack
int 0x80 ; make the actual system call
; 1c clean up the stack
add esp, 16 ; 3 args * 4 bytes/arg + 4 bytes extra space = 16 bytes
; 2 exit the program
; 2a prepare the argument for the sys call to exit
push dword 0 ; exit status returned to the operating system
; 2b make the call to sys call to exit
mov eax, 0x1 ; system call number for exit
sub esp, 4 ; OS X (and BSD) system calls need "extra space" on stack
int 0x80 ; make the system call
; 2c no need to clean up the stack because no code here would be executed: already exited
section .data
mymsg db "hello, world", 0xa ; string with a trailing newline (line feed)
mylen equ $-mymsg ; string length in bytes
Assemble the above source to an object file, hello.o, in the Mach-O format.
$ nasm -f macho hello.asm
Link the object file to produce the hello executable. (Calling this step "linking" when there is only one object file is a bit weird.)
$ ld -o hello -e mystart hello.o
Run the executable.
$ ./hello
hello, world
Check the exit status in Bash.
$ echo $?
0
In the above code I've used the prefix "my" on all the names that are not NASM instructions, directives, etc. Using this prefix also shows that any label can be used as the start of execution, as long as it is passed to the linker with the -e option.
The hello executable produced above doesn't use or link to any C runtime libraries (libc or glibc). The above uses the raw system calls provided by the operating system itself.
The "extra space" on the stack before the system calls looks awkward above but makes wrapping the system calls in procedures easier. The reason for this extra space is explained nicely in the FreeBSD Developer's Handbook.
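For contrast with the stack-based calls above, here is a minimal sketch of the same straight line program using the 32-bit Linux int 0x80 convention, in which the system call arguments go in registers (ebx, ecx, edx) and no extra stack space is needed. It is not taken from Duntemann's book. The file name hello_linux.asm, the _start entry label (the GNU linker's default entry symbol) and the build commands in the comments are assumptions for a Linux machine with NASM and GNU ld.

```nasm
; hello_linux.asm - sketch of "hello, world" with the 32-bit Linux convention
; Assumed build commands (not from the article):
;   nasm -f elf hello_linux.asm
;   ld -m elf_i386 -o hello_linux hello_linux.o
section .text
global _start           ; default entry symbol for GNU ld, so no -e is needed
_start:
; 1 print "hello, world": arguments go in registers, not on the stack
    mov eax, 0x4        ; system call number for write
    mov ebx, 1          ; file descriptor value (standard output)
    mov ecx, mymsg      ; message to write
    mov edx, mylen      ; message length
    int 0x80            ; make the system call
; 2 exit the program
    mov eax, 0x1        ; system call number for exit
    mov ebx, 0          ; exit status returned to the operating system
    int 0x80            ; make the system call
section .data
mymsg db "hello, world", 0xa ; string with a trailing newline (line feed)
mylen equ $-mymsg            ; string length in bytes
```

Because nothing is pushed, there is also no stack clean-up step after the write call.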
Procedural "hello, world"
The following version of the "hello, world" program uses a couple of procedures to wrap the system calls.
; hello.asm - a "hello, world" program using NASM
section .text
global mystart ; make the main function externally visible
; a procedure wrapping the system call to write
mywrite:
mov eax, 0x4 ; system call write code
int 0x80 ; make system call
ret
; a procedure wrapping the system call to exit
myexit:
mov eax, 0x1 ; system call exit code
int 0x80 ; make system call
; no need to return
mystart:
; 1 print "hello, world"
; 1a prepare arguments
push dword mylen ; message length
push dword mymsg ; message to write
push dword 1 ; file descriptor value
; 1b make call
call mywrite
; 1c clean up stack
add esp, 12
; 2 exit the program
; 2a prepare arguments
push dword 0 ; exit code
; 2b make call
call myexit
; 2c no need to clean up because no code here would be executed...already exited!
section .data
mymsg db "hello, world", 0xa ; string with a trailing newline (line feed)
mylen equ $-mymsg ; string length in bytes
Assemble, link and run the above code as in the first example.
Note that the oddness of the "extra space" on the stack has disappeared. Instead of being added manually, the extra space is created automatically when the call mywrite and call myexit instructions push the address of the subsequent instruction (the return address) onto the stack. The ret instruction pops this address off the stack and the program continues executing at that address.
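To make the role of the return address concrete, here is an annotated sketch of what the stack holds inside mywrite at the moment the system call is made. It assumes the 32-bit code above, so every slot is 4 bytes wide; the layout is shown as comments and the instructions are unchanged from the listing.

```nasm
; Stack as seen inside mywrite when int 0x80 executes
; (32-bit code, 4 bytes per slot, highest address listed first):
;
;   [esp + 12]  mylen           ; pushed first by mystart
;   [esp +  8]  mymsg           ; pushed second
;   [esp +  4]  1               ; file descriptor, pushed third
;   [esp +  0]  return address  ; pushed by "call mywrite": the "extra space"
mywrite:
    mov eax, 0x4        ; system call write code
    int 0x80            ; make system call
    ret                 ; pops the return address; mystart then does add esp, 12
```

This is why the clean-up after the call is 12 bytes here rather than the 16 bytes used in the straight line version: the fourth slot has already been removed by ret.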
Library "hello, world"
This example shows how the mywrite and myexit procedures can be moved out to a separate library.
sys.asm
; sys.asm - system call wrapper procedures
section .text
; make the library API externally visible
global mywrite
global myexit
mywrite:
mov eax, 0x4 ; sys call write code
int 0x80 ; make system call
ret
myexit:
mov eax, 0x1 ; sys call exit code
int 0x80 ; make system call
hello.asm
; hello.asm - a "hello, world" program using NASM
section .text
; tell the assembler that these library functions are used; the linker will resolve them
extern mywrite
extern myexit
global mystart ; make the main function externally visible
mystart: ; write our string to standard output
; 1 print "hello, world"
; 1a prepare arguments
push dword mylen ; message length
push dword mymsg ; message to write
push dword 1 ; file descriptor value
; 1b make call
call mywrite
; 1c clean up stack
add esp, 12 ; 3 args * 4 bytes/arg = 12 bytes
; 2 exit the program
; 2a prepare arguments
push dword 0 ; exit code
; 2b make call
call myexit
; 2c no need to clean up because no code here would be executed...already exited!
section .data
mymsg db "hello, world", 0xa ; string with a trailing newline (line feed)
mylen equ $-mymsg ; string length in bytes
Assemble the above two source files.
$ nasm -f macho sys.asm
$ nasm -f macho hello.asm
Link the object files to produce the hello executable.
$ ld -o hello -e mystart sys.o hello.o
Run the executable.
$ ./hello
hello, world
[SEGMENT .text] versus section .text
In Duntemann's book, his examples mark segments using the following primitive assembler directive.
[SEGMENT .text]
Sections 6.0 and 6.3 of the NASM documentation explain that using the "user-level" directive is preferred. User-level directives are written without square brackets.
SEGMENT .text
Since the SEGMENT and SECTION directives are synonyms and case insensitive, the directive can be written as I've done:
section .text
Exiting the Program
The above programs use the exit system call explicitly. In Duntemann's book, he just uses the ret instruction at the end of his program's main body.
The reason he can do that is that Duntemann links his examples with gcc, which adds the whole libc to his executables. libc contains the C start-up routine that is marked by the linker as the starting point for execution. That C start-up routine then calls Duntemann's code. When Duntemann's code finishes and returns control to the C start-up routine, it is the C start-up routine that makes the exit system call. See chapter 7 of Advanced Programming in the UNIX Environment for more information about the C start-up routine.
The above programs are not linked by gcc and the execution starting point is set to mystart. Because of these differences, the above programs must call exit explicitly.
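As a rough sketch of the other approach, a program linked with gcc can name its entry procedure main and simply return, leaving the exit system call to the C start-up routine. This is an illustration under stated assumptions, not Duntemann's code: it targets 32-bit Linux, where the C symbol is main (on OS X it would be _main and the object format macho), and the file name ret.asm and the build commands in the comments are hypothetical.

```nasm
; ret.asm - sketch: let the C start-up routine make the exit system call
; Assumed build commands on Linux with 32-bit gcc support installed:
;   nasm -f elf ret.asm
;   gcc -m32 -o ret ret.o
section .text
global main             ; the C start-up routine calls main
main:
    mov eax, 3          ; return value of main, passed on to exit by the start-up code
    ret                 ; return to the C start-up routine instead of calling exit
```

After running such a program, echo $? in Bash should report the value left in eax, since the start-up routine uses the return value of main as the exit status.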
The Sacred Registers
Duntemann makes a big fuss about how programs must not modify the "sacred registers": ebx, ebp, esp, esi and edi. In some places these registers are named the "callee-save registers". He writes that, under the C calling convention, the caller (the OS in this case) is not responsible for saving these register values. If the callee (the "hello, world" program in this case) is going to use these registers, it is the callee that must save and restore the values in these registers so they are unchanged when control returns to the caller.
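A small sketch may make the callee's obligation concrete. The procedure below is hypothetical (mycount does not appear in Duntemann's book or in the programs above). It assumes the caller pushed one argument, a pointer to a zero-terminated string, before the call, and it returns the string length in eax. Because it uses esi as a working register, it saves esi on entry and restores it before returning.

```nasm
; mycount - hypothetical procedure that counts the bytes in a zero-terminated string
; on entry: [esp + 4] = pointer to the string (pushed by the caller)
; on exit:  eax = length in bytes, esi unchanged
mycount:
    push esi                ; esi is callee-save: keep the caller's value
    mov esi, [esp + 8]      ; the argument is now 8 bytes up because of the push
    xor eax, eax            ; eax = 0, used as both index and result
.next:
    cmp byte [esi + eax], 0 ; reached the terminating zero?
    je .done
    inc eax
    jmp .next
.done:
    pop esi                 ; restore the caller's esi
    ret
```

eax is not one of the sacred registers, so it can be overwritten freely and used for the return value.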
Agner Fog has published several manuals including his experimental results regarding calling conventions as calling_conventions.pdf. His document clearly states it is not authoritative and that calling conventions are not well documented and need standardization. Plenty of room to be left feeling uncomfortable.
I cannot find any definitive support for the need to preserve the sacred registers on any of Linux, OS X and BSD. Some OS source code reading may confirm Duntemann's claim and Fog's experimental results.
The above example programs don't use the sacred registers. System calls are used and the system calls might use these registers. I did read somewhere that system calls are guaranteed not to modify any registers except those registers in which the system call returns values. I can't find where I read that and can't say it was an authoritative source. All I can conclude is that if the OS requires a program to obey the C calling convention, then the system calls probably do too and so the above programs would be safe. If the OS doesn't require the program to obey the C calling convention (i.e. the system saves the sacred registers before calling the program) then the above programs don't need to protect against the system calls modifying these registers.
That all said, saving the sacred registers at the beginning and restoring them at the end of the main program would not cause any problems and would be conservative. It would only add a little inefficiency if not truly needed. The library "hello, world" example above would be written as follows with Duntemann's boilerplate added.
; hello.asm - a "hello, world" program using NASM
section .text
; tell the assembler that these library functions are used; the linker will resolve them
extern mywrite
extern myexit
global mystart ; make the main function externally visible
mystart: ; write our string to standard output
; boilerplate to save sacred registers
push ebp
mov ebp, esp
push ebx
push esi
push edi
; 1 print "hello, world"
; 1a prepare arguments
push dword mylen ; message length
push dword mymsg ; message to write
push dword 1 ; file descriptor value
; 1b make call
call mywrite
; 1c clean up stack
add esp, 12 ; 3 args * 4 bytes/arg = 12 bytes
; boilerplate to restore sacred registers
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
; 2 exit the program
; 2a prepare arguments
push dword 0 ; exit code
; 2b make call
call myexit
; 2c no need to clean up because no code here would be executed...already exited!
section .data
mymsg db "hello, world", 0xa ; string with a trailing newline (line feed)
mylen equ $-mymsg ; string length in bytes
Acknowledgements
Thanks to the folks on the ASM Community Messageboard for helping me understand the exit and calling convention business and the pointer to Fog's PDF document. Thanks also to Duntemann for such an accessible introduction to Assembly Language programming.
Comments
Pekka,
Thanks for your comment and information. It is great to see it from an authoritative source.
Wow this is a great article. I've been looking for some examples and so far these are the best I've found. Also I had some problems being able to link my *.o file to make an executable file. But no more. Thanks.
It's specified in Section 3-11 ("Function Calling Sequence") page 37 of the i386 ABI:
http://www.sco.com/developers/devspecs/abi386-4.pdf
All registers are indeed preserved across system calls. See "system_call" in arch/x86/kernel/entry_32.S of the Linux sources, for example.