Assembly "hello, world" for OS X

Published August 8, 2009 in Assembly, OS X, hello world

Lately I've been spending time reading non-web programming books. Assembly language is something I've wanted to know for a long time, primarily because I'm interested in compilers and operating systems.

Jeff Duntemann's Assembly Language Step-by-Step, 2nd ed. seems to be a great introduction. It reads more like a light story introduction to programming, machine architecture and assembly language than a dense reference manual. The book takes such a leisurely pace that the first program (a "hello, world" equivalent for DOS) doesn't come until page 228 after long, interesting discussions of hardware and memory models. The Linux version of "hello, world" doesn't come until page 469 — almost the end of the book!

Duntemann's book covers the NASM assembler for both DOS and Linux programming. I use 32-bit OS X locally, which at this level behaves more like BSD than Linux, so the book's examples need some conversion before they run. The main difference between OS X and BSD on one side and Linux on the other is how system calls are made: OS X and BSD expect the arguments on the stack, while Linux expects them in registers.

Straight Line "hello, world"

This first version of "hello, world" for OS X is a simple straight line program (i.e. no procedures or libraries). If you need a fast "hello, world", this is the one for you.

; hello.asm - a "hello, world" program using NASM

section .text

global mystart                ; make the main function externally visible

mystart:

; 1 print "hello, world"

    ; 1a prepare the arguments for the system call to write
    push dword mylen          ; message length                           
    push dword mymsg          ; message to write
    push dword 1              ; file descriptor value

    ; 1b make the system call to write
    mov eax, 0x4              ; system call number for write
    sub esp, 4                ; OS X (and BSD) system calls need "extra space" on stack
    int 0x80                  ; make the actual system call

    ; 1c clean up the stack
    add esp, 16               ; 3 args * 4 bytes/arg + 4 bytes extra space = 16 bytes
    
; 2 exit the program

    ; 2a prepare the argument for the sys call to exit
    push dword 0              ; exit status returned to the operating system

    ; 2b make the call to sys call to exit
    mov eax, 0x1              ; system call number for exit
    sub esp, 4                ; OS X (and BSD) system calls need "extra space" on stack
    int 0x80                  ; make the system call

    ; 2c no need to clean up the stack because no code here would be executed: already exited
    
section .data

  mymsg db "hello, world", 0xa  ; string ending with a newline (line feed)
  mylen equ $-mymsg             ; string length in bytes

Assemble the above source to an object file, hello.o, in the Mach-O format.

$ nasm -f macho hello.asm

Link the object file to produce the hello executable. (Calling this step "linking" when there is only one object file is a bit weird.)

$ ld -o hello -e mystart hello.o

Run the executable.

$ ./hello
hello, world

Check the exit status in Bash.

$ echo $?
0

In the above code I've used the prefix "my" on all the bits and pieces that are not NASM instructions, directives, etc. Using this prefix also shows that the linker can use any label as the start of execution.

The hello executable produced above doesn't use or link to any C runtime libraries (libc or glibc). The above uses the raw system calls provided by the operating system itself.

The "extra space" on the stack before the system calls looks awkward above but makes wrapping the system calls in procedures easier. The reason for this extra space is explained nicely in the FreeBSD Developer's Handbook.
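For readers more at home in C, the three values pushed before the write system call line up with the arguments of the C write function. The sketch below is only a guide to those arguments: it goes through the C library's wrapper, which the assembly above deliberately avoids, and the helper name say_hello is hypothetical rather than anything from the program above.

```c
#include <assert.h>
#include <unistd.h>

/* Same bytes as mymsg in hello.asm: the text plus a trailing line feed (0xa). */
static const char mymsg[] = "hello, world\n";

/* Counterpart of "mylen equ $-mymsg". sizeof also counts the terminating NUL
   that C appends, so subtract one to get the 13 bytes the assembler counts. */
enum { mylen = sizeof mymsg - 1 };

/* say_hello is a hypothetical helper name used for illustration only.
   It passes the same three values the assembly pushes, in C argument order:
   file descriptor, buffer address, byte count.
   Returns the number of bytes written, or -1 on error. */
ssize_t say_hello(int fd)
{
    assert(fd >= 0);                  /* a valid descriptor is never negative */
    return write(fd, mymsg, mylen);
}
```

Calling say_hello(1) writes to standard output, just as the assembly does with file descriptor 1. Note that the assembly pushes the arguments in reverse order (length, address, descriptor) so that the descriptor ends up nearest the top of the stack.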

Procedural "hello, world"

The following version of the "hello, world" program uses a couple of procedures to wrap the system calls.

; hello.asm - a "hello, world" program using NASM
    
section .text

global mystart                ; make the main function externally visible

; a procedure wrapping the system call to write
mywrite:
    mov eax, 0x4              ; system call write code
    int 0x80                  ; make system call
    ret

; a procedure wrapping the system call to exit
myexit:
    mov eax, 0x1              ; system call exit code
    int 0x80                  ; make system call
    ; no need to return

mystart:

; 1 print "hello, world"

    ; 1a prepare arguments
    push dword mylen           ; message length                           
    push dword mymsg           ; message to write
    push dword 1               ; file descriptor value
    ; 1b make call
    call mywrite
    ; 1c clean up stack
    add esp, 12                ; 3 args * 4 bytes/arg = 12 bytes
    
; 2 exit the program

    ; 2a prepare arguments
    push dword 0              ; exit code
    ; 2b make call
    call myexit
    ; 2c no need to clean up because no code here would be executed...already exited!
    
section .data

  mymsg db "hello, world", 0xa  ; string ending with a newline (line feed)
  mylen equ $-mymsg             ; string length in bytes

Assemble, link and run the above code as in the first example.

Note that the oddness of the "extra space" on the stack has disappeared. Instead of adding extra space manually, it is added automatically as part of the call mywrite and call myexit lines when the address of the subsequent instruction is pushed onto the stack. The ret line pops this address off the stack and the program continues executing at that address.
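The equivalence between the manual "extra space" and the pushed return address can be checked with a toy model of the stack. The following C sketch is an illustration under simplifying assumptions, not a description of what the kernel does: the stack is an array of 32-bit words, and every name in it (Stack, push, word_at, straight_line, procedural) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_WORDS = 8 };

/* Toy model of a 32-bit stack: esp is an index into words[], and the stack
   grows toward lower indices, as it grows toward lower addresses on x86. */
typedef struct {
    uint32_t words[STACK_WORDS];
    int esp;
} Stack;

static void stack_init(Stack *s)
{
    memset(s->words, 0, sizeof s->words);
    s->esp = STACK_WORDS;               /* empty stack */
}

static void push(Stack *s, uint32_t value)
{
    assert(s->esp > 0);                 /* guard against overflowing the toy stack */
    s->words[--s->esp] = value;
}

/* The word at [esp + 4*n] in assembly terms. */
uint32_t word_at(const Stack *s, int n)
{
    assert(n >= 0 && s->esp + n < STACK_WORDS);
    return s->words[s->esp + n];
}

/* Straight line version: push the arguments, then "sub esp, 4". */
Stack straight_line(uint32_t fd, uint32_t msg, uint32_t len)
{
    Stack s;
    stack_init(&s);
    push(&s, len);
    push(&s, msg);
    push(&s, fd);
    s.esp -= 1;                         /* sub esp, 4: one unused word */
    return s;
}

/* Procedural version: push the arguments, then "call mywrite". */
Stack procedural(uint32_t fd, uint32_t msg, uint32_t len, uint32_t return_address)
{
    Stack s;
    stack_init(&s);
    push(&s, len);
    push(&s, msg);
    push(&s, fd);
    push(&s, return_address);           /* call pushes the return address */
    return s;
}
```

In both models the file descriptor sits at [esp + 4], the message address at [esp + 8] and the length at [esp + 12] when int 0x80 would execute. Only the word at [esp] differs: unused padding in one case, the return address in the other.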

Library "hello, world"

This example shows how the mywrite and myexit procedures can be moved out to a separate library.

sys.asm

; sys.asm - system call wrapper procedures

section .text

; make the library API externally visible
global mywrite
global myexit

mywrite:
    mov eax, 0x4              ; sys call write code
    int 0x80                  ; make system call
    ret

myexit:
    mov eax, 0x1              ; sys call exit code
    int 0x80                  ; make system call

hello.asm

; hello.asm - a "hello, world" program using NASM
    
section .text

; declare the library functions used here; the linker will resolve them
extern mywrite
extern myexit

global mystart                ; make the main function externally visible

mystart:                      ; write our string to standard output

; 1 print "hello, world"

    ; 1a prepare arguments
    push dword mylen           ; message length                           
    push dword mymsg           ; message to write
    push dword 1               ; file descriptor value
    ; 1b make call
    call mywrite
    ; 1c clean up stack
    add esp, 12                ; 3 args * 4 bytes/arg = 12 bytes
    
; 2 exit the program

    ; 2a prepare arguments
    push dword 0              ; exit code
    ; 2b make call
    call myexit
    ; 2c no need to clean up because no code here would be executed...already exited!
    
section .data

  mymsg db "hello, world", 0xa  ; string ending with a newline (line feed)
  mylen equ $-mymsg             ; string length in bytes

Assemble the above two source files.

$ nasm -f macho sys.asm
$ nasm -f macho hello.asm

Link the object files to produce the hello executable.

$ ld -o hello -e mystart sys.o hello.o

Run the executable.

$ ./hello
hello, world

`[SEGMENT .text]` versus `section .text`

In Duntemann's book, his examples mark segments using the following primitive assembler directive.

[SEGMENT .text]

Sections 6.0 and 6.3 of the NASM documentation explain that the "user-level" form of a directive is preferred. User-level directives are written without square brackets.

SEGMENT .text

Since the SEGMENT and SECTION directives are synonyms and case insensitive, the directive can be written as I've done:

section .text

Exiting the Program

The above programs use the exit system call explicitly. In Duntemann's book, he just uses the ret instruction at the end of his program's main body.

He can do that because he links his examples with gcc, which pulls libc into his executables. libc contains the C start-up routine, which the linker marks as the starting point for execution. That start-up routine then calls Duntemann's code. When Duntemann's code finishes and returns control to the start-up routine, it is the start-up routine that makes the exit system call. See Advanced Programming in the UNIX Environment chapter 7 for more information about the C start-up routine.

The above programs are not linked by gcc and the execution starting point is set to mystart. Because of these differences, the above programs must call exit explicitly.
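The hand-off between a start-up routine and a main body that ends with ret can be modeled in a few lines of C. This is a deliberately simplified sketch: model_startup and main_body are hypothetical names, and a real start-up routine does considerably more work (setting up arguments and the environment, for instance) before and after calling the program.

```c
#include <assert.h>
#include <stddef.h>

/* A program entry point that finishes with "ret", handing its
   status back to whoever called it. */
typedef int (*entry_fn)(void);

/* model_startup is a hypothetical stand-in for the C start-up routine.
   A real one would pass the returned status to exit(); this model simply
   returns it so the control flow can be observed. */
int model_startup(entry_fn entry)
{
    assert(entry != NULL);
    int status = entry();   /* the program's main body runs, then returns */
    return status;          /* real start-up code would call exit(status) here */
}

/* Stand-in for a main body that ends with "ret" and reports status 0. */
int main_body(void)
{
    return 0;
}
```

With -e mystart there is no such caller, so a ret at the end of mystart would have no valid return address to jump to. That is why the programs above end with the exit system call instead.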

The Sacred Registers

Duntemann makes a big fuss about how programs must not modify the "sacred registers": ebx, ebp, esp, esi and edi. Elsewhere these are called the "callee-save registers". He writes that, under the C calling convention, the caller (the OS in this case) is not responsible for saving these register values. If the callee (the "hello, world" program in this case) is going to use these registers, it must save and restore them so they are unchanged when control returns to the caller.

Agner Fog has published several manuals including his experimental results regarding calling conventions as calling_conventions.pdf. His document clearly states it is not authoritative and that calling conventions are not well documented and need standardization. Plenty of room to be left feeling uncomfortable.

I cannot find any definitive support for the need to preserve the sacred registers on any of Linux, OS X and BSD. Some OS source code reading may confirm Duntemann's claim and Fog's experimental results.

The above example programs don't use the sacred registers. System calls are used and the system calls might use these registers. I did read somewhere that system calls are guaranteed not to modify any registers except those registers in which the system call returns values. I can't find where I read that and can't say it was an authoritative source. All I can conclude is that if the OS requires a program to obey the C calling convention, then the system calls probably do too and so the above programs would be safe. If the OS doesn't require the program to obey the C calling convention (i.e. the system saves the sacred registers before calling the program) then the above programs don't need to protect against the system calls modifying these registers.

That all said, saving the sacred registers at the beginning and restoring them at the end of the main program would not cause any problems and would be conservative. It would only add a little inefficiency if not truly needed. The library "hello, world" example above would be written as follows with Duntemann's boilerplate added.

; hello.asm - a "hello, world" program using NASM

section .text

; declare the library functions used here; the linker will resolve them
extern mywrite
extern myexit

global mystart                ; make the main function externally visible

mystart:                      ; write our string to standard output

; boilerplate to save sacred registers

    push ebp
    mov ebp, esp
    push ebx
    push esi
    push edi

; 1 print "hello, world"

    ; 1a prepare arguments
    push dword mylen           ; message length                           
    push dword mymsg           ; message to write
    push dword 1               ; file descriptor value
    ; 1b make call
    call mywrite
    ; 1c clean up stack
    add esp, 12                ; 3 args * 4 bytes/arg = 12 bytes

; boilerplate to restore sacred registers

    pop edi
    pop esi
    pop ebx
    mov esp, ebp
    pop ebp

; 2 exit the program

    ; 2a prepare arguments
    push dword 0              ; exit code
    ; 2b make call
    call myexit
    ; 2c no need to clean up because no code here would be executed...already exited!
    
section .data

  mymsg db "hello, world", 0xa  ; string ending with a newline (line feed)
  mylen equ $-mymsg             ; string length in bytes

Acknowledgements

Thanks to the folks on the ASM Community Messageboard for helping me understand the exit and calling convention business and the pointer to Fog's PDF document. Thanks also to Duntemann for such an accessible introduction to Assembly Language programming.

Comments


Pekka Enberg January 12, 2010

"I cannot find any definitive support for the need to preserve the sacred registers on any of Linux, OS X and BSD. Some OS source code reading may confirm Duntemann's claim and Fog's experimental results."

It's specified in Section 3-11 ("Function Calling Sequence") page 37 of the i386 ABI:

http://www.sco.com/developers/devspecs/abi386-4.pdf

"All registers on the Intel386 are global and thus visible to both a calling and a called function. Registers %ebp, %ebx, %edi, %esi, and %esp “belong” to the calling function. In other words, a called function must preserve these registers’ values for its caller."

All registers are indeed preserved across system calls. See "system_call" in arch/x86/kernel/entry_32.S of the Linux sources, for example.

Peter Michaux January 12, 2010

Pekka,

Thanks for your comment and information. It is great to see it from an authoritative source.

Miguel Angel Aquino Acevedo September 13, 2012

Wow, this is a great article. I’ve been looking for some examples and so far these are the best I’ve found. Also, I had some problems linking my *.o file to make an executable file. But no more. Thanks.