Scheme from Scratch - Bootstrap v0.22 - Garbage Collection
Garbage collection is an essential part of a production Scheme system. Bootstrap Scheme is not a production Scheme system. It is intended to be just enough of a Scheme interpreter to execute a Scheme compiler once. If your computer has enough memory that it won’t run out during the execution of a compiler than you don’t need to implement a garbage collector in Bootstrap Scheme.
I think that the simplest garbage collection system you could add to the interpreter is a mark-and-sweep system. Mark-and-sweep is easier to implement than many collectors, isn’t foiled by circular references, and doesn’t require hardware-specific knowledge. The garbage collection pauses in a mark-and-sweep system can be noticeably long but that doesn’t matter for a bootstrap interpreter.
The C function alloc_object
in the interpreter currently calls malloc
for every object allocated and doesn’t record anything about what memory has been allocated. The fundamental goal is to replace alloc_object
so a new version of it manipulates a heap of preallocated objects that can be recycled.
This page describes one way to implement a mark-and-sweep algorithm. This is basically the mark-and-sweep algorithm I’ve seen in use in some interpreters.
When the interpreter begins, the C init
function in the model layer is executed. This is the right place to set up a heap of Scheme objects. You can statically allocate or dynamically malloc
enough space for thousands of Scheme objects (i.e. struct object
). Each object needs a new field, char mark
, that must be set to 0
during initialization. Each object also needs a struct object *next
field to be used for tracking which memory is free or active. At the beginning, all memory is free. Link all the heap objects together into a linked list with the next
field. The last object in the list can point to C’s NULL
. Assign the head of this linked list to a C global variable free_list
. There is also a global active_list
that is set to NULL
now and that will hold all objects in use by the program.
When alloc_object
is called, it looks to see if the free_list
has an available object. If the list does have an available object then alloc_object
moves that object from the free_list
to the active_list
and returns that object. If the free_list
is empty then alloc_object
calls a gc
function to run the mark-and-sweep algorithm. When gc
returns, alloc_object
can recheck for available objects on the free_list
and return one if there is one or print an error message and exit
the program.
The gc
function is reasonably simple with its two phases: mark and sweep.
In the mark phase, the garbage collector starts at all the “root” objects (described below) and walks all reachable memory (e.g. pairs can point to other objects) and marks each object. Each reachable object will have its mark field set to 1
. When all reachable objects are marked, then move on to the sweep phase.
In the sweep phase, the garbage collector starts at the top of the active_list
and examines each object in that list. Objects that have mark
set to 0
are moved to the free_list
. (Symbols and strings also need to have their data.symbol.value
field, for example, free
d.) Objects in the active_list
that have mark
set to 1
have their mark
set back to 0
in preparation for the next mark phase.
All of that is pretty easy and nicely contained to a very small area of the C source code. Where everything becomes messy is keeping track of the roots for the mark-and-sweep algorithm. The root objects are the ones where the mark phase starts. These are objects that may not have any other objects pointing to them but should still not be garbage collected. the_global_environment
is an example of a known root object. It is easy to have the sweep phase start at all the known roots.
What isn’t easy is managing the stack roots. Have a look at the following C function, for example,
object *make_if(object *predicate, object *consequent,
object *alternative) {
return cons(if_symbol,
cons(predicate,
cons(consequent,
cons(alternative,
the_empty_list))));
}
There are a total of four calls to cons
. Each call to cons
will call alloc_object
. Each call to alloc_object
can potentially trigger a round of garbage collection. The objects returned by the three inner cons
calls above are not rooted. That is, there is no known garbage collection root, that when followed, will lead to those returned objects. We need a way to protect the returned objects so that an outer call to cons
doesn’t move the previous returned value back to the free_list
.
We need to maintain a global stack_root
list of these otherwise unprotected objects. The mark phase can then use these stack_root
objects as additional roots. In order to do this, the make_if
function needs to be changed to push and pop from the stack_root
list.
object *make_if(object *predicate, object *consequent,
object *alternative) {
object *result;
push_stack_root(&result);
result = cons(alternative, the_empty_list);
result = cons(consequent, result);
result = cons(predicate, result);
result = cons(if_symbol, result);
pop_stack_root();
return result;
}
Ugg. Not pretty, is it? What is neat, and keeps the implementation from being even uglier, is that it is a pointer to the result
pointer that is pushed onto the stack_root
list. That way, anything pointed to by result
is protected. You’ll notice that the result
isn’t protected when it is returned to make_if
’s caller. It is up to that caller to protect the returned object.
If you forget to protect a stack variable you can have dangling pointer. If you forget to pop a stack root you will have a memory leak. The garbage collection code is complex and invades almost all functions. Code with care!
I think you can see why I didn’t cloud the implementation up to now with garbage collection worries. Focusing on the model, reader and evaluation layers was plenty as they were being introduced. I think worrying about garbage collection along the way would have made things intimidating and less fun.
Other types of garbage collectors, with hardware-specific knowledge, allow you to avoid making this mess. The Bohem conservative garbage collector is one such library you could link to your interpreter. If you have a look in the source code of that library you will probably be more frightened by it than you are by the approach I’ve presented.
One more article tomorrow to wrap up things.
Previous article: Standard Library
Next article: Conclusion
Comments
Have something to write? Comment on this article.
Nick,
Yes I think you’ll be able to do quite a few of the later versions like let
, cond
, and
, etc in a single sitting.
Objects in the active_list that have mark set to 1 have their mark set back to 0 in preparation for the next mark phase.
One technique that I have seen in some mark-sweep implementations is instead of visiting all of the active objects again to reset their marks, they just invert the sense of what is marked and what is unmarked.
some C pseudocode:
// in heap initialization:
int current_mark = 1;
// checking marks and setting marks
#define marked_p(o) (o->mark == current_mark)
#define mark_obj(o) (o->mark = current_mark)
// after GC is complete swap marks
current_mark = (current_mark == 1) ? 0 : 1;
I very much like this series of articles about Bootstrap Scheme, as I am faced with similar decisions for my new language Babel-17 v0.2, which is also a functional, dynamically typed language.
I also did not want to worry too much about garbage collection when coding an interpreter for it; and I wanted tail-recursion without growing control context, closures, continuations.
Java seems to be a better implementation choice then than C. It is also portable, and has already important features like garbage collection. What were your reasons to choose C over Java as an implementation language?
Steven,
For this series, I choose C because that is close enough to “from scratch”. I discussed the decision a bit in the introductory article. The idea of implementing from scratch is to avoid missing out on implementing features (or at least thinking about them) that a higher-level implementation language would provide for me.
Have something to write? Comment on this article.
Ok, I have been really strapped for time these last few days, but after an evening at the keyboard, I am up to v0.10 and
if
s!I expect things to start really picking up now that the parser is done!