Scheme from Scratch - Introduction

Published January 1, 2010 in Scheme

The design and implementation of programming languages fascinates me. A programming language enables a text file to both accurately communicate to another person a problem’s solution and control a computer’s execution. The influence a language’s feature set has on the way a programmer thinks about problems and their solutions is an endlessly captivating idea. The world of programming languages is amazing.

I’m particularly interested in implementing higher-level languages in lower-level languages. I think I was attracted to Math in school because the construction of new ideas based on a small set of axioms seems like good, clear thinking. Building up to a high-level programming language from below has the same appeal for me.

This article may be the first of many recording the scratching of an itch that apparently needs scratching. I cannot seem to ignore it even if I try. A book on Scheme or implementing interpreters or compilers always seems to find its way into my hands. I want to implement Scheme from Scratch: an interpreter, a compiler, a JIT, etc. I want to see how it all happens. If you do too then you may enjoy following along.

Prerequisites

This exploration is not for novice programmers. It is a tutorial introduction for novice language implementor and enthusiasts from the perspective of a novice language implementor and enthusiast.

You have experience programming on a UNIX or UNIX-like system. Any of Linux, BSD, or Mac OS X is fine. You have used common developer tools like gcc and make. Your copy of K&R 2e is a little dog eared but cherished. You might even have a copy of APUE 2e.

You have some prior exposure to Intel 386 Assembly language. You don’t need to be an Assembly master. Just some general knowledge about registers, the stack and heap, and some basic instructions. Jeff Duntemann’s Assembly Language Step By Step , now in its third edition, is a light and enjoyable introduction providing a sufficient level of Assembly knowledge.

If someone even mentions SICP, your head fills with thoughts of master magicians casting alluring spells. SICP 2e has spent a lot of time on your bedside table and you’ve done many of the exercises when you really should have been doing other things (like I should be doing now.) SICP showed you the implementation of Scheme in Scheme. This chicken and egg paradox bothers you and you wonder “How do I implement Scheme from Scratch?”

“Scheme”

There are many versions of “Scheme”. The Original Lambda Papers show various ancestors. I’ve seen threads mentioned in those papers. SICP implements several versions of Scheme. The main language used in SICP doesn’t have macros. SICP shows the implementation of a lazy Scheme. There is the official IEEE Scheme standard and the de facto RnRS standards: R4RS, R5RS, R6RS.

It isn’t totally clear what makes a Scheme “Scheme”. I’m not going to worry too much about exactly what a language must have to be enough of a Scheme to earn the name. I know it needs at least s-expressions; garbage collection; symbols; first-class, lexically scoped procedures; and proper tail calls.

“from Scratch”

What does it mean to implement “from Scratch”? Upon what foundation will we build Scheme? Do we have a working physical computer, an operating system, an assembler, a linker, a C compiler, a Scheme interpreter or compiler, another high-level language?

At least to me, “from Scratch” implies building up towards Scheme from some lower-level. If a system already has another high-level language then, philosophically, why not just write programs in that language? We could, after all. So implementing Scheme with the assistance of another high-level language is out.

Does the system already have Scheme? No. If it did, we’d use it and you wouldn’t need to implement Scheme. (Or if the system does have Scheme, at least we’d pretend it didn’t so we could enjoy implementing it ourselves.)

I will assume we have working computer with a 32 bit Intel 386-style processor, UNIX-like operating system with an assembler and linker. These are all very interesting low-level details but building up towards a Scheme system from a level where one of these is missing is out of scope for this discussion.

Perhaps the grey area is whether or not the system has a C compiler or not and if we want to use that to build up to Scheme. All UNIX-like systems have a C compiler. The operating system and tools are probably written mostly in C. Many high-level languages, like Scheme, are implemented in C. Working in C is definitely working at a lower-level then working in Scheme. A testament to the low-level of C is that C is often referred to as “portable Assembly”. Given these things we can take a C compiler, gcc, as one of our available tools.

Scheme from Scratch Implementations

There are many possible ways to implement Scheme from Scratch:

Bootstrap Scheme: A quick and dirty Scheme interpreter implemented in C. This should be less than 1000 lines to implement. Not many features and slow execution but enough of an interpreter to bootstrap us into the world of Scheme.
Compiling Scheme: A Scheme to i386 Assembly compiler written in Scheme. First executed with Bootstrap Scheme but later self compiling. See An Incremental Approach to Compiler Construction.
Compile-to-C Scheme: A Scheme to C compiler written in Scheme. Provides great portability because C is the intermediate language on the way to machine code.
Byte Code VM Scheme: An embeddable Scheme virtual machine based on a byte-code compiler and interpreter. Written in ANSI C and very portable. See Chibi Scheme, for example.
JIT Engine Scheme: An embeddable Scheme engine based on a just-in-time compiler for i386 and with sophisticated garbage collection. I’m not sure if this will be written in C or Compiling Scheme.
Platform Scheme: Compile Scheme to target a platform like JVM, .NET’s CLR, Parrot, or LLVM. By targeting an existing platform, a lot of education and fun would be missed. It certainly wouldn’t count as “from Scratch”. It is necessary to write the target platform also.

You can see that it is a big itch. I don’t know how long it will take to scratch or what will come of it all but will surely be fun implementing Scheme from Scratch.

Next article: Integers

Comments

Have something to write? Comment on this article.

Matt Might January 1, 2010

I think this is a great exercise! I teach a few compilers courses based on Scheme, and I thought you might be interested in some of the relevant blog posts I wrote for my students:

Peter Michaux January 1, 2010

Matt,

Thanks for the links and I’ll certainly be reading your pages.

kbob January 1, 2010

Best of luck in your endeavor. Around September, 2008, I started my own “Scheme from scratch” project. Right now, I’m procrastinating writing the ’blog entry that shuts the project down. You can read a little about it here.

Cyber ED January 2, 2010

I have a similar itch, but with Lisp

I think that there is a somewhat different approach that could be viable. My thoughts:

Use an existing Lisp/Scheme to bootstrap with.
Use that Lisp/Scheme to generate executable code, look at tinyCC for how to generate code in a flash.
Use Lisp/Scheme to generate the executable file, ELF is well documented.
Take a look at PicoLisp for their encoding of lists, etc.
Implement a built-in web server so that the GUI is any browser, shades of NewLisp here.

IMHO, one of the failures of the many Lisps is that the GUI is a huge stumbling block. Using generated JavaScript / HTML is the “modern” way.

When you start on a *NIX platform you can avoid the pain imposed by more proprietary platforms (think Mac OS/X and Windoze) porting to those platforms would only be considered once the *NIX version is reasonably mature.

Cyber ED January 2, 2010

You may wish to take a look at ParenScript. It’s a cool way of writing Lisp and getting JavaScript generated for you. In my mind, it could be a component of implementing the GUI for Scheme as in your other post.

Cheers !

Joe January 2, 2010

Check out Lisp500, a lisp in 500 lines of C at http://www.modeemi.fi/~chery/lisp500/

Also take a look at Picol, a TCL interpreter in 500 lines of C at http://antirez.com/page/picol.html

Kerkeslager January 12, 2010

A note: you mention “Compiling Scheme”, “Compile-to-C Scheme”, and then “Platform Scheme”. This organization isn’t wrong per-se, but if you think about it a bit differently, these could be seen as a group of related projects. Namely:

Front-end: Scheme to abstract syntax tree (AST).
Back-end: AST to i386 assembly.
Back-end: AST to C.
Back-end: AST to JVM Bytecode.
Back-end: AST to .NET CLR.
Back-end: AST to Parrot Bytecode.
Back-end: AST to LLVM IR.

The front-end need not be different for these examples.a

Peter Michaux January 12, 2010

Kerkeslager,

And wouldn’t that be a magnificent project. :-)

Anthony Fairchild January 12, 2010

Thanks for blogging about this, it is an interest of mine as well. I recently wrote a self-applicable Scheme to C# compiler for fun called Lunula.

So far it is just a compiler and it produces really ugly C# code. The evaluator and REPL is a work in progress. I wrote the original code in PLT Scheme until it could compile itself. Since then all of the development has been done by bootstrapping the new code with the previous version of the compiler.

Good luck and have fun!

Mark Colburn January 14, 2010

Sounds like fun.

A Scheme to AST frontend should be relatively simple since an AST for Scheme is really just a tokenization/tranfsormation of the source: parenthesis separate each node; each atom within the parenthesis is a leaf; each list is a child node; recurse. The only thing left is to annotate the nodes with appropriate metadata (token type, for example) and you are done, I think.

Parsing should be trivial as the language semantics are simple. Numbers, strings, symbols, some punctuation, and comments. Should be easy to code a naive version that supports only strings, integers, and symbols, then goes on to add reals, arbitrary-length integers, comments, etc. The first version could be tested through test cases that compare the generated AST with a set of hand-build reference trees, or perhaps by “reprinting” the AST and making sure it matches the source.

Kerkeslager January 24, 2010

It definitely would be a very cool project.

Charles Lloyd March 21, 2010

For reason similar to Peter’s here, I built my own Scheme in Java. My goal’s were to learn how Scheme works (I still have a long way to go) and try to code up the tightest modular implementation I could with only modest concern for performance.

If you would like to see my swag at this, visit this link:

http://homepage.mac.com/charlesclloyd/Scheme.java.html

If you follow everything Peter is doing in this series, you should have no problem grok’ing what I am up to. My email address is in that file, so if you’d like more info or have questions, take a look in there.

John October 9, 2010

Hi Guys, I had a quick question?!

has anyone implemented an interpreter for untyped λ-calculus in any of these programming languages:

Haskell
Java or C++
Python
OR any other languages?

The following abstract grammar for untyped λ-expressions is needed to be used

var ::= letter ( letter | _ | digit | ' )*
term ::= var -- Variable
| term term -- Application
| \ var -> term -- Abstraction

and what would your reduction strategies be?

I would appreciate any hints or any program that can help me do this with all those languages.

Cheers.

bill November 29, 2010

last time i needed a lispish embedded language, I worked from SIOD http://en.wikipedia.org/wiki/SIOD Scheme in One Defun. a bit more than 90 lines of C, but complete enough.

Vipper October 16, 2012

Im working on lisp from scratch too, its for an OS from scratch tough.

My plan is to write a bootstrapping implementation in x86 ASM with GAS and GNU ld, then write an assembler in said language. Afterwards make a lisp->ASM compiler and rewrite the interpreter in lisp and use it. However i plan to use it to write an entire OS in it from scratch, with the exception that the kernel, i will be using standard linux for that. I already have a working kernel build and a (nearly) empty file system on a usb disk i can boot with (or use on qemu) that i use on laptop for testing. This should be also useful for tinkering with linux kernel and learn by experimentation about it.

This whole thing made me have the same interest in boostrapping/metacircular systems like Peter Michaux has.

I will definitely enjoy reading this site. Best of luck to anyone doing similar stuff.

Have something to write? Comment on this article.