Closure conversion
In C, the closest construct to lexically-scoped anonymous functions are function pointers. But, anonymous functions cannot be compiled directly to function pointers, because functions must be declared at the top-level, and this means that they cannot capture local environments. To turn anoymous functions into function pointers, the compiler must remove all of their free variables.
Closure conversion eliminates free variables by making an explicit environment structure, and forcing free variables to be accessed through that structure. Then, every procedure expects to receive that environment structure as a parameter.
Given a lambda term:
(lambda (v1 ... vN) ... x ...)where
x
is a free variable,
a closure conversion algorithm might turn this into something like:
(closure (lambda ($env v1 ... vN) ... (env-get k x $env) ...) (make-env-struct k (x1 x1) ... (xN xN)))During efficient closure conversion, each closure expression gets an id k that is also tied to its lambda term and its environment structure. In C, this
closure
expression might then become:
MakeClosure(lambda_k,alloc_env_k(x1,...xN))and:
struct env_k { // All free variables: Value x ; ... } ; struct env_k* alloc_env_k(Value x, ...) { struct env_k* t = malloc(sizeof(struct env_k)) ; t->x = x ; ... return t; } Value lambda_k(struct env_k* $env, Value v1, ... Value vN) { ... $env->x ... }
Then, wherever a closure is invoked:
(f arg1 ... argN)it compiles into:
(tmp = [compile f], tmp.clo.lam(tmp.clo.env,[compile arg1],...,[compile argN]))
There are many variations on closure representation, but the one above is among the simplest efficient representations. The compiler below differs only slightly, by making environments into first-class values.
Representing values
The compiler below opts for a (space-inefficient) encoding of values as tagged unions instead of a representation that confiscates a few bits of every word for tag bits:
enum Tag { VOID, INT, BOOLEAN, CLOSURE, CELL, ENV } ; typedef union Value (*Lambda)() ; struct Int { enum Tag t ; int value ; } ; struct Boolean { enum Tag t ; unsigned int value ; } ; struct Closure { enum Tag t ; Lambda lam ; void* env ; } ; struct Env { enum Tag t ; void* env ; } ; struct Cell { enum Tag t ; union Value* addr ; } ; union Value { enum Tag t ; struct Int z ; struct Boolean b ; struct Closure clo ; struct Env env ; struct Cell cell ; } ; typedef union Value Value ;Note how the first word of every struct in the union
Value
is a tag, and that the union itself includes the
tag field t
.
This means that, given access to a Value
v
, it is
the reference v.t
is always the tag field.
Creating values
In C, a Value
cannot be declared as an expression
without a name, and yet, to map Scheme code directly into C, we will have create many Value
-returning expressions.
For instance, the Scheme value 3
will have become a C-level expression equivalent to the value of n
after executing:
union Value n ; n.t = INT ; n.z.value = 3 ;
A straightforward solution is to create
static
functions that build the structs and return them.
Because these functions are marked static
, any reasonable
compiler (like GCC) will inline them:
static Value MakeClosure(Lambda lam, Value env) { Value v ; v.clo.t = CLOSURE ; v.clo.lam = lam ; v.clo.env = env.env.env ; return v ; } static Value MakeInt(int n) { Value v ; v.z.t = INT ; v.z.value = n ; return v ; } static Value MakeBoolean(unsigned int b) { Value v ; v.b.t = BOOLEAN ; v.b.value = b ; return v ; } static Value MakePrimitive(Lambda prim) { Value v ; v.clo.t = CLOSURE ; v.clo.lam = prim ; v.clo.env = NULL ; return v ; } static Value MakeEnv(void* env) { Value v ; v.env.t = ENV ; v.env.env = env ; return v ; } static Value NewCell(Value initialValue) { Value v ; v.cell.t = CELL ; v.cell.addr = malloc(sizeof(Value)) ; *v.cell.addr = initialValue ; return v ; }
Code
Resources
- Gambit, a full-featured Scheme-to-C compiler.
- Christian Queinnec's Lisp in Small Pieces: is the Scheme-to-C bible.
- The K&R language standard is particulary helpful when it comes to portably abusing unions, structs and function pointers:
- GCC extensions to C. If you're willing to compromise portability, GCC offers many extensions (like first-class labels) to take the performance of generated code to within a hair's width of writing raw assembly.