A small language
The input language for the desugarer is a small Scheme-like language.
It contains top-level forms, let-binding, recursion, quasiquotation and complex conditionals:
<prog> ::= <top>* <top> ::= <def> | <exp> <def> ::= (define (<var> <var>*) <body>) | (define <var> <exp>) <exp> ::= <var> | <literal> | <prim> | (lambda (<var>*) <body>) | (let ((<var> <exp>)*) <body>) | (letrec ((<var> <exp>)*) <body>) | (cond (<exp> <exp>)* [(else <exp>)]) | (and <exp>*) | (or <exp>*) | (if <exp> <exp> [ <exp> ]) | (set! <var> <exp>) | (begin <body>) | (quote <s-exp>) | (quasiquote <qq-exp 1>) <qq-exp 0> ::= <exp> <qq-exp n> ::= <symbol> | <literal> | <qq-exp n>* | (quasiquote <qq-exp n+1>) | (unquote <qq-exp n-1>) | (unquote-splicing <qq-exp n-1>) <body> ::= <top>* <exp>
Quotation and quasiquotation have special alternate syntactic forms:
'e == (quote e) `e == (quasiquote e) ,e == (unquote e) ,@e == (unquote-splicing e)
A core language
The output of the desugarer targets a smaller language:
<prog> ::= <def>* <exp>* <def> ::= (define <var> <exp>) <exp> ::= <var> | <literal> | <prim> | (quote <literal>) | (lambda (<var>*) <exp>) | (set! <var> <exp>) | (if <exp> <exp> <exp>) | (begin <exp>*) | (<exp> <exp>*)
Downstream passes will be much simpler.
Writing a tree-transformer
The structure of the input language guides the creation of a tree transformer.
That is, for each kind of term, there is a corresponding function that accepts (and desugars) that kind of term:
(define (desugar-program prog) ...) (define (desugar-define def) ...) (define (desugar-exp exp) ...) (define (desugar-body body) ...) (define (desugar-quote s-exp) ...) (define (desugar-qq n qq-exp) ...)
Desugaring the program
At the top level, the desugarer must convert programs of the form:
<prog> ::= <top>* <top> ::= <def> | <exp> <def> ::= (define (<var> <var>*) <body>) | (define <var> <exp>)
into programs of the form:
<prog> ::= <def>* <exp>* <def> ::= (define <var> <exp>)
For uniformity, at first, it will convert all top-level forms into defines with:
; tops-to-defs : top list -> def list (define (tops-to-defs tops) (define (top-to-def top) (match top [`(define (,f ,params ...) . ,body) `(define ,f (λ ,params . ,body))] [`(define ,v ,exp) `(define ,v ,exp)] [exp `(define ,(gensym '_) ,exp)])) (map top-to-def tops))
Next, it will desugar expressions within defines, using:
; desugar-define : define-term -> exp (define (desugar-define def) (match def [`(define ,v ,exp) `(define ,v ,(desugar-exp exp))] [else (error (format "cannot desugar: ~s~n" def))]))
Finally, it will partition the list of defines into those that define simple, atomic expressions and those that define complex expressions.
Those that define simple, atomic expressions will be left as is,
while those that define complex expressions will have their bodies moved down below
as set!
s.
That is, if we have:
(define (area x) (* pi 2 x x)) (define pi 3.14) (display (area 10))
Then, it becomes:
(define area (void)) (define pi 3.14) (define $t1 (void)) (set! area (lambda (x) (* 2 pi x x))) (set! $t1 (display (area 10)))
Or, in code:
; desugar : program -> program (define (desugar-program prog) (set! prog (tops-to-defs prog)) (set! prog (map desugar-define prog)) (set! prog (partition-k atomic-define? prog (λ (atomic complex) (define bindings (for/list ([c complex]) (match c [`(define ,v ,complex) `(,v (void))]))) (define sets (for/list ([c complex]) (match c [`(define ,v ,complex) `(set! ,v ,complex)]))) (append atomic (list `(let ,bindings ,sets)))))) prog)
where:
; partition-k : ('a -> boolean) 'a list ; ('a list 'a list -> 'a list 'a list) (define (partition-k pred lst k) (if (not (pair? lst)) (k '() '()) (partition-k pred (cdr lst) (λ (in out) (if (pred (car lst)) (k (cons (car lst) in) out) (k in (cons (car lst) out))))))) ; atomic? : term -> boolean (define (atomic? exp) (match exp [`(λ . ,_) #t] [(? number?) #t] [(? string?) #t] [(? boolean?) #t] [`(quote . ,_) #t] ['(void) #t] [else #f])) ; atomic-define? : term -> boolean (define (atomic-define? def) (match def [`(define ,v ,exp) (atomic? exp)] [else #f]))
Desugaring expressions
When desugaring an expression,
the transformer desugar-exp
discriminates
on the type of expression; at high level:
; desugar-exp : exp -> exp (define (desugar-exp exp) (match exp [(? symbol?) exp] [`(quote ,s-exp) (desugar-quote s-exp)] ; binding forms: [`(let ((,vs ,es) ...) . ,body) ...] [`(letrec ((,vs ,es) ...) . ,body) ...] [`(λ ,params . ,body) ...] ; conditionals: [`(cond) (void)] [`(cond (else ,exp)) ...] [`(cond (,test ,exp)) ...] [`(cond (,test ,exp) ,rest ...) ...] [`(and) #t] [`(or) #f] [`(or ,exp) ...] [`(and ,exp) ...] [`(or ,exp . ,rest) ...] [`(and ,exp . ,rest) ...] [`(if ,test ,exp) ...] [`(if ,test ,exp1 ,exp2) ...] ; mutation: [`(set! ,v ,exp) ...] ; quasiquotation: [`(quasiquote ,qq-exp) (desugar-qq 1 qq-exp)] ; begins: [`(begin . ,body) (desugar-body body)] ; atoms: [(? atomic?) exp] ; function calls: [`(,f . ,args) ...] [else (printf "desugar fail: ~s~n" exp) exp]))
The transformer that desugar expressions is guided by the grammar of the input language: a match (or group of matches) for each expression form.
The next few subsections fill in the complex (...
) cases in detail.
Desugaring let-binding forms
Let bindings desugar into application and lambda:
[`(let ((,vs ,es) ...) . ,body) `((λ ,vs ,(desugar-body body)) ,@(map desugar-exp es))]
so that:
(let ((v 30)) (+ v v))
would become:
((lambda (v) (+ v v)) 30)
Desugaring letrec
Recursive binding letrec
forms desugar with a "lets+set!s" transform:
[`(letrec ((,vs ,es) ...) . ,body) (desugar-exp `(let ,(for/list ([v vs]) (list v '(void))) ,@(map (λ (v e) `(set! ,v ,e)) vs es) ,@body))]
so that:
(letrec ((f (lambda (x) (g x))) (g (lambda (x) (+ x 1)))) (f 20))
becomes:
(let ((f (void)) (g (void))) (set! f (lambda (x) (g x))) (set! g (lambda (x) (+ x 1))) (f 20))
Transforming lambda forms
For lambda terms, the transformer desugars the body:
[`(λ ,params . ,body) `(λ ,params ,(desugar-body body))]
Desugaring cond
To desugar cond, the transformer matches on all possible cases.
In the simplest case, it's an empty cond:
[`(cond) '(void)]
If it's a single else
clause,
the cond
is degenerate:
[`(cond (else ,exp)) (desugar-exp exp)]
If it's a single testing clause,
the cond
becomes an if
:
[`(cond (,test ,exp)) `(if ,(desugar-exp test) ,(desugar-exp exp) (void))]
Finally, if it's a cond
with multiple clauses, the first clause
becomes an if
before desugaring the remaining clauses:
[`(cond (,test ,exp) ,rest ...) `(if ,(desugar-exp test) ,(desugar-exp exp) ,(desugar-exp `(cond . ,rest)))]
Desugaring and, or
The forms for and
and or
desugar as expected, but with care,
so that the or
is equivalent to its first true
value and and
is equivalent to the last true value.
Without operands, these forms simplify to constants:
[`(and) #t] [`(or) #f]
With single operands, they are both degenerate:
[`(or ,exp) (desugar-exp exp)] [`(and ,exp) (desugar-exp exp)]
With multiple operands,
the or
form must capture
the first in a temporary:
[`(or ,exp . ,rest) (define $t (gensym 't)) (desugar-exp `(let ((,$t ,exp)) (if ,$t ,$t (or . ,rest))))] [`(and ,exp . ,rest) `(if ,(desugar-exp exp) ,(desugar-exp `(and . ,rest)) #f)]
Desugaring if
A one-armed if
form becomes
a two-armed if
where the
alternate is (void)
:
[`(if ,test ,exp) `(if ,(desugar-exp test) ,(desugar-exp exp) (void))]
For regular if
forms, it merely
passes on the desugaring transformation to all of its sub-expressions:
[`(if ,test ,exp1 ,exp2) `(if ,(desugar-exp test) ,(desugar-exp exp1) ,(desugar-exp exp2))]
Transforming set!
As with if
,
set!
forwards on the transformation:
[`(set! ,v ,exp) `(set! ,v ,(desugar-exp exp))]
Transforming function call
Finally, the transformation over function call transforms the function and all of its arguments:
[`(,f . ,args) `(,(desugar-exp f) ,@(map desugar-exp args))]
Desugaring bodies
To desugar, the body there are three cases:
- the body is a single expression, in which case the body form is degenerate;
- the body is a sequence of expressions, in which case it is a simple
begin
; - the body contains definitions, in which case it desugars to
letrec
:
(define (desugar-body body) (match body [`(,exp) (desugar-exp exp)] [`(,(and (? not-define?) exps) ...) `(begin ,@(map desugar-exp exps))] [`(,tops ... ,exp) (define defs (tops-to-defs tops)) (desugar-exp (match defs [`((define ,vs ,es) ...) `(letrec ,(map list vs es) ,exp)]))]))
Desugaring quotes
Quotation desugars recursively according to a few rules:
'(a . b) => (cons 'a 'b) '() => (list) 'symbol => 'symbol 'atom => atom
As code:
; desugar-quote : sexp -> exp (define (desugar-quote s-exp) (cond [(pair? s-exp) `(cons ,(desugar-quote (car s-exp)) ,(desugar-quote (cdr s-exp)))] [(null? s-exp) ''()] [(number? s-exp) s-exp] [(string? s-exp) s-exp] [(boolean? s-exp) s-exp] [(symbol? s-exp) `(quote ,s-exp)] [else (error (format "strange value in quote: ~s~n" s-exp))]))
Desugaring quasiquotation
If all quasiquotation were level 1--no embedded quasiquotation--then its desugaring would also be described with simple rewrite rules:
`(a . b) => (cons `a `b) `() => (list) `symbol => 'symbol `atom => atom `,a => a `(,@a b) => (append a `b)
Since we allow embedded quasiquotation, the desugarer tracks quasiquotation depth that with a nesting counter:
; desugar-qq : qqexp -> exp (define (desugar-qq n qq-exp) (match qq-exp [(list 'unquote exp) (if (= n 1) (desugar-exp exp) (list 'list ''unquote (desugar-qq (- n 1) exp)))] [`(quasiquote ,qq-exp) `(list 'quasiquote ,(desugar-qq (+ n 1) qq-exp))] [(cons (list 'unquote-splicing exp) rest) (if (= n 1) `(append ,exp ,(desugar-qq n rest)) (cons (list 'unquote-splicing (desugar-qq (- n 1) exp)) (desugar-qq n rest)))] [`(,qq-exp1 . ,rest) `(cons ,(desugar-qq n qq-exp1) ,(desugar-qq n rest))] [else (desugar-quote qq-exp)]))
Going further
Could we continue desugaring?
Absolutely.
Eliminating let*
Although the input language did not contain
let*
, it is simple to desugar this into
nested lets
:
(let* ([v1 exp1] ... [vn expn]) body) => (let ([v1 exp1]) (let ([v2 exp2]) ... (let ([vn expn]) body) ...))
Eliminating begin
It's possible to desugar simple (begin <exp>*)
forms
into nested lets:
(begin exp1 ... expn) => (let* ([$t1 exp1] ... [$tn-1 expn-1]) expn)
where the $t
variables are fresh temporaries.
Eliminating set!
Is it really possible to eliminate set!
?
It is possible, but through a global program transformation that threads a store throughout the program.
In general, however, we would not go this far.
The transformations thus far will not necessarily impact efficiency; a store-passing transformation obscures reasoning about the program and will likely cost efficiency for either interpretation or compilation.
How far can we go?
If we wanted to, we could desugar all the way to the simple lambda calculus:
<exp> ::= <var> | (λ (<var>) <exp>) | (<exp> <exp>)
Code
Racket code for this article is available: desugar.rkt.
Further reading
For serious implementers, Christian Quenniac's Lisp in Small Pieces is a great read and reference:
The classic MIT textbook Structure and Interpretation of Computer Programs is built around implementing Scheme: