## A small language

The input language for the desugarer is a small Scheme-like language.

It contains top-level forms, let-binding, recursion, quasiquotation and complex conditionals:

<prog> ::= <top>* <top> ::= <def> | <exp> <def> ::= (define (<var> <var>*) <body>) | (define <var> <exp>) <exp> ::= <var> | <literal> | <prim> | (lambda (<var>*) <body>) | (let ((<var> <exp>)*) <body>) | (letrec ((<var> <exp>)*) <body>) | (cond (<exp> <exp>)* [(else <exp>)]) | (and <exp>*) | (or <exp>*) | (if <exp> <exp> [ <exp> ]) | (set! <var> <exp>) | (begin <body>) | (quote <s-exp>) | (quasiquote <qq-exp 1>) <qq-exp 0> ::= <exp> <qq-expn> ::= <symbol> | <literal> | <qq-expn>* | (quasiquote <qq-expn+1>) | (unquote <qq-expn-1>) | (unquote-splicing <qq-expn-1>) <body> ::= <top>* <exp>

Quotation and quasiquotation have special alternate syntactic forms:

'e== (quotee) `e== (quasiquotee) ,e== (unquotee) ,@e== (unquote-splicinge)

## A core language

The output of the desugarer targets a smaller language:

<prog> ::= <def>* <exp>* <def> ::= (define <var> <exp>) <exp> ::= <var> | <literal> | <prim> | (quote <literal>) | (lambda (<var>*) <exp>) | (set! <var> <exp>) | (if <exp> <exp> <exp>) | (begin <exp>*) | (<exp> <exp>*)

Downstream passes will be much simpler.

## Writing a tree-transformer

The structure of the input language guides the creation of a tree transformer.

That is, for each kind of term, there is a corresponding function that accepts (and desugars) that kind of term:

(define (desugar-program prog) ...) (define (desugar-define def) ...) (define (desugar-exp exp) ...) (define (desugar-body body) ...) (define (desugar-quote s-exp) ...) (define (desugar-qq n qq-exp) ...)

## Desugaring the program

At the top level, the desugarer must convert programs of the form:

<prog> ::= <top>* <top> ::= <def> | <exp> <def> ::= (define (<var> <var>*) <body>) | (define <var> <exp>)

into programs of the form:

<prog> ::= <def>* <exp>* <def> ::= (define <var> <exp>)

For uniformity, at first, it will convert all top-level forms into defines with:

; tops-to-defs : top list -> def list (define (tops-to-defs tops) (define (top-to-def top) (match top [`(define (,f ,params ...) . ,body) `(define ,f (λ ,params . ,body))] [`(define ,v ,exp) `(define ,v ,exp)] [exp `(define ,(gensym '_) ,exp)])) (map top-to-def tops))

Next, it will desugar expressions within defines, using:

; desugar-define : define-term -> exp (define (desugar-define def) (match def [`(define ,v ,exp) `(define ,v ,(desugar-exp exp))] [else (error (format "cannot desugar: ~s~n" def))]))

Finally, it will partition the list of defines into those that define simple, atomic expressions and those that define complex expressions.

Those that define simple, atomic expressions will be left as is,
while those that define complex expressions will have their bodies moved down below
as `set!`

s.

That is, if we have:

(define (area x) (* pi 2 x x)) (define pi 3.14) (display (area 10))

Then, it becomes:

(define area (void)) (define pi 3.14) (define $t1 (void)) (set! area (lambda (x) (* 2 pi x x))) (set! $t1 (display (area 10)))

Or, in code:

; desugar : program -> program (define (desugar-program prog) (set! prog (tops-to-defs prog)) (set! prog (map desugar-define prog)) (set! prog (partition-k atomic-define? prog (λ (atomic complex) (define bindings (for/list ([c complex]) (match c [`(define ,v ,complex) `(,v (void))]))) (define sets (for/list ([c complex]) (match c [`(define ,v ,complex) `(set! ,v ,complex)]))) (append atomic (list `(let ,bindings ,sets)))))) prog)

where:

; partition-k : ('a -> boolean) 'a list ; ('a list 'a list -> 'a list 'a list) (define (partition-k pred lst k) (if (not (pair? lst)) (k '() '()) (partition-k pred (cdr lst) (λ (in out) (if (pred (car lst)) (k (cons (car lst) in) out) (k in (cons (car lst) out))))))) ; atomic? : term -> boolean (define (atomic? exp) (match exp [`(λ . ,_) #t] [(? number?) #t] [(? string?) #t] [(? boolean?) #t] [`(quote . ,_) #t] ['(void) #t] [else #f])) ; atomic-define? : term -> boolean (define (atomic-define? def) (match def [`(define ,v ,exp) (atomic? exp)] [else #f]))

## Desugaring expressions

When desugaring an expression,
the transformer `desugar-exp`

discriminates
on the type of expression; at high level:

; desugar-exp : exp -> exp (define (desugar-exp exp) (match exp [(? symbol?) exp] [`(quote ,s-exp) (desugar-quote s-exp)] ; binding forms: [`(let ((,vs ,es) ...) . ,body) ...] [`(letrec ((,vs ,es) ...) . ,body) ...] [`(λ ,params . ,body) ...] ; conditionals: [`(cond) (void)] [`(cond (else ,exp)) ...] [`(cond (,test ,exp)) ...] [`(cond (,test ,exp) ,rest ...) ...] [`(and) #t] [`(or) #f] [`(or ,exp) ...] [`(and ,exp) ...] [`(or ,exp . ,rest) ...] [`(and ,exp . ,rest) ...] [`(if ,test ,exp) ...] [`(if ,test ,exp1 ,exp2) ...] ; mutation: [`(set! ,v ,exp) ...] ; quasiquotation: [`(quasiquote ,qq-exp) (desugar-qq 1 qq-exp)] ; begins: [`(begin . ,body) (desugar-body body)] ; atoms: [(? atomic?) exp] ; function calls: [`(,f . ,args) ...] [else (printf "desugar fail: ~s~n" exp) exp]))

The transformer that desugar expressions is guided by the grammar of the input language: a match (or group of matches) for each expression form.

The next few subsections fill in the complex (`...`

) cases in detail.

### Desugaring let-binding forms

Let bindings desugar into application and lambda:

[`(let ((,vs ,es) ...) . ,body) `((λ ,vs ,(desugar-body body)) ,@(map desugar-exp es))]

so that:

(let ((v 30)) (+ v v))

would become:

((lambda (v) (+ v v)) 30)

### Desugaring letrec

Recursive binding `letrec`

forms desugar with a "lets+set!s" transform:

[`(letrec ((,vs ,es) ...) . ,body) (desugar-exp `(let ,(for/list ([v vs]) (list v '(void))) ,@(map (λ (v e) `(set! ,v ,e)) vs es) ,@body))]

so that:

(letrec ((f (lambda (x) (g x))) (g (lambda (x) (+ x 1)))) (f 20))

becomes:

(let ((f (void)) (g (void))) (set! f (lambda (x) (g x))) (set! g (lambda (x) (+ x 1))) (f 20))

### Transforming lambda forms

For lambda terms, the transformer desugars the body:

[`(λ ,params . ,body) `(λ ,params ,(desugar-body body))]

### Desugaring cond

To desugar cond, the transformer matches on all possible cases.

In the simplest case, it's an empty cond:

[`(cond) '(void)]

If it's a single `else`

clause,
the `cond`

is degenerate:

[`(cond (else ,exp)) (desugar-exp exp)]

If it's a single testing clause,
the `cond`

becomes an `if`

:

[`(cond (,test ,exp)) `(if ,(desugar-exp test) ,(desugar-exp exp) (void))]

Finally, if it's a `cond`

with multiple clauses, the first clause
becomes an `if`

before desugaring the remaining clauses:

[`(cond (,test ,exp) ,rest ...) `(if ,(desugar-exp test) ,(desugar-exp exp) ,(desugar-exp `(cond . ,rest)))]

### Desugaring and, or

The forms for `and`

and `or`

desugar as expected, but with care,
so that the `or`

is equivalent to its first true
value and `and`

is equivalent to the last true value.

Without operands, these forms simplify to constants:

[`(and) #t] [`(or) #f]

With single operands, they are both degenerate:

[`(or ,exp) (desugar-exp exp)] [`(and ,exp) (desugar-exp exp)]

With multiple operands,
the `or`

form must capture
the first in a temporary:

[`(or ,exp . ,rest) (define $t (gensym 't)) (desugar-exp `(let ((,$t ,exp)) (if ,$t ,$t (or . ,rest))))] [`(and ,exp . ,rest) `(if ,(desugar-exp exp) ,(desugar-exp `(and . ,rest)) #f)]

### Desugaring if

A one-armed `if`

form becomes
a two-armed `if`

where the
alternate is `(void)`

:

[`(if ,test ,exp) `(if ,(desugar-exp test) ,(desugar-exp exp) (void))]

For regular `if`

forms, it merely
passes on the desugaring transformation to all of its sub-expressions:

[`(if ,test ,exp1 ,exp2) `(if ,(desugar-exp test) ,(desugar-exp exp1) ,(desugar-exp exp2))]

### Transforming set!

As with `if`

,
`set!`

forwards on the transformation:

[`(set! ,v ,exp) `(set! ,v ,(desugar-exp exp))]

### Transforming function call

Finally, the transformation over function call transforms the function and all of its arguments:

[`(,f . ,args) `(,(desugar-exp f) ,@(map desugar-exp args))]

## Desugaring bodies

To desugar, the body there are three cases:

- the body is a single expression, in which case the body form is degenerate;
- the body is a sequence of expressions, in which case it is a simple
`begin`

; - the body contains definitions, in which case it desugars to
`letrec`

:

(define (desugar-body body) (match body [`(,exp) (desugar-exp exp)] [`(,(and (? not-define?) exps) ...) `(begin ,@(map desugar-exp exps))] [`(,tops ... ,exp) (define defs (tops-to-defs tops)) (desugar-exp (match defs [`((define ,vs ,es) ...) `(letrec ,(map list vs es) ,exp)]))]))

## Desugaring quotes

Quotation desugars recursively according to a few rules:

'(a.b) => (cons 'a'b) '() => (list) 'symbol=> 'symbol'atom=>atom

As code:

; desugar-quote : sexp -> exp (define (desugar-quote s-exp) (cond [(pair? s-exp) `(cons ,(desugar-quote (car s-exp)) ,(desugar-quote (cdr s-exp)))] [(null? s-exp) ''()] [(number? s-exp) s-exp] [(string? s-exp) s-exp] [(boolean? s-exp) s-exp] [(symbol? s-exp) `(quote ,s-exp)] [else (error (format "strange value in quote: ~s~n" s-exp))]))

## Desugaring quasiquotation

If all quasiquotation were level 1--no embedded quasiquotation--then its desugaring would also be described with simple rewrite rules:

`(a.b) => (cons `a`b) `() => (list) `symbol=> 'symbol`atom=>atom`,a=>a`(,@ab) => (appenda`b)

Since we allow embedded quasiquotation, the desugarer tracks quasiquotation depth that with a nesting counter:

; desugar-qq : qqexp -> exp (define (desugar-qq n qq-exp) (match qq-exp [(list 'unquote exp) (if (= n 1) (desugar-exp exp) (list 'list ''unquote (desugar-qq (- n 1) exp)))] [`(quasiquote ,qq-exp) `(list 'quasiquote ,(desugar-qq (+ n 1) qq-exp))] [(cons (list 'unquote-splicing exp) rest) (if (= n 1) `(append ,exp ,(desugar-qq n rest)) (cons (list 'unquote-splicing (desugar-qq (- n 1) exp)) (desugar-qq n rest)))] [`(,qq-exp1 . ,rest) `(cons ,(desugar-qq n qq-exp1) ,(desugar-qq n rest))] [else (desugar-quote qq-exp)]))

## Going further

Could we continue desugaring?

Absolutely.

### Eliminating let*

Although the input language did not contain
`let*`

, it is simple to desugar this into
nested `lets`

:

(let* ([v1exp1] ... [vnexpn])body) => (let ([v1exp1]) (let ([v2exp2]) ... (let ([vnexpn])body) ...))

### Eliminating begin

It's possible to desugar simple `(begin <exp>*)`

forms
into nested lets:

(beginexp_{1}...exp_{n}) => (let* ([$t1exp_{1}] ... [$tn-1exp_{n-1}])exp_{n})

where the `$t`

variables are fresh temporaries.

### Eliminating set!

Is it really possible to eliminate `set!`

?

It is possible, but through a global program transformation that threads a store throughout the program.

In general, however, we would not go this far.

The transformations thus far will not necessarily impact efficiency; a store-passing transformation obscures reasoning about the program and will likely cost efficiency for either interpretation or compilation.

### How far can we go?

If we wanted to, we could desugar all the way to the simple lambda calculus:

<exp> ::= <var> | (λ (<var>) <exp>) | (<exp> <exp>)

## Code

Racket code for this article is available: desugar.rkt.

## Further reading

For serious implementers, Christian Quenniac's Lisp in Small Pieces is a great read and reference:

The classic MIT textbook Structure and Interpretation of Computer Programs is built around implementing Scheme: