Tree transformations: Desugaring Scheme

[article index] [] [@mattmight] [rss]

Desugaring complex language features into core forms simplifies the implementation of compilers, interpreters and static analyzers.

Scheme is a good language for demonstrating desugaring, because it is a complex language with a tiny inner core.

This article looks at how to desugar a small Scheme-like language into a compact core using functional tree transformations.

The desugarings for recursion and for quasiquotation may be of particular interest to those not familiar with how these features are implemented.

Code for the desugarer is available in Racket.

Read below to learn more about desugaring.

A small language

The input language for the desugarer is a small Scheme-like language.

It contains top-level forms, let-binding, recursion, quasiquotation and complex conditionals:

<prog> ::= <top>*

<top> ::= <def> | <exp>

<def> ::= (define (<var> <var>*) <body>)
       |  (define <var> <exp>)

<exp> ::= <var>
       |  <literal>
       |  <prim>
       |  (lambda (<var>*) <body>)
       |  (let ((<var> <exp>)*) <body>)
       |  (letrec ((<var> <exp>)*) <body>) 
       |  (cond (<exp> <exp>)* [(else <exp>)])
       |  (and <exp>*)
       |  (or <exp>*)
       |  (if <exp> <exp> [ <exp> ])
       |  (set! <var> <exp>)
       |  (begin <body>)
       |  (quote <s-exp>)
       |  (quasiquote <qq-exp 1>)

<qq-exp 0> ::= <exp>

<qq-exp n> ::= <symbol>
            |  <literal>
            |  <qq-exp n>*
            |  (quasiquote <qq-exp n+1>)
            |  (unquote <qq-exp n-1>)
            |  (unquote-splicing <qq-exp n-1>)
 
<body> ::= <top>* <exp>

Quotation and quasiquotation have special alternate syntactic forms:

 'e    ==    (quote e)
 `e    ==    (quasiquote e)
 ,e    ==    (unquote e)
 ,@e   ==    (unquote-splicing e)

A core language

The output of the desugarer targets a smaller language:

<prog> ::= <def>* <exp>*

<def> ::= (define <var> <exp>)

<exp> ::= <var>
       |  <literal>
       |  <prim>
       |  (quote <literal>)
       |  (lambda (<var>*) <exp>)
       |  (set! <var> <exp>)
       |  (if <exp> <exp> <exp>)
       |  (begin <exp>*)
       |  (<exp> <exp>*)

Downstream passes will be much simpler.

Writing a tree-transformer

The structure of the input language guides the creation of a tree transformer.

That is, for each kind of term, there is a corresponding function that accepts (and desugars) that kind of term:

(define (desugar-program prog) ...)
(define (desugar-define def) ...)
(define (desugar-exp exp) ...)
(define (desugar-body body) ...)
(define (desugar-quote s-exp) ...)
(define (desugar-qq n qq-exp) ...)

Desugaring the program

At the top level, the desugarer must convert programs of the form:

<prog> ::= <top>*

<top> ::= <def> | <exp>

<def> ::= (define (<var> <var>*) <body>)
       |  (define <var> <exp>)

into programs of the form:

<prog> ::= <def>* <exp>*

<def> ::= (define <var> <exp>)

For uniformity, at first, it will convert all top-level forms into defines with:

; tops-to-defs : top list -> def list
(define (tops-to-defs tops)
  
  (define (top-to-def top)
    (match top
      [`(define (,f ,params ...) . ,body) 
       `(define ,f (λ ,params . ,body))]
    
      [`(define ,v ,exp)
       `(define ,v ,exp)]
    
      [exp
       `(define ,(gensym '_) ,exp)]))
  
  (map top-to-def tops))

Next, it will desugar expressions within defines, using:

; desugar-define : define-term -> exp
(define (desugar-define def)
  (match def
    [`(define ,v ,exp)
     `(define ,v ,(desugar-exp exp))]
    
    [else
     (error (format "cannot desugar: ~s~n" def))]))

Finally, it will partition the list of defines into those that define simple, atomic expressions and those that define complex expressions.

Those that define simple, atomic expressions will be left as is, while those that define complex expressions will have their bodies moved down below as set!s.

That is, if we have:

(define (area x) (* pi 2 x x))
(define pi 3.14)
(display (area 10))

Then, it becomes:

(define area (void))
(define pi 3.14)
(define $t1 (void))
(set! area (lambda (x) (* 2 pi x x)))
(set! $t1 (display (area 10)))

Or, in code:

; desugar : program -> program
(define (desugar-program prog)
  
  (set! prog (tops-to-defs prog))
  (set! prog (map desugar-define prog))
  (set! prog
    (partition-k 
     atomic-define?
     prog
     (λ (atomic complex)
       (define bindings
         (for/list ([c complex])
           (match c
             [`(define ,v ,complex)
              `(,v (void))])))
       
       (define sets
         (for/list ([c complex])
           (match c
             [`(define ,v ,complex)
              `(set! ,v ,complex)])))
       
       (append atomic (list `(let ,bindings ,sets))))))
  
  prog)

where:

; partition-k : ('a -> boolean) 'a list 
;               ('a list 'a list -> 'a list 'a list)
(define (partition-k pred lst k)
  (if (not (pair? lst))
      (k '() '())
      (partition-k pred (cdr lst) (λ (in out)
        (if (pred (car lst))
            (k (cons (car lst) in) out)
            (k in (cons (car lst) out)))))))


; atomic? : term -> boolean
(define (atomic? exp)
  (match exp
    [`(λ . ,_)     #t]
    [(? number?)   #t]
    [(? string?)   #t]
    [(? boolean?)  #t]
    [`(quote . ,_) #t]
    ['(void)       #t]
    [else          #f]))

; atomic-define? : term -> boolean
(define (atomic-define? def)
  (match def
    [`(define ,v ,exp)  (atomic? exp)]
    [else               #f]))

Desugaring expressions

When desugaring an expression, the transformer desugar-exp discriminates on the type of expression; at high level:

; desugar-exp : exp -> exp
(define (desugar-exp exp)
  (match exp
    [(? symbol?)                        exp]
    [`(quote ,s-exp)                    (desugar-quote s-exp)]

    ; binding forms:
    [`(let ((,vs ,es) ...) . ,body)     ...]
    [`(letrec ((,vs ,es) ...) . ,body)  ...]
    [`(λ ,params . ,body)               ...]
    
    ; conditionals:
    [`(cond)                            (void)]
    [`(cond (else ,exp))                ...]
    [`(cond (,test ,exp))               ...]
    [`(cond (,test ,exp) ,rest ...)     ...]
    
    [`(and)                             #t]
    [`(or)                              #f]
    [`(or ,exp)                        ...]
    [`(and ,exp)                       ...]
    [`(or ,exp . ,rest)                ...]
    [`(and ,exp . ,rest)               ...]
    
    [`(if ,test ,exp)                  ...]
    [`(if ,test ,exp1 ,exp2)           ...]
    
    ; mutation:
    [`(set! ,v ,exp)                   ...]
 
    ; quasiquotation:
    [`(quasiquote ,qq-exp)             (desugar-qq 1 qq-exp)]
    
    ; begins:
    [`(begin . ,body)                  (desugar-body body)]
    
    ; atoms:
    [(? atomic?)                       exp]
    
    ; function calls:
    [`(,f . ,args)                     ...]
            
    [else 
     (printf "desugar fail: ~s~n" exp)
     exp]))

The transformer that desugar expressions is guided by the grammar of the input language: a match (or group of matches) for each expression form.

The next few subsections fill in the complex (...) cases in detail.

Desugaring let-binding forms

Let bindings desugar into application and lambda:

    [`(let ((,vs ,es) ...) . ,body)
     `((λ ,vs ,(desugar-body body)) 
       ,@(map desugar-exp es))]

so that:

 (let ((v 30)) (+ v v))

would become:

 ((lambda (v) (+ v v)) 30)

Desugaring letrec

Recursive binding letrec forms desugar with a "lets+set!s" transform:

    
    [`(letrec ((,vs ,es) ...) . ,body)
     (desugar-exp
      `(let ,(for/list ([v vs])
               (list v '(void)))
         ,@(map (λ (v e)
                  `(set! ,v ,e))
                vs es)
         ,@body))]

so that:

 (letrec ((f (lambda (x) (g x)))
          (g (lambda (x) (+ x 1))))
  (f 20))

becomes:

 (let ((f (void))
       (g (void)))
   (set! f (lambda (x) (g x)))
   (set! g (lambda (x) (+ x 1)))
   (f 20))

Transforming lambda forms

For lambda terms, the transformer desugars the body:

    [`(λ ,params . ,body)
     `(λ ,params ,(desugar-body body))]

Desugaring cond

To desugar cond, the transformer matches on all possible cases.

In the simplest case, it's an empty cond:

    [`(cond)     '(void)]

If it's a single else clause, the cond is degenerate:

    [`(cond (else ,exp))
     (desugar-exp exp)]

If it's a single testing clause, the cond becomes an if:

        
    [`(cond (,test ,exp))
     `(if ,(desugar-exp test) 
          ,(desugar-exp exp) 
          (void))]

Finally, if it's a cond with multiple clauses, the first clause becomes an if before desugaring the remaining clauses:

    [`(cond (,test ,exp) ,rest ...)
     `(if ,(desugar-exp test)
          ,(desugar-exp exp)
          ,(desugar-exp `(cond . ,rest)))]

Desugaring and, or

The forms for and and or desugar as expected, but with care, so that the or is equivalent to its first true value and and is equivalent to the last true value.

Without operands, these forms simplify to constants:

    [`(and)   #t]
    [`(or)    #f]

With single operands, they are both degenerate:

    [`(or ,exp)     (desugar-exp exp)]
    [`(and ,exp)    (desugar-exp exp)]

With multiple operands, the or form must capture the first in a temporary:

    [`(or ,exp . ,rest)
     (define $t (gensym 't))
     (desugar-exp 
      `(let ((,$t ,exp))
         (if ,$t ,$t (or . ,rest))))]
    
    [`(and ,exp . ,rest)
     `(if ,(desugar-exp exp)
          ,(desugar-exp `(and . ,rest))
          #f)]

Desugaring if

A one-armed if form becomes a two-armed if where the alternate is (void):

    [`(if ,test ,exp)
     `(if ,(desugar-exp test) ,(desugar-exp exp) (void))]

For regular if forms, it merely passes on the desugaring transformation to all of its sub-expressions:

   
    [`(if ,test ,exp1 ,exp2)
     `(if ,(desugar-exp test) 
          ,(desugar-exp exp1) 
          ,(desugar-exp exp2))]

Transforming set!

As with if, set! forwards on the transformation:

    [`(set! ,v ,exp)
     `(set! ,v ,(desugar-exp exp))]

Transforming function call

Finally, the transformation over function call transforms the function and all of its arguments:

    [`(,f . ,args)  
     `(,(desugar-exp f) ,@(map desugar-exp args))]

Desugaring bodies

To desugar, the body there are three cases:

the body is a single expression, in which case the body form is degenerate;
the body is a sequence of expressions, in which case it is a simple begin;
the body contains definitions, in which case it desugars to letrec:

(define (desugar-body body)
  (match body
    [`(,exp)
     (desugar-exp exp)]
    
    [`(,(and (? not-define?) exps) ...)
     `(begin ,@(map desugar-exp exps))]
    
    [`(,tops ... ,exp)
     (define defs (tops-to-defs tops))
     (desugar-exp (match defs
                    [`((define ,vs ,es) ...)
                     `(letrec ,(map list vs es) ,exp)]))]))

Desugaring quotes

Quotation desugars recursively according to a few rules:

 '(a . b)    =>    (cons 'a 'b)
 '()         =>    (list)
 'symbol     =>    'symbol
 'atom       =>    atom

As code:

; desugar-quote : sexp -> exp
(define (desugar-quote s-exp)
  (cond
    [(pair? s-exp)     `(cons ,(desugar-quote (car s-exp))
                              ,(desugar-quote (cdr s-exp)))]
    [(null? s-exp)     ''()]
    [(number? s-exp)   s-exp]
    [(string? s-exp)   s-exp]
    [(boolean? s-exp)  s-exp]
    [(symbol? s-exp)   `(quote ,s-exp)]
    [else 
     (error (format "strange value in quote: ~s~n" s-exp))]))

Desugaring quasiquotation

If all quasiquotation were level 1--no embedded quasiquotation--then its desugaring would also be described with simple rewrite rules:

 `(a . b)    =>    (cons `a `b)
 `()         =>    (list)
 `symbol     =>    'symbol
 `atom       =>    atom

 `,a         =>    a
 `(,@a b)    =>    (append a `b)

Since we allow embedded quasiquotation, the desugarer tracks quasiquotation depth that with a nesting counter:

; desugar-qq : qqexp -> exp
(define (desugar-qq n qq-exp)
  (match qq-exp
    [(list 'unquote exp)
     (if (= n 1)
         (desugar-exp exp)
         (list 'list ''unquote 
               (desugar-qq (- n 1) exp)))]
    
    [`(quasiquote ,qq-exp)
     `(list 'quasiquote ,(desugar-qq (+ n 1) qq-exp))]
    
    [(cons (list 'unquote-splicing exp) rest)
     (if (= n 1)
         `(append ,exp ,(desugar-qq n rest))
         (cons (list 'unquote-splicing (desugar-qq (- n 1) exp))
               (desugar-qq n rest)))]
    
    [`(,qq-exp1 . ,rest)
     `(cons ,(desugar-qq n qq-exp1)
            ,(desugar-qq n rest))]
       
    [else 
     (desugar-quote qq-exp)]))

Going further

Could we continue desugaring?

Absolutely.

Eliminating let*

Although the input language did not contain let*, it is simple to desugar this into nested lets:

 (let* ([v1 exp1]
        ...
        [vn expn])
   body)
  
 =>

 (let ([v1 exp1])
  (let ([v2 exp2])
   ...
   (let ([vn expn])
    body) 
    ...))

Eliminating begin

It's possible to desugar simple (begin <exp>*) forms into nested lets:

 (begin
   exp₁
   ...
   exp_n)

 =>

 (let* ([$t1   exp₁]
        ...
        [$tn-1 exp_n-1])
  exp_n)

where the $t variables are fresh temporaries.

Eliminating set!

Is it really possible to eliminate set!?

It is possible, but through a global program transformation that threads a store throughout the program.

In general, however, we would not go this far.

The transformations thus far will not necessarily impact efficiency; a store-passing transformation obscures reasoning about the program and will likely cost efficiency for either interpretation or compilation.

How far can we go?

If we wanted to, we could desugar all the way to the simple lambda calculus:

 <exp> ::= <var>
        |  (λ (<var>) <exp>)
        |  (<exp> <exp>)

Code

Racket code for this article is available: desugar.rkt.