A simple format
Lisp-like programming languages employ S-Expressions for syntax.
In a synergistic twist, the universal run-time data-type in Lisps is also the S-Expression (or, rather, the run-time data structures that encode them).
The total unification of code and data is what makes Lisp macros unique among all programming languages.
S-Expressions (Symbolic Expressions) encode tree-like data with parentheses to denote grouping.
Minimalist S-Expressions
The grammar for the simplified dialect of S-Expressions in this post is:
<file> ::= <s-exp-list> <s-exp> ::= <atom> | '(' <s-exp-list> ')' <s-exp-list> ::= <sexp> <s-exp-list> | <atom> ::= <symbol> | <integer> | #t | #f
Comments begin with semicolon (;
) and run to the end of the line.
Examples
As data, S-Expressions look like:
(users ((uid 1) (name root) (gid 1)) ((uid 108) (name matt) (gid 108)) ((uid 109) (name ralf) (gid 109)))
As code, S-Expressions look like:
(define (fact n) (if (= n 0) 1 (* n (fact (- n 1)))))
Representing S-Expressions
We'll represent S-Expressions with the type SExp
:
abstract class SExp { /** Returns a true S-Exp list as Scala list of S-Exps. */ def toList : List[SExp] /** Returns true if this is a true S-Exp list; false otherwise. */ def isList : Boolean }
Sub-classes represent the core forms as a tree:
case class SInt(val value : Int) extends SExp { ... } case class SSymbol(val value : String) extends SExp { ...} case class STrue() extends SExp { ... } case class SFalse() extends SExp { ... } case class SCons(val car : SExp, val cdr : SExp) extends SExp { ... } case class SNil() extends SExp { ... }
Lexing S-Expressions
The class SExpLexer
tokenizes a stream of characters:
class SExpLexer(var input : Stream[Char]) { // External interface to lexer: def peek() : SExpToken def next() : SExpToken def eatLPAR() def eatRPAR() // Internal lexical analysis state: private var state : SExpLexerState private trait SExpLexerState { def process(c : Char) : SExpLexerState def processEOF() : SExpLexerState } ... }
Tokens themselves have a simple representation:
private trait SExpToken private case object LPAR extends SExpToken private case object RPAR extends SExpToken private case object EOS extends SExpToken private case class INT(value : Int) extends SExpToken private case class HASH(value : String) extends SExpToken private case class ID(value : String) extends SExpToken
The lexer has six internal states:
private case object DONE extends SExpLexerState private case object INCOMMENT extends SExpLexerState private case class INID(buf : List[Char]) extends SExpLexerState private case class INHASH(buf : List[Char]) extends SExpLexerState private case class INNUM(buf : List[Char]) extends SExpLexerState private case object INWHITESPACE extends SExpLexerState
For each kind of state, the methods process(Char)
and processEOF()
handle the next input, return
the new state and may emit zero or more tokens.
The parser interfaces with the lexer by invoking
peek()
, next()
and eatToken()
.
Parsing with recursive descent
A recursive descent parser has a parsing function for each grammar rule, and it uses look-ahead to craft procedures that follow the natural structure of these rules:
class SExpParser(private val input : Stream[Char]) { private val lex = new SExpLexer(input) // <file> ::= <s-exp-list> def nextFile () : List[SExp] = lex.peek() match { case EOS => List.empty case _ => { val head = nextSExp() val tail = nextFile() head :: tail } } // <s-exp> ::= <atom> // | '(' <s-exp-list> ')' def nextSExp () : SExp = lex.peek() match { case EOS => throw ParseException("expected s-exp; got end of input") case LPAR => { lex.eatLPAR() val sexp = nextSExpList() lex.eatRPAR() sexp } case INT(value) => { lex.next() ; SInt(value) } case ID(value) => { lex.next() ; SSymbol(value) } case HASH("t") => { lex.next() ; STrue() } case HASH("f") => { lex.next() ; SFalse() } } // <s-exp-list> ::= <sexp> <s-exp-list> // | private def nextSExpList () : SExp = lex.peek() match { case RPAR => SNil() case _ => { val head = nextSExp() val tail = nextSExpList() SCons(head,tail) } } }
From S-Expressions to syntax trees
Once the S-Expressions have been read, a straightforward tree walk can convert them into an internal data structure.
For instance, suppose we want to parse a small, Scheme-like language.
First, we render the grammar as an abstract syntax tree in Scala:
abstract class Exp /* Core lambda calculus forms. */ case class RefExp(val id : String) extends Exp case class LambdaExp(val params : List[String], val body : Exp) extends Exp case class AppExp(val fun : Exp, args : List[Exp]) extends Exp /* Scheme forms. */ // Conditionals: case class BoolExp(val value : Boolean) extends Exp case class IfExp(val cond : Exp, val ifTrue : Exp, val ifFalse : Exp) extends Exp case class AndExp(val cond1 : Exp, val cond2 : Exp) extends Exp case class OrExp(val cond1 : Exp, val cond2 : Exp) extends Exp // Numerics: case class IntExp(val value : Int) extends Exp case class ZeroPExp(val test : Exp) extends Exp case class SubExp(val exp1 : Exp, val exp2 : Exp) extends Exp case class EqExp(val exp1 : Exp, val exp2 : Exp) extends Exp case class PlusExp(val exp1 : Exp, val exp2 : Exp) extends Exp case class TimesExp(val exp1 : Exp, val exp2 : Exp) extends Exp // Binding and recursion: case class LetExp(val vars : List[String], val exps : List[Exp], val body : Exp) extends Exp case class LetRecExp(val fun : String, val lambda : Exp, val body : Exp) extends Exp // Lists: case class ConsExp(val car : Exp, val cdr : Exp) extends Exp case class CarExp(val arg : Exp) extends Exp case class CdrExp(val arg : Exp) extends Exp case class PairPExp(val arg : Exp) extends Exp case class NullPExp(val arg : Exp) extends Exp case class NullExp() extends Exp
And, a companion object the procedure
from
,
a tree-walker that extracts
a Scheme expression from an S-Expression:
object Exp { def from(sexp : SExp) : Exp = { sexp match { // References: case SSymbol(id) => RefExp(id) // Lambda terms: case SList(SSymbol("lambda"),params,body) => { val vars = params.toList map { case SSymbol(id) => id } LambdaExp(vars,from(body)) } // Conditionals: case STrue() => BoolExp(true) case SFalse() => BoolExp(false) case SList(SSymbol("if"),cond,ifTrue,ifFalse) => IfExp(from(cond), from(ifTrue), from(ifFalse)) case SList(SSymbol("and"),a,b) => AndExp(from(a),from(b)) case SList(SSymbol("or"),a,b) => OrExp(from(a),from(b)) // Numerics: case SInt(value) => IntExp(value) case SList(SSymbol("zero?"), arg) => ZeroPExp(from(arg)) case SList(SSymbol("-"),a,b) => SubExp(from(a),from(b)) case SList(SSymbol("+"),a,b) => PlusExp(from(a), from(b)) case SList(SSymbol("*"),a,b) => TimesExp(from(a),from(b)) case SList(SSymbol("="),a,b) => EqExp(from(a), from(b)) // Lists: case SList(SSymbol("cons"),car,cdr) => ConsExp(from(car),from(cdr)) case SList(SSymbol("car"),arg) => CarExp(from(arg)) case SList(SSymbol("cdr"),arg) => CdrExp(from(arg)) case SList(SSymbol("quote"),SList()) => NullExp() case SList(SSymbol("pair?"),arg) => PairPExp(from(arg)) case SList(SSymbol("null?"),arg) => NullPExp(from(arg)) // Binding forms: case SList(SSymbol("let"), bindings, body) => { val varexps = bindings.toList map { case SList(SSymbol(id),exp) => (id,from(exp)) } val (vars,exps) = varexps.unzip LetExp(vars,exps,from(body)) } case SList(SSymbol("letrec"), SList(SList(SSymbol(fun),lambda)), body) => LetRecExp(fun,from(lambda),from(body)) // Application case SCons(fun,args) => AppExp(from(fun), args.toList map from) } } }
where the procedure from
makes significant use of
custom extractor SList
:
object SList { /** Matches if the underlying S-Expression is a true list, and converts it to a Scala list. */ def unapplySeq (sexp : SExp) : Option[List[SExp]] = { if (sexp.isList) { Some(sexp.toList) } else { None } } }
Code
The full code is available: SExp.scala, Exp.scala.
Related pages
- For more on Scala, I recommend Programming in Scala: