If you're not yet familiar with the lambda-calculus, or if you don't fully appreciate its universality as a programming language, you might want to first read a couple other blog posts before charging ahead:
Or, grab Benjamin Pierce's Types and Programming Languages, which has a full, detailed treatment of the lambda-calculus:
For the machine below, we'll import
Syntax module containing an abstract syntax tree datatype for
type Var = String data Lambda = Var :=> Exp data Exp = Ref Var | Lam Lambda | Exp :@ Exp
This table shows how to encode expressions from the lambda-calculus:
|λv . e||
A word on idiomatic Haskell
Most of the Haskell in this article is not idiomatic.
For examples, data constructors take tuples instead of expanding the components into fields.
I took this route to keep the code as aesthetically close to the underlying formal mathematics as possible.
The setting: State machines
Before we implement an interpreter as a CEK machine, let's look at state machines in general, and how they can encode an interpreter.
Mathematically, a state machine is a state-space paired with a transition relation that relates each state to its subsequent states.
In code, a state-space is a type. Values of that type are states.
We'll use the typename
Σ for the state-space.
In the simple case, which holds for the CEK machine, the transition relation is deterministic, so a function from one state to another models it fully.
We'll use the name
step for the
transition relation rendered as a function:
step :: Σ -> Σ
To "execute" a deterministic state-machine, we want to move from state to state until we hit a "final" state.
The function terminal does this:
terminal :: (Σ -> Σ) -> (Σ -> Bool) -> Σ -> Σ terminal step isFinal ς0 | isFinal ς0 = ς0 | otherwise = terminal step isFinal (step(ς0))
the deterministic transition relation
that indicates whether a state has no successor;
and the initial state
which to start.
To model an interpreter as a state machine,
we need an
that maps a program into a state:
inject :: Program -> Σ
In the case of the lambda-calculus,
type Program = Exp.
Then, to use
terminal, we need a
isFinal :: Σ -> Bool
Now it's easy to define the evaluation function:
evaluate :: Program -> Σ evaluate pr = terminal step isFinal (inject(pr))
The evaluator maps a program to its final state (or never terminates if the program is non-terminating).
The CEK machine
There are only three components of a state in a CEK machine: a control component, an environment component, and a continuation component.
The control component is the expression currently being evaluated.
The environment component is a map from variables to values. The environment is the local evaluation context of the expression.
The continuation component is a stack of frames. The continuation is the dynamic evaluation context of the expression. Each frame marks a context awaiting the value of an expression.
In Haskell, the state-space (Σ) is a triple:
type Σ = (Exp,Env,Kont) data D = Clo (Lambda, Env) type Env = Var -> D data Kont = Mt | Ar (Exp,Env,Kont) | Fn (Lambda,Env,Kont)
D contains values.
For the basic lambda-calculus, there is only one kind of value: the closure.
A closure is a lambda term paired with an environment that defines
the values of its free variables.
If we wanted to add more values to the language, like integers or strings,
they would show up as variants in the datatype
(The choice of the name
is rooted in the history of programming languages.
It means both the "domain of values" and "denotable values.")
There are three kinds of continuations:
the empty continuation (
the "I hold an argument to evaluate" continuation
Ar) and the
"I contain an evaluated function, and now I'm evaluating
an argument term" continuation (
To use this machine, we first inject expressions into states:
inject :: Exp -> Σ inject (e) = (e, ρ0, Mt) where ρ0 :: Env ρ0 = \ x -> error $ "no binding for " ++ x
step function moves execution
forward by one step:
step :: Σ -> Σ step (Ref x, ρ, κ) = (Lam lam,ρ',κ) where Clo (lam, ρ') = ρ(x) step (f :@ e, ρ, κ) = (f, ρ, Ar(e, ρ, κ)) step (Lam lam, ρ, Ar(e, ρ', κ)) = (e, ρ', Fn(lam, ρ, κ)) step (Lam lam, ρ, Fn(x :=> e, ρ', κ)) = (e, ρ' // [x ==> Clo (lam, ρ)], κ)
A few auxiliary definitions handle function extension in this code:
(==>) :: a -> b -> (a,b) (==>) x y = (x,y) (//) :: Eq a => (a -> b) -> [(a,b)] -> (a -> b) (//) f [(x,y)] = \ x' -> if (x == x') then y else f(x')
Given a function
f :: a -> b,
f // [x ==> y]
is the identical to the function
except that if given
We can describe the four cases
- Evaluating a reference? Look it up in the environment.
- Evaluating a function application? First evaluate the function.
- Evaluated the function? Go evaluate the argument term.
- Evaluated the argument too? Perform the application.
Recognizing final states is easy--just check for the empty continuation:
isFinal :: Σ -> Bool isFinal (Lam _, ρ, Mt) = True isFinal _ = False
And, that's it.
Adding more language features consists mostly of adding new kinds of expressions and continuations corresponding to their contexts.
To handle side effects in an interpreter, you can use side effects in the host language, or if the host language is pure (like math or Haskell), you can upgrade to a CESK machine.
I'll treat CESK machines in a future post.
Semantics Engineering with PLT Redex is an entire book by Matthias Felleisen, Robby Findler and Matthew Flatt on modeling programming languages (and their complex features) with machines:
The best part? The formal mathematics are encoded in the Redex module for Racket so you can execute the mathematics.
Redex also comes with automated testing tools that can (and will) find bugs in your language implementation.
The classic MIT textbook Structure and Interpretation of Computer Programs is essentially a textbook on interpreters: