Background: λcalculus
If you're not yet familiar with the lambdacalculus, or if you don't fully appreciate its universality as a programming language, you might want to first read a couple other blog posts before charging ahead:
Or, grab Benjamin Pierce's Types and Programming Languages, which has a full, detailed treatment of the lambdacalculus:
For the machine below, we'll import
a Syntax
module containing an abstract syntax tree datatype for
the lambdacalculus:
type Var = String data Lambda = Var :=> Exp data Exp = Ref Var  Lam Lambda  Exp :@ Exp
This table shows how to encode expressions from the lambdacalculus:
λcalculus  Exp encoding 
v 
Ref "v"

λv . e 
Lam ("v" :=> e)

f(e) 
f :@ e

A word on idiomatic Haskell
Most of the Haskell in this article is not idiomatic.
For examples, data constructors take tuples instead of expanding the components into fields.
I took this route to keep the code as aesthetically close to the underlying formal mathematics as possible.
The setting: State machines
Before we implement an interpreter as a CEK machine, let's look at state machines in general, and how they can encode an interpreter.
Mathematically, a state machine is a statespace paired with a transition relation that relates each state to its subsequent states.
In code, a statespace is a type. Values of that type are states.
We'll use the typename Σ
for the statespace.
In the simple case, which holds for the CEK machine, the transition relation is deterministic, so a function from one state to another models it fully.
We'll use the name step
for the
transition relation rendered as a function:
step :: Σ > Σ
To "execute" a deterministic statemachine, we want to move from state to state until we hit a "final" state.
The function terminal does this:
terminal :: (Σ > Σ) > (Σ > Bool) > Σ > Σ terminal step isFinal ς0  isFinal ς0 = ς0  otherwise = terminal step isFinal (step(ς0))
The function terminal
takes
the deterministic transition relation step
;
the predicate isFinal
that indicates whether a state has no successor;
and the initial state ς0
from
which to start.
To model an interpreter as a state machine,
we need an inject
function
that maps a program into a state:
inject :: Program > Σ
In the case of the lambdacalculus,
type Program = Exp
.
Then, to use terminal
, we need a
finalstate detector:
isFinal :: Σ > Bool
Now it's easy to define the evaluation function:
evaluate :: Program > Σ evaluate pr = terminal step isFinal (inject(pr))
The evaluator maps a program to its final state (or never terminates if the program is nonterminating).
The CEK machine
There are only three components of a state in a CEK machine: a control component, an environment component, and a continuation component.
The control component is the expression currently being evaluated.
The environment component is a map from variables to values. The environment is the local evaluation context of the expression.
The continuation component is a stack of frames. The continuation is the dynamic evaluation context of the expression. Each frame marks a context awaiting the value of an expression.
In Haskell, the statespace (Σ) is a triple:
type Σ = (Exp,Env,Kont) data D = Clo (Lambda, Env) type Env = Var > D data Kont = Mt  Ar (Exp,Env,Kont)  Fn (Lambda,Env,Kont)
The type D
contains values.
For the basic lambdacalculus, there is only one kind of value: the closure.
A closure is a lambda term paired with an environment that defines
the values of its free variables.
If we wanted to add more values to the language, like integers or strings,
they would show up as variants in the datatype D
.
(The choice of the name D
is rooted in the history of programming languages.
It means both the "domain of values" and "denotable values.")
There are three kinds of continuations:
the empty continuation (Mt
),
the "I hold an argument to evaluate" continuation
(Ar
) and the
"I contain an evaluated function, and now I'm evaluating
an argument term" continuation (Fn
).
To use this machine, we first inject expressions into states:
inject :: Exp > Σ inject (e) = (e, ρ0, Mt) where ρ0 :: Env ρ0 = \ x > error $ "no binding for " ++ x
The step
function moves execution
forward by one step:
step :: Σ > Σ step (Ref x, ρ, κ) = (Lam lam,ρ',κ) where Clo (lam, ρ') = ρ(x) step (f :@ e, ρ, κ) = (f, ρ, Ar(e, ρ, κ)) step (Lam lam, ρ, Ar(e, ρ', κ)) = (e, ρ', Fn(lam, ρ, κ)) step (Lam lam, ρ, Fn(x :=> e, ρ', κ)) = (e, ρ' // [x ==> Clo (lam, ρ)], κ)
A few auxiliary definitions handle function extension in this code:
(==>) :: a > b > (a,b) (==>) x y = (x,y) (//) :: Eq a => (a > b) > [(a,b)] > (a > b) (//) f [(x,y)] = \ x' > if (x == x') then y else f(x')
Given a function
f :: a > b
,
the function f // [x ==> y]
is the identical to the function f
,
except that if given x
it returns y
.
We can describe the four cases
for step
informally:
 Evaluating a reference? Look it up in the environment.
 Evaluating a function application? First evaluate the function.
 Evaluated the function? Go evaluate the argument term.
 Evaluated the argument too? Perform the application.
Recognizing final states is easyjust check for the empty continuation:
isFinal :: Σ > Bool isFinal (Lam _, ρ, Mt) = True isFinal _ = False
And, that's it.
Adding more language features consists mostly of adding new kinds of expressions and continuations corresponding to their contexts.
To handle side effects in an interpreter, you can use side effects in the host language, or if the host language is pure (like math or Haskell), you can upgrade to a CESK machine.
I'll treat CESK machines in a future post.
More resources
Semantics Engineering with PLT Redex is an entire book by Matthias Felleisen, Robby Findler and Matthew Flatt on modeling programming languages (and their complex features) with machines:
The best part? The formal mathematics are encoded in the Redex module for Racket so you can execute the mathematics.
Redex also comes with automated testing tools that can (and will) find bugs in your language implementation.
The classic MIT textbook Structure and Interpretation of Computer Programs is essentially a textbook on interpreters: