Java as a CESK machine

[article index] [] [@mattmight] [+mattmight] [rss]

CESK machines can concisely and efficiently model the semantics of functional languages like the lambda calculus.

It's less appreciated that CESK machines can do the same for object-oriented and imperative languages too.

This post explores a CESK machine for a high-level, object-oriented bytecode inspired by the Dalvik virtual machine for Java.



A CESK machine, according to Google image search.

Read on for an implementation of an object-oriented CESK machine.

I've implemented this in Java not because it's a good fit--it would have been much easier in Racket or Haskell--but because many of my new students don't yet know functional languages and they don't yet understand formal semantics.

When I start new students on a semantics for the lambda calculus in Racket, they get lost because they know neither Racket nor the lambda calculus.

This post is a bridge.

It's meant to graft an appreciation for formal semantics onto the knowledge level of the average computer science graduate.

Companion reading

This post serves as an applied companion to translating math into code.

If notation in this post seems unfamiliar, I recommend checking that post.

You might also want to read about the CEK machine for lambda calculus in Haskell or a CESK machine for ANF in Racket.

If you really want to understand these machines well, the new textbook Semantics Engineering with PLT Redex

is an excellent resource.

The fully documented code is also available.

A simple object-oriented language

We'll work with a dynamically typed object-oriented language inspired by the Dalvik virtual machine, to which Java applications for Android are compiled.

I created this language by looking at every bytecode for Dalvik, and making sure its semantics have a straightforward representation.

A BNF grammar for this language is divided largely into two classes of terms--statements and expressions:

 program ::= class-def ...

 class-def ::= class class-name extends class-name
               { field-def ... method-def ... }

 field-def ::= var field-name ;

 method-def ::= def method-name($name, ..., $name) { body }

 body ::= stmt ...

 stmt ::= label label:
       |  skip ;
       |  goto label ;
       |  if aexp goto label ;
       |  $name := aexp | cexp ;
       |  return aexp ;
       |  aexp.field-name := aexp ;
       |  push-handler class-name label ;
       |  pop-handler ;
       |  throw aexp ;
       |  move-exception $name ;

 cexp ::= new class-name 
       |  invoke aexp.method-name(aexp,...,aexp) 
       |  invoke super.method-name(aexp,...,aexp) 

 aexp ::= this
       |  true | false
       |  null
       |  void
       |  $name
       |  int
       |  atomic-op(aexp, ..., aexp)
       |  instanceof(aexp, class-name)
       |  aexp.field-name

Statements encode individual actions for the machine; atomic expressions encode atomically computable values; and complex expressions encode expressions with possible non-termination or side effects.

We assume that every procedure has access to an infinite set of frame-local variables, each prefixed with $. (We'll also call these registers.)

A CESK interpreter

A CESK interpreter encodes execution by jumping from state to state.

Each state in the CESK interpreter needs four components:

  • A control component to indicate where it is in the program -- a sequence of statements.
  • An environment component to dictate the values of local variables -- a simple frame pointer in this machine.
  • A store component to map addresses to values.
  • A continuation to model the program stack -- a sequence of frames denoting procedure return contexts and exception handlers for this machine.

Formally, we can create a state-space, $\Sigma$, for this machine:

\[ \varsigma \in \Sigma = \mathsf{Stmt}^* \times \mathit{FP} \times \mathit{Store} \times \mathit{Kont} \text. \]

As Java code, a state is a class with four fields:

class State {

  public final Stmt stmt ;
  public final FramePointer fp ;
  public final ImmutableMap<Addr,Value> store ;
  public final Kont kont ;

  // ...
}

The ImmutableMap class is from Google's guava library.

One might argue that the field stmt should have type List<Stmt> to encode the set $\mathsf{Stmt}^*$.

Yet, this is not necessary, since statements (covered next) contain a Stmt next field, which means every Stmt object is, implicitly, a linked list.

Statements, expressions and classes

For each statement production in the grammar, there is a corresponding statement subclass; thus the class Stmt encodes the set $\mathsf{Stmt}$:

abstract class Stmt { 
  public final Stmt next ;
  // ...
}
class LabelStmt extends Stmt { ... }
class SkipStmt extends Stmt { ... }
class GotoStmt extends Stmt { ... }
class IfStmt extends Stmt { ... }
class AssignAExpStmt extends Stmt { ... }
class NewStmt extends Stmt { ... } 
abstract class AbstractInvokeStmt extends Stmt { ... }
class InvokeStmt extends AbstractInvokeStmt { ... }
class InvokeSuperStmt extends AbstractInvokeStmt { ... }
class ReturnStmt extends Stmt { ... }
class FieldAssignStmt extends Stmt { ... }
class PushHandlerStmt extends Stmt { ... }
class PopHandlerStmt extends Stmt { ... }
class ThrowStmt extends Stmt { ... } 
class MoveExceptionStmt extends Stmt { ... }

There's a similar structure to expressions:

abstract class AExp { ... }
class ThisExp extends AExp { ... }
class BooleanExp extends AExp { ... }
class NullExp extends AExp { ... }
class VoidExp extends AExp { ... }
class RegisterExp extends AExp { ... }
class IntExp extends AExp { ... }
class AtomicOpExp extends AExp { ... }
class InstanceOfExp extends AExp { ... }
class FieldExp extends AExp { ... }

Statments form the bodies of methods:

class MethodDef {
  public final String name ;
  public final String[] formals ;
  public final Stmt body ;

  // ...
}

Methods (and fields) form the definition of classes:

class ClassDef {

  public final String name ;
  public final String parentClassName ;

  public boolean isInstanceOf (String otherClassName) { ... }

  public ClassDef parentClass() { ... }

  public MethodDef lookup(String methodName) { ... }

  public void addMethod(String methodName,
                        String[] formals, Stmt body) { ... }
  public void addField(String fieldName) { ... }
  
  public static ClassDef forName (String className) { ...}
}

And, in this language, fields are only names:

class FieldDef {
  public final String name ;

  // ...
}

It's left as an exercise for the reader to produce a parser for into this abstract syntax tree structure.

These posts may help:

Frame pointers

Frame pointers are relatively opaque. All we need to be able to do with them is make new ones and compare between them; thus:

\[ \mathit{fp} \in \mathit{FP} \text{ is an infinite, ordered set.} \]

Since we'll have a couple different kind of pointers in this machine, the code has a Pointer class:

abstract class Pointer {

  final long value ;

  protected Pointer() {
    this.value = ++maxPointer ;
  }

  private static long maxPointer = 0 ;

  public static final Comparator ordering = // ...
}

And, frame pointers inherit this structure:

class FramePointer extends Pointer {

  public FramePointer push() {
    return new FramePointer() ;
  }
  
  // ...
}

When we need a new frame pointer, we can call the push() method.

Addresses

Local variables (registers) live as offsets from a frame pointer.

To look up a variable's value, we need its address.

In this machine, we compute "abstract" offset addresses by pairing base frame pointers with the name of a register:

\[ \mathit{FrameAddr} = \mathit{FramePointer} \times \mathsf{Name} \text. \] \[ \mathit{frameOffset} : \mathit{FramePointer} \times \mathsf{Name} \to \mathit{FrameAddr} \] \[ \mathit{frameOffset}(\mathit{fp}, \mathit{registerName}) = (\mathit{fp}, \mathit{registerName}) \text. \]

This scheme is not unlike what actually happens in compilation.

Local variables often live in the stack, at precomputed offsets frome the frame pointer.

For example, given a procedure with two variables, x and y, a compiler might emit code to access them as $fp[offset of x] and $fp[offset of y] respectively.

This machine trades in numeric offsets for symbolic offsets.

Since we'll have other kinds of pointers and offset addresses (for fields in objects), it's useful to have an abstract offset address base class, which comes from an abstract address base class:

abstract class Addr { 
  public static final Comparator ordering = // ...
}

abstract class OffsetAddr extends Addr {
  
  public final Pointer pointer ;
  public final String offset ;

  public static final Comparator ordering = // ...
}

Frame addresses inherit from this directly:

class FrameAddr extends OffsetAddr {
  public FrameAddr(FramePointer fp, String offset) {
    super(fp,offset) ;
  }
}

Objects in this machine are represented by a single base pointer, and field addresses come from offsets to this pointer:

\[ \mathit{FieldAddr} = \mathit{ObjectPointer} \times \mathsf{Name} \text. \] \[ \mathit{fieldOffset} : \mathit{ObjectPointer} \times \mathsf{Name} \to \mathit{FieldAddr} \] \[ \mathit{fieldOffset}(\mathit{op}, \mathit{fieldName}) = (\mathit{op}, \mathit{fieldName}) \text. \]

In code, field addresses are structurally the same as frame addresses:

class ObjectPointer extends Pointer {
  public ObjectPointer() {
    super() ;
  }
  
  // ...
}

class FieldAddr extends OffsetAddr {
  public FieldAddr(ObjectPointer op, String field) {
    super(op,field) ;
  }
}

To the individual pointer classes, we also add an offset(name) method for quickly converting offsets into addresses:

abstract class Pointer {
  public abstract Addr offset(String name) ;
  // ...
}

The set of all addresses is the set of frame addresses and field addresses:

\[ a \in \mathit{Addr} = \mathit{FrameAddr} + \mathit{FieldAddr} \text. \]

Stores

The store is a map from addresses to values:

\[ \sigma \in \mathit{Store} = \mathit{Addr} \rightharpoonup \mathit{Value} \]

This machine has a variety of values:

\[ v \in Value = \mathit{Object} + \mathbb{Z} + \{ \mathbf{void}, \mathbf{null}, \mathbf{true}, \mathbf{false} \} \]

In code, each af these value types has its own class:

abstract class Value { ... }
class TrueValue extends Value { ... } 
class FalseValue extends Value { ... }
class NullValue extends Value { ... }
class VoidValue extends Value { ... }
class IntValue extends Value { ... }
class ObjectValue extends Value { ... }

An object is a class name paired with a base object pointer:

\[ o \in \mathit{Object} = \mathsf{ClassName} \times \mathit{OP}\text, \]

where the set of object pointers $\mathit{OP}$ corresponds to the ObjectPointer class.

The code is a straightforward transliteration of the math:

class ObjectValue extends Value {
  public final ObjectPointer pointer ;
  public final String className ;

  // ...
}

Continuations

Continuations represent the structure of the program stack.

If you drill down into them, you'll find a record of open procedure calls and active exception handlers.

There are three kinds of continuation in this machine---assignment continuations, handler continuations and the halt continuation:

\[ \begin{array}{rcl} \kappa \in \mathit{Kont} &::=& \mathbf{assign}(\mathit{name}, \vec{\mathit{s}}, \mathit{fp}, \kappa) \\ & \mathrel{|} & \mathbf{handle}(\mathit{className},\mathit{label},\kappa) \\ & \mathrel{|} & \mathbf{halt} \end{array} \]

An assignment continuation encodes the return context for procedure call: the register awaiting the result ($name$); the statement sequence directly after the call ($\vec{\mathit{s}}$); the frame pointer from before the call ($\mathit{fp}$); and the next continuation ($\kappa$).

A handler continuation contains the type of exceptions that it catches ($\mathit{className}$), and the label to which execution should branch ($\mathit{label}$), and the next continuation ($\kappa$).

When a procedure returns, it goes to the top-most assignment continuation on the stack.

When an exception gets thrown it finds the top-most matching handler continuation.

Represent continuations as Java code works best with a base continuation type, and then a subclass for each kind of continuation:

abstract class Kont {
  public final Kont next ;
  // ...
}


class HandlerKont extends Kont {
  public final String className ;
  public final String label ;
  // ...
}


class AssignKont extends Kont {
  public final String register ;
  public final Stmt stmt ;
  public final FramePointer fp ;
  // ...
}

class HaltKont extends Kont { ... }

Executing the machine

To execute a program using this machine, there are a few set-up steps.

First, we assume the program is parsed as a sequence of ClassDef objects.

Then, the "main" ClassDef is passed to the execute() procedure:

  public static void execute(ClassDef mainClass) {
    // Grab the main method:
    MethodDef mainMethod = mainClass.lookup("main") ;

    // Construct an object pointer for mainClass:
    ObjectPointer op = new ObjectPointer() ;

    // Construct an object value for mainClass:
    ObjectValue obj = new ObjectValue(mainClass.name,op) ;

    // Allocate an initial frame pointer:
    FramePointer fp0 = new FramePointer() ;

    // Create an initial store:
    ImmutableMap<Addr,Value> store0 = 
     new ImmutableSortedMap.Builder<Addr,Value>(Addr.ordering).build() ;

    // Insert the initial object at register $this:
    store0 = Store.extend(store0, fp0.offset("this"), obj) ;

    // Grab the halt continuation:
    Kont halt = HaltKont.HALT ;

    // Synthesize the initial state:
    State state = new State(mainMethod.body, fp0, store0, halt) ;

    // Run until termination:
    while (true) {
      state = state.next() ;
    }
  }

The execute() procedure finds the main method.

It then instantiates the main class by constructing the object for it.

It synthesizes and initial frame pointer and an initial store containing the main object.

It grabs the halt continuation, packages everything together into an initial state and then starts stepping through program states.

The method next() on the State object steps to the next State of execution.

If there is no next state, it throws an exception containing the final result of the program.

Thus, the bulk of the action happens inside the next() method.

Formally, we can represent the next method as a partial map from states to states:

\[ \mathit{next} : \Sigma \rightharpoonup \Sigma\text. \]

In code, the next method needs to dispatch on the statement type, so its implementation is brief:

class State {
  public State next () {
    return stmt.step(fp,store,kont) ;
  }
  // ...
}  

Now we need to define the step method for each kind of object.

Stepping over labels and skips

If the current statement is a skip, the $\mathit{next}$ function moves to the next statement:

\[ \mathit{next}( \mathbf{skip} : \vec{s} , \mathit{fp} , \sigma, \kappa ) = ( \vec{s} , \mathit{fp} , \sigma, \kappa ) \]

The corresponding code looks almost the same:

class SkipStmt extends Stmt {
  public State step (FramePointer fp, 
                     ImmutableMap store,
                     Kont kont) {
    return new State(this.next, fp, store, kont) ;
  }
  // ...
}

Labels are handled the same way:

\[ \mathit{next}( \mathbf{label}\; \mathit{label} : \vec{s} , \mathit{fp} , \sigma, \kappa ) = ( \vec{s}, \mathit{fp} , \sigma, \kappa ) \text. \]

Unconditional jumps

Handling goto is not much different than skip, except that the machine must look up the target statement sequence:

\[ \mathit{next}( \mathbf{goto}\; \mathit{label} : \vec{s} , \mathit{fp} , \sigma, \kappa ) = ( \mathit{S}(\mathit{label}), \mathit{fp}, \sigma, \kappa ) \text, \]

where the function $\mathit{S} : \mathit{Label} \to \mathsf{Stmt}^*$ maps a label to the statement sequence starting with that label.

In code, the function $\mathit{S}$ exists as a static method in the class Stmt that gets constructed as new label statements are created:

abstract class Stmt {
 
  private static Hashtable<String,Stmt> stmtMap =
    new Hashtable<String,Stmt>() ;

  protected void register(String labelName) {
    stmtMap.put(labelName, this) ;
  }

  public static Stmt forLabel(String labelName) {
    return stmtMap.get(labelName) ;
  }

  // ...
}

class LabelStmt extends Stmt {

  public final String label ;

  public LabelStmt(String label, Stmt next) {
    super(next) ;
    this.label = label ;
    this.register(label) ;
  }

  // ...
}

GotoStmt looks up the jump target:

final class GotoStmt extends Stmt {

  public final String label ;

  public State step (FramePointer fp, 
                     ImmutableMap store,
                     Kont kont) {
    return new State(Stmt.forLabel(label), fp, store, kont) ;
  }
  // ...
}

Conditionals

The if-goto form is not much more complicated than a skip and a jump: the wrinkle is that it needs to evaluate the conditional expression.

For evaluating expressions, $\mathit{next}$ will turn to the atomic expression evaluator: \[ \mathcal{A} : \mathsf{AExp} \times \mathit{FP} \times \mathit{Store} \rightharpoonup \mathit{Value}. \] Since this function dispatches on the type of expression, it will be implemented as a method in the class AExp:

abstract class AExp {
  public abstract Value eval(FramePointer fp, 
                             ImmutableMap<Addr,Value> store) ;
  // ...
}

We'll cover this method in more detail after the transition function.

The if-goto form evaluates the condition, checks for false and then changes the current statement:

\[ \mathit{next}( \mathbf{if}\;e\;\mathbf{goto}\;\mathit{label} : \vec{s}, \mathit{fp}, \sigma, \kappa ) = \\ \begin{cases} (\mathcal{S}(\mathit{label}), \mathit{fp}, \sigma, \kappa) & \mathcal{A}(e, \mathit{fp}, \sigma) \neq \mathit{false} \\ (\vec{s}, \mathit{fp}, \sigma, \kappa) & \text{otherwise} \text. \end{cases} \]

The corresponding step method does the same:

class IfStmt extends Stmt {

  public final String label ;
  public final AExp condition ;

  public State step (FramePointer fp, 
                     ImmutableMap store,
                     Kont kont) {
    // Test the condition:
    if (condition.eval(fp,store).toBoolean())
      // if true, jump to the label:
      return new State(Stmt.forLabel(label), fp, store, kont) ;
    else
      // if not, fall through:
      return new State(this.next, fp, store, kont) ;
  }

  // ...
}

Atomic assignments

Atomic assignment statements assign the value of an atomic expression to a variable.

This involves evaluating the expression, calculating the frame address to modify and then updating the store:

\[ \mathit{next}( \$\mathit{name}\;\mathbf{:=}\; e : \vec{s}, \mathit{fp}, \sigma, \kappa) = ( \vec{s}, \mathit{fp}, \sigma', \kappa) \text{, where } \\ \sigma' = \sigma[ (\mathit{fp},\mathit{name}) \mapsto \mathcal{A}(e,\mathit{fp},\mathit{store}) ]\text. \]

The notation $f[x \mapsto y]$ is the functional update operation:

\[ f[x \mapsto y](x) = y \\ f[x \mapsto y](x') = f(x') \text{ if } x \neq x' \text. \]

In Java:

class AssignAExpStmt extends Stmt {

  public final String lhs ;
  public final AExp rhs ;

  public State step (FramePointer fp, 
                     ImmutableMap<Addr,Value> store,
                     Kont kont) {
    Addr a = fp.offset(lhs) ;

    Value val = rhs.eval(fp, store) ;

    ImmutableMap<Addr,Value> store_ =
      Store.extend(store, a, val) ;

    return new State(this.next, fp, store_, kont) ;
  }

  // ...
}

The Store.extend auxiliary method creates a new ImmutableMap<Addr,Value> object, but with the address a mapped to value val.

New object creation

Object creation is an assignment form in which the assigned value is a newly created object:

\[ \mathit{next}( \$\mathit{name}\;\mathbf{:=}\; \mathbf{new}\; \mathit{className} : \vec{s}, \mathit{fp}, \sigma, \kappa) = ( \vec{s}, \mathit{fp}, \sigma', \kappa) \text{, where } \\ \sigma' = \sigma[ (\mathit{fp},\mathit{name}) \mapsto (\mathit{className}, \mathit{op}') ] \\ \mathit{op}' \text{ is fresh in } \sigma \text. \]

Freshness means that it's an object pointer that has never been used.

In Java:

class NewStmt extends Stmt {

  public final String lhs ;
  public final String className ;

  public State step (FramePointer fp, 
                     ImmutableMap<Addr,Value> store,
                     Kont kont) {
    Addr a = fp.offset(lhs) ;

    ObjectPointer op = new ObjectPointer() ;

    ObjectValue object = new ObjectValue(className,op) ;

    ImmutableMap<Addr,Value> store_ =
      Store.extend(store, a, object) ;

    return new State(this.next, fp, store_, kont) ;
  }

  // ...
}

Method invocation

Method invocation is probably the most complex statement to model, since it will involve all four components of the machine.

Since this language supports inheritance, method look-up may require a traversal of the class hierarchy.

Let us first assume the method has been looked up. A helper function will then apply the method to the active arguments:

\[ \mathit{applyMethod} : \mathit{Method} \times \mathsf{Name} \times \mathit{Value} \times \mathsf{AExp}^* \times \mathit{FP} \times \mathit{Store} \times \mathit{Kont} \rightharpoonup \Sigma \]

It needs to look up the values of the arguments, bind them to the formal parameters of the method, create a new frame pointer and create a new continuation:

Assuming $m = \mathbf{def}\;methodName(\$v_1,\ldots,\$v_n)\;\{\mathit{body}\}$:

\[ \mathit{applyMethod}( m, \mathit{name}, \mathit{val}_{\mathrm{this}}, \vec{e}, \mathit{fp}, \sigma, \kappa ) = (\mathit{body}, \mathit{fp}', \sigma'', \kappa')\text{, where } \\ \mathit{fp}' \text{ is fresh } \\ \sigma' = \sigma[(\mathit{fp}',\mathtt{$this}) \mapsto \mathit{val}_{\mathrm{this}}] \\ \sigma'' = \sigma'[ (\mathit{fp}',v_i) \mapsto \mathcal{A}(e_i, \mathit{fp}, \sigma) ] \\ \kappa' = \mathbf{assign}(\mathit{name},\vec{s},\mathit{fp},\kappa) \text. \]

Since both regular invocation and super-class invocation will use this helper, it can go into a common abstract base class:

abstract class AbstractInvokeStmt extends Stmt {

  public final String lhs ;
  public final String methodName ;
  public final AExp[] args ;

  protected State applyMethod(MethodDef m, ObjectValue thiss, 
                              FramePointer fp,
                              ImmutableMap<Addr,Value> store,
                              Kont kont) {
    Stmt stmt = m.body ;

    FramePointer fp_ = fp.push() ;

    Kont kont_ = new AssignKont(this.lhs, this.next, fp, kont);

    ImmutableMap<Addr,Value> store_ =
      = Store.extend(store, fp_.offset("$this"), thiss); 

    for (int i = 0; i < m.formals.length; ++i) {
      Addr a = fp_.offset(m.formals[i]) ;
      Value v = args[i].eval(fp, store) ;
      store_ = Store.extend(store_, a, v) ;
    }

    return new State(stmt, fp_, store_, kont_) ;
  }

  // ...
}

Method invocation now reduces to method look-up:

\[ \mathit{next}( \$\mathit{name}\;\mathbf{:=}\; \mathbf{invoke}\; \mathit{e}\mathbf{.}\mathit{methodName} (e_1, \ldots, e_n) , \mathit{fp}, \sigma, \kappa ) = \\ \mathit{applyMethod}(m, \mathit{name}, \mathit{val}_{\mathrm{this}}, \langle e_1,\dots,e_n \rangle, \mathit{fp}, \sigma, \kappa) \\ \mathit{val}_{\mathrm{this}} = \sigma(\mathit{fp},\mathtt{\$this}) \\ (\mathit{op},\mathit{className}) = \mathit{val}_{\mathrm{this}} \\ m = \mathit{lookup}(\mathit{className},\mathit{methodName}) \text, \]

where $\mathit{lookup}$ is a partial function that traverses the class hierarchy until it finds the method.

In Java, this becomes:

class InvokeStmt extends AbstractInvokeStmt {

  public final AExp object ;

  public State step (FramePointer fp, 
                     ImmutableMap<Addr,Value> store,
                     Kont kont) {
    ObjectValue thiss = (ObjectValue)object.eval(fp,store) ;

    ClassDef classs = ClassDef.forName(thiss.className) ; 

    MethodDef method = classs.lookup(methodName) ;

    return applyMethod(method, thiss, fp, store, kont) ;
  }
}

Invoking a super-class method is almost the same, except that the lookup happens in the parent class of this.

Procedure return

Procedure return "applies" the current continuation to the return value:

\[ \mathit{next}(\mathbf{return}\; e, \mathit{fp}, \sigma, \kappa) = \mathit{applyKont} ( \kappa, \mathcal{A}(e,\mathit{fp},\sigma), \sigma )\text, \]

where $\mathit{applyKont} : \mathit{Kont} \times \mathit{Value} \times \mathit{Store} \rightharpoonup \Sigma$ applies a continuation:

\[ \mathit{applyKont}( \mathbf{assign}(\mathit{name},\vec{s},\mathit{fp},\kappa), \mathit{val}, \mathit{store} ) = (\vec{s}, \mathit{fp}, \sigma[(\mathit{fp},\mathit{name}) \mapsto \mathit{val}], \kappa) \text. \]

If the current continuation is an exception handler, it jumps over it:

\[ \mathit{applyKont}( \mathbf{handle}(\mathit{className},\mathit{label},\kappa), \mathit{val}, \sigma ) = \mathit{applyKont}( \kappa, \mathit{val}, \sigma ) \]

In Java, procedure return is spread across several classes:

class ReturnStmt extends Stmt {

  public AExp result ;

  public State step (FramePointer fp, 
                     ImmutableMap<Addr,Value> store,
                     Kont kont) {
    Value returnValue = result.eval(fp, store) ;
    
    return kont.apply(returnValue, store) ;
  }

  // ...
}

class AssignKont extends Kont {

  public final String register ;
  public final Stmt stmt ;
  public final FramePointer fp ;

  public State apply (Value returnValue,
                      ImmutableMap<Addr,Value> store) {
    ImmutableMap<Addr,Value> store_ =
      Store.extend(store, fp.offset(register), returnValue) ;

    return new State(stmt, fp, store_, next) ; }

  // ...
}

class HandlerKont extends Kont {

  public final String className ;
  public final String label ;

  public State apply (Value returnValue,
                      ImmutableMap<Addr,Value> store) {
    return next.apply(returnValue,store) ;
  }

  // ...
}

Pushing exception handlers

The push-handler statement pushes a new exception handler on the top of the program stack:

\[ \mathit{next}( \mathbf{pushhandler}\;\mathit{className}\;\mathit{label} : \vec{s}, \mathit{fp}, \sigma, \kappa ) =\\ ( \vec{s}, \mathit{fp}, \sigma, \mathbf{handle}(\mathit{className},\mathit{label},\kappa) ) \]

In Java:

class PushHandlerStmt extends Stmt {

  public String className ;
  public String label ;

  public State step (FramePointer fp, 
                     ImmutableMap<Addr,Value> store,
                     Kont kont) {

    Kont kont_ = new HandlerKont(className,label,kont) ;

    return new State(next,fp,store,kont_) ;
  }
}

Popping exception handlers

Popping a handler takes the top-most handler off the stack:

\[ \mathit{next}( \mathbf{pophandler} : \vec{s}, \mathit{fp}, \sigma, \mathbf{handle}(\mathit{className},\mathit{label},\kappa) ) = ( \vec{s}, \mathit{fp}, \sigma, \kappa ) \]

In Java:

class PopHandlerStmt extends Stmt {

  public State step (FramePointer fp, 
                     ImmutableMap<Addr,Value> store,
                     Kont kont) {

    Kont kont_ = kont.popHandler() ;

    return new State(next, fp, store, kont_) ;
  }
}

Throwing and catching exceptions

The throw statement peels away layers of the stack until it finds a matching exception handler:

\[ \mathit{next}( \mathbf{throw}\;e : \vec{s}, \mathit{fp}, \sigma, kont) = \mathit{handle}(\mathcal{A}(e,\mathit{fp},\sigma), \mathit{fp}, \sigma, \kappa) \text, \]

where the function $handle : \mathit{Value} \times \mathit{FramePointer} \times \mathit{Store} \times \mathit{Kont} \rightharpoonup \Sigma$ peels the stack:

If $\mathit{className}$ is a subclass of $\mathit{className}'$, then:

\[ \mathit{handle}((\mathit{op}, \mathit{className}), \mathit{fp}, \sigma, \mathbf{handle}(\mathit{className}', \mathit{label}, \kappa)) =\\ (\mathcal{S}(\mathit{label}), \mathit{fp}, \sigma[(\mathit{fp},\mathtt{\$ex}) \mapsto (\mathit{op}, \mathit{className})], \kappa ) \]

But, if not, then:

\[ \mathit{handle}((\mathit{op}, \mathit{className}), \mathit{fp}, \sigma, \mathbf{handle}(\mathit{className}', \mathit{label}, \kappa)) =\\ \mathit{handle}( (\mathit{op}, \mathit{className}), \mathit{fp}, \sigma, \kappa ) \]

And, throws skip right over non-handler continuations:

\[ \mathit{handle}(\mathit{val}, \mathit{fp}, \sigma, \mathbf{assign}(\mathit{name}, \vec{s}, \mathit{fp}', \kappa)) =\\ \mathit{handle}( \mathit{val}, \mathit{fp}', \sigma, \kappa ) \]

The protocol is that the last thrown exception will be in the register $ex.

As with procedure return, the Java code is spread across several classes:

class ThrowStmt extends Stmt {

  public final AExp exception ;

  public State step (FramePointer fp, 
                     ImmutableMap store,
                     Kont kont) {
    Value exceptionValue = exception.eval(fp, store) ;

    return kont.handle((ObjectValue)exceptionValue,fp,store) ;
  }
  // ...
}

abstract class Kont {
  public abstract State handle(ObjectValue exception,
                      FramePointer fp,
                      ImmutableMap store) ;
  // ...
}

class HandlerKont extends Kont {

  public State handle(ObjectValue exception,
                      FramePointer fp, 
                      ImmutableMap store) {
    if (exception.isInstanceOf(className)) {
      ImmutableMap store_ =
        Store.extend(store, fp.offset("$ex"), exception) ;
      return new State(Stmt.forLabel(label), fp, store_, next) ;
    }

    else return next.handle(exception,fp,store) ;
  }
  // ...
}

class AssignKont extends Kont {

  public State handle(ObjectValue exception,
                      FramePointer fp,
                      ImmutableMap store) {
    return next.handle(exception,this.fp,store) ;
  }
  // ...
}

Capturing exceptions

When an exception is caught, execution branches to a handling label.

To find out which exception was caught, the code at the label can examine the distinguished register named $ex using the move-exception statement, which is effectively syntactic sugar; this:

 move-exception $name ;

is the same as:

 $name := $$ex ;

Evaluating expressions

For evaluating expressions, the function $\mathit{next}$ repeatedly used the atomic expression evaluator: \[ \mathcal{A} : \mathsf{AExp} \times \mathit{FP} \times \mathit{Store} \rightharpoonup \mathit{Value}. \] Since this function dispatches on the type of expression, it will be implemented as a method in the class AExp:

abstract class AExp {
  public abstract Value eval(FramePointer fp, 
                             ImmutableMap<Addr,Value> store) ;
  // ...
}

Most expressions are straightforward to implement, so a few highlights illuminate.

Implementing register look-up needs both the frame pointer and the store:

\[ \mathcal{A}(\mathit{name}, \mathit{fp}, \sigma) = \sigma(\mathit{fp}, \mathit{name}) \text. \]

In Java:

class RegisterExp extends AExp {

  public final String register ;

  Value eval (FramePointer fp, ImmutableMap <Addr,Value> store) {
    Addr a = fp.offset(register) ;
    return store.get(a) ;
  }

  // ...
}

Looking up this uses the specially designated register $this:

class ThisExp extends AExp {
  Value eval (FramePointer fp, ImmutableMap <Addr,Value> store) {
    return store.get(fp.offset("$this")) ;
  }
  // ...
}

Accessing an object field computes the field offset:

class FieldExp extends AExp {

  public final AExp object ;
  public final String field ;

  Value eval (FramePointer fp, ImmutableMap <Addr,Value> store) {
    ObjectValue op = (ObjectValue)object.eval(fp, store) ;
    Addr fieldAddr = op.offset(field) ;       
    return store.get(fieldAddr) ;
  }

  // ...
}

Exercises

  1. Re-implement this machine in a functional language like Racket, Scala or Haskell.
  2. Scale this machine up to the semantics of all Dalvik bytecodes.
  3. Apply systematic abstraction to this machine to convert it from an interpreter into a static analyzer.

Code

The fully documented code for this machine is available in a single Java file: OOCESK.java.

It requires the Google guava library to compile and run.

Related pages


[article index] [] [@mattmight] [+mattmight] [rss]