Lambda-style anonymous functions for C++ in less than 500 lines of code

[article index] [] [@mattmight] [rss]

I prefer functional languages like Scheme, Scala or Haskell over C++ for most tasks. So, when I'm forced to program in C++, I write functional C++.

Fortunately, it's possible to add the most important ingredient in functional languages--lambda-style anonymous functions--to standard C++.

For example, the expression

    lambda<int> (x) --> 3*x + 7 
represents a function that multiplies by 3 and then adds 7. (The <int> means that the anonymous function returns an integer.)

These anonymous functions are first-class; that is, they can be assigned to variables, passed as parameters or returned from functions; for instance:

    f = lambda<int> (x,y) --> x + y ;
so that f(2,3) == 5. Or, they can be applied directly, as in:
    (lambda<int> (z) --> z*z)(9)
which is just another way of saying 81.

These functions can also map over data structures; for example:

    B = (lambda<int> (x) --> 3*x + 1).map(n,A) 
produces a new array B in which B[i] == 3*A[i] + 1.

Lambda terms can also capture free variables and perform Currying; for instance, given the function definition

    Function2<int,int,int> h(int a, int b) {
     return lambda<int> (x,y) --> a*x + b*y ;
    } 
the value of h(3,4)(1,1) is 7.

To pull this off, the implementation exploits templates, operator overloading and function pointers (but, remarkably, not a single macro) to construct a type-safe domain-specific embedded language (DSEL) for describing anonymous functions.

If you're wondering how it's done (or how to hack the "maps to" operator --> into C++), read on for the gory details and the (unoptimized, proof-of-concept) implementation.

More resources

If you're into this sort of thing, you might also like:

A quick decomposition

Let's break down the expression:

 lambda<int> (x) --> x + 1

The expression

 lambda<int> (x) 

is a call to a templated function:

 template <typename Y,typename X1>
 Function1<Y,X1> lambda (Var<X1>& x1) ;

So, from this, we see that what appears to be a variable parameter to the lambda term, x, is actually an expression of type Var<X1>. In our DSEL, an expression of type Var<X1> represents an argument that has type X1.

This means that the variable x must be declared outside the lambda term. It's a good idea to declare all meta-parameters once at a global scope, e.g.:

Var<int> x, y, z ;
Var<char*> r, s ;

so that any lambda term that wants to accept an integer can use x, y or z, and r or s could be used for string arguments. (It's safe to re-use meta-parameters between lambdas; they won't be mixed.)

The type Function1<Y,X1> is a templated class for functions that take an X1 and produce a Y. So, a call to lambda<T> produces an anonymous function that returns type T. We don't have to specify lambda<Y,X1> because it can infer the type X1 from the meta-parameter supplied to the lambda function.

If we were willing to specify the body of the lambda term as the last parameter, e.g., lambda (x,x+1), then it could also infer the return type, but I don't think this looks as nice.

Glancing at the prototype for Function1, we find several operator overloads:

template <typename Y,typename X1>
class Function1 {
public:
  Function1<Y,X1> (Var<X1>& p1) ;

  Function1<Y,X1>& operator -- (int) ;

  Function1<Y,X1>& operator > (Exp<Y>& body) ;

  Y operator () (const X1& x1) ;

  // ...
} ;

This reveals how to implement the --> operator: it's actually just the juxtaposition of postfix decrement (--) and greater-than (>).

The postfix -- operator just returns this, and the greater-than operator binds the body of the lambda term to this function object. That is, the -- is just a syntactic placebo that improves aesthetics.

The function object overloads application to give it the appearance of being a function whenever it's called.

The body of the lambda term, x + 1, turns into a syntax tree for an expression.

Expression trees

The function objects store the bodies of lambda terms as trees that describe the structure of the computation.

Expression trees are parameterized by the type they return under evaluation:

template <typename T>
struct Exp {
  virtual T eval() ;
  // ...
} ;

And, there are subclasses to represent expressions like constants, variables and binary operations:

template <typename T>
class Const : public Exp<T> // ...

template <typename T>
class Var : public Exp<T> // ...

template <typename T>
class BinOp : public Exp<T> // ...

And, specialized instances of the base expression template introduce operators for types that support them; for instance, integer arithmetic:

template < >
struct Exp<int> {
  Exp<int>& operator + (Exp<int>& that) ;
  Exp<int>& operator * (Exp<int>& that) ;
  Exp<int>& operator + (int n) ;
  Exp<int>& operator * (int n) ;

  // Add any other operations you want here.
  // E.g.:  /  ^  - <=
}

Exp<int>& operator + (int n, Exp<int>& thiss) ;
Exp<int>& operator * (int n, Exp<int>& thiss) ;

So, an expression like x + 1 is roughly equivalent to

 new BinOp<int>(sum,x,new Const<int>(1))
where sum is a function pointer to a function that performs addition.

If you want to add expressions like function calls or array indexing to the DSEL, you can add additional template classes and specializations.

Code

What I've described so far gets you an interface that type-checks. To see the code that plumbs it all together, check out the full implementation. It's under 500 lines of heavily-commented C++ and includes testing examples.

This implementation is pretty fast, but it could be significantly optimized. Expressions templates and inlining annotations (assuming the compiler obeys them), for instance, could eliminate the small overhead. Please consider this a proof-of-concept implementation.