Episode III: Self-inlining anonymous closures in C++

[article index] [] [@mattmight] [rss]

C++ doesn't leap to mind when one thinks of pure, functional languages.

But, it should.

C++ template meta-programming is not just pure and functional by itself; meta-programming allows programmers to add functional features to C++.

In previous posts, I've explored

In this post, I'll combine these techniques to create a powerful, efficient hybrid: self-inlining anonymous static closures.

Like the lexical closures, static closures can have free variables (or really, free values), yet like the self-inlining anonymous functions, they have no run-time penalty either.

They are inlined exactly where they are used, e.g., inside the body of a loop.

The goal is to have code like:

   double a = 0.5 ;
   double b = 0.3 ;
   double c = 1.2 ;
   INTEGRATE( a * ($0d ^ 2) + (b * $0d) + c, 0.0, 5.0 )

    // a * ($0d ^ 2) + (b * $0d) + c 
    //  is the function f such that:
    // f(x) = a*x2 + b*x + c

transform itself into a loop, in which the evaluation of the function happens inside the loop; the transformed code is equivalent to:

   for (double x = ...) {
     ...
     double y = 0.5*x*x  +  0.3*x  +  1.2 ;
     ...
   }

Read on for an overview of 350 lines of C++ that includes:

  • a DSEL for describing static anonymous closures;
  • a dual compile- and run-time encoding of the syntax tree;
  • a type-inference meta-program to infer return-types; and
  • a template meta-compiler that unfolds and inlines closures.

What is template meta-programming?

Template meta-programming allows a programmer to perform arbitrary computations and program transformations at compile-time.

It's all thanks to an accident of design.

The combination of templates and partial template specialization allows one to inject a purely functional term-rewriting system into the template-expansion phase of the C++ compiler.

(I gave a lengthier introduction to this topic in an earlier post on embedding an interpreter for the lambda calculus in C++ templates.)

Template meta-programming tip

I advocate a style of template meta-programming that trusts (but verifies) that the compiler will perform certain optimizations, such as inlining functions marked inline and folding constants.

Consequently, I don't have to perform all calculations at the template level; this saves labor, and meta-programs end up with a more natural look.

When debugging performance, it's useful to use the gcc flags

    -Winline -finline-functions -O2

GCC will report any failed attempt at inlining a function marked inline.

The DSEL

As with previous posts, we'll create a domain-specific embedded language (DSEL) for describing anonymous closures.

Since this is a proof of concept, this DSEL has only five kinds of expressions: sum, product, exponentiation, function arguments and literal values.

More could be added without difficulty.

To implement DSELs, the trick is to overload operations so that instead of executing those operations, they construct a data structure representing the computation.

That structure can then be reinterpreted with the semantics of the DSEL.

What makes template meta-programming interesting is that this reinterpretation can happen at compile-time, and that structure can be packed into the type, rather than just a run-time value.

(Of course, this isn't surprising to Lisp programmers.)

We'll need a struct to describe each kind of non-literal expression:

// E1 + E2
template <typename R1, typename R2> 
struct Sum ;

// E1 * E2
template <typename R1, typename R2>
struct Prod ;

// E1 ^ E2
template <typename R1, typename R2>
struct Pow ;

// nth argument, takes type T:
template <typename T, int n>
struct Arg ; 

It's useful to predeclare a few variables to stand in as arguments, but it's also possible to define them on the fly using what become no-op functions:

Arg<double,0> $0d ;
Arg<double,1> $1d ;

Arg<int,0> $0i ;
Arg<int,1> $1i ;

// $<T,position>() : Arg<T,position>
template <typename T, int position>
static inline Arg<T,position> $() ;

The goal is to give an expression like

     a * $0d + 3

a type like

      Sum< Prod< double , Arg< double , 0 > >, int >

so that the type, which is amenable to inspection by meta-programs, contains enough information about the structure of the computation to reconstitute that structure.

But, the type doesn't contain the value of a or even the constant 3; how can a meta-program reconstitute the computation from its type?

The answer is that these values must be packed into the run-time struct created by the expression a*$0d + 3, so that a meta-program can recurse over this struct as it unfolds the type back into a computation.

Take a look at the definition of the struct for Sum:

template <typename R1, typename R2> 
struct Sum : public Exp<Sum<R1,R2> > {
 const R1 lhs ;
 const R2 rhs ; 
 Sum<R1, R2> (R1 l, R2 r) : lhs(l), rhs(r) {}
} ;

So, a Sum contains the run-time values of both its operands (if any).

What's nice about this approach is that if an operand is an argument, it's a struct of size zero, so it disappears. If everything is known statically for the entire closure, the whole struct disappears; this approach is pay-as-you-go.

Sum inherits from Exp, and oddly, it passes its own type as the parameter.

This cyclic trickery gives the base class a handle on the derived class.

This handle is useful for pushing all of the syntactic sugar in the DSEL into the definition of the struct Exp:

template <typename T>
struct Exp {

  // Operators build up a syntax tree:
  template <typename M>
  inline Sum<T,M> operator + (M m) ;

  template <typename M>
  inline Prod<T,M> operator * (M m) ;

  template <typename M>
  inline Pow<T,M> operator ^ (M m) ;

  // Application inlines the function:
  inline  
  typename Infer<T>::type
  operator () () ;

  template <typename A0> 
  inline  
  typename Infer<T>::type
  operator () (A0 arg0) ;

  template <typename A0, typename A1> 
  inline  
  typename Infer<T>::type
  operator () (A0 arg0, A1 arg1) ;
 
} ;

The overloads for sum, product and exponentation (what was XOR) are straightforward.

The overloads for application, which allow a static closure to be applied as if it were a regular function, take some explaining.

A meta-program for type inference

It's annoying to have to write down the return type of an anonymous function. It clutters things up.

To get around this, a template meta-program can infer the return type of a static closure.

If T is the type of a static closure expression, then Infer<T>::type is the type that closure will return if invoked.

As the portion of Infer that handles inference of types over sums shows, it is a straightforward meta-program:

template <typename E> 
struct Infer {
  // If it's not a syntax expression,
  // then it's a free value, so its
  // type is itself:
  typedef E type ; 
} ;


// Arguments carry their type:
template <typename T, int n>
struct Infer< Arg<T,n> > {
  typedef T type ;
} ;

// type(double + double) => double
template < >
struct Infer< Sum<double,double> > { 
  typedef double type ; 
} ;

// type(double + int) => double
template < >
struct Infer< Sum<double,int> > {
  typedef double type ;
} ;

// type(int + double) => double
template < > 
struct Infer< Sum<int,double> > {
  typedef double type ; 
} ;

// type(int + int) => int
template < > 
struct Infer< Sum<int,int> > { 
  typedef int    type ;
} ;

// type(E1 + E2) => type(type(E1) + type(E2))
template <typename E1, typename E2> 
struct Infer < Sum<E1,E2> > {
  typename Infer<E1>::type typedef E1T ;
  typename Infer<E2>::type typedef E2T ;
  typename Infer<Sum<E1T,E2T> >::type typedef type ;
} ;

The rules for Prod and Pow are similar.

The meta-compiler

What makes anonymous closures actually work is the meta-compiler, Inline, which recurs over the type of the anonymous closure to produce its computation:

template <typename T>
struct Inline {
  // In the base case, we have a run-time value
  // stored in the struct built by the static
  // closure, so we inline it here.
  static inline T at (const T dynamicValue) {
    return dynamicValue ;
  }

  template <typename A0>
  static inline T at (const T dynamicValue, A0 arg0) {
    return dynamicValue ;
  }

  template <typename A0, typename A1>
  static inline T at (const T dynamicValue, 
                      A0 arg0, 
		      A1 arg1) {
    return dynamicValue ;
  }
} ;

// compile($0, x0, ...) => x1 
template <typename X1> 
struct Inline < Arg<X1,0> > {
  
  template <typename A0>
  static inline A0 at (const Arg<X1,0> rtVal, A0 arg0) {
    return arg0 ;
  }

  template <typename A0, typename A1>
  static inline A0 at (const Arg<X1,0> rtVal, A0 arg0,
                       A1 arg1) {
    return arg0 ;
  }
} ; 

// compile($1, x0, x1, ...) => x1
// ...


// compile(E1 + E2, ...) => 
//   compile(E1, ...) + compile(E2, ...)
template <typename E1, typename E2> 
struct Inline <Sum<E1,E2> > {
  typedef Sum<E1,E2> Base ;
  typename Infer<Base>::type typedef T ;

  static inline T at (const Sum<E1,E2> rtVal) {
    return Inline<E1>::at(rtVal.lhs) +
      Inline<E2>::at(rtVal.rhs) ;	
  } ;

  template <typename A0>
  static inline T at (const Sum<E1,E2> rtVal, A0 arg0) {
    return Inline<E1>::at(rtVal.lhs, arg0) +
           Inline<E2>::at(rtVal.rhs, arg0) ;	
  } ;

  template <typename A0, typename A1>
  static inline T at (const Sum<E1,E2> rtVal, A0 arg0,
                      A1 arg1) {
    return Inline<E1>::at(rtVal.lhs, arg0, arg1) +
           Inline<E2>::at(rtVal.rhs, arg0, arg1) ;	
  } ;
} ;

The rules for Prod and Pow look much like those for Sum.

In action

Here's the code to INTEGRATE, as motivated by the introduction:

template < typename F >
static inline 
double INTEGRATE (F f, double a, double b, int n) {
  double delta = (b - a) / n ;
  double area = 0.0 ;
  double x = a ;
  for (int i = 0; i < n; ++i, x += delta) {
    double y = f(x) ; 
    area += (y * delta) ;
  }
  return area ;
}

This function assumes that a static closure will be passed in as the first argument (or some other function that overloads apply).

And, the static closure will unfold itself at the point of application: f(x).

Here are a couple more examples of how static closures can be used:

 double j = ($1i + 10 + $0d)(39.0, 50) ;

 Arg<double,0> x ;
 double m = 0.37 ;
 double b = 0.41 ;

 double A = INTEGRATE(x*m + b, 0.0, 3.0, 2000) ; 

Source code

The full source code, with examples, is available: SC.cpp.

More resources