Self-inlining anonymous functions in C++

[article index] [] [@mattmight] [rss]

In my last post, I explored a domain-specific embedded language for lambda-style anonymous functions in C++. Many liked the idea, but some quibbled about the potential run-time cost. Personally, I didn't think it was a large overhead, but some programmers aren't satisfied until the compiler outputs the assembly they would have written themselves.

To show that the overhead can be eliminated, I've created a second demo of anonymous functions. In this implementation, anonymous functions without free variables automatically inline themselves at their point of use, eliminating all run-time overhead.

For example, suppose you wanted to numerically integrate several functions repeatedly over different ranges. A sensible abstraction might involve an integration function that takes in a function pointer, a range and a number of partitions; for example:

 double integrate (double (*f)(double), double a, double b, int n) ;
where the function body contains a simple for loop.

Of course, this approach has a drawback: in every iteration of the loop, there will be a function call, and the function to be integrated had to be defined somewhere at the top-level.

With self-inlining anonymous functions, we can make it so that the expression

 double area = integrate(lambda(x, x * x), 0.0, 1.0, 10000) ;
compiles to the same code as
  double area = 0.0 ;
  double x = 0.0 ;
  for (int i = 0; i < 10000; ++i, x += 0.0001) {
   area += ((x * x) * delta) ;
  }
thereby eliminating the outer function call to integrate and the function call within the loop as well.

The (proof-of-concept) demonstration discussed below abuses nested template member functions, recursive templates and template meta-evaluation to pull this off.

More resources

Short version

Each lambda term folds its definition into its type, e.g, lambda (x, x * x) has the type Sum < Arg<A>, Arg<A> >, where A is the type of the meta-variable x. Thus, everwhere the lambda term is applied, there is enough information for a template meta-program to unfold the type back into the body of the function.

Breaking it down

To understand how this works, let's break down an example:

 integrate(lambda(x, x * x), 0.0, 1.0, 10000) ;

Passing anonymous functions as types

The prototype for the integrate function is:

template <typename F>
static inline
double integrate (F f, double a, double b, int n) ;

At this point, an experienced C++ progarmmer might be puzzled: the type of the function parameter (F) seems to be unrestricted. In practice, the type we pass for this parameter will encode all of the information necessary to reconstruct the function we wish to invoke. Since templates are expanded lazily, if we pass a type that doesn't encode a function, we'll get a compile-time error when it tries to inline it.

Calling anonymous functions

Peeking inside the function integrate, we get a glimpse at how this is done; instead of a call to f(x), we see:

 ... Inline<F>::at(x) ...
In fact, the compiler warns that the run-time parameter f isn't even used! (This is a good indication that whatever is happening, it's happening at compile-time.)

So, clearly, Inline<F> is a template meta-program that reconstructs an expression equivalent to the function described by F, and the at static member function inlines this expression, with its argument in place of the function's argument.

Constructing anonymous functions

To construct an anonymous function, we need to encode its definition as a type. In compiler theory, it's standard to encode an expression as a syntax tree. So, we just need a way to encode syntax trees as types.

For now, we'll assume all anonymous functions take only one argument (this method generalizes to multiple arguments), and that it contains no free variables (this method also generalizes to free variables).

We'll use the base class Exp<T> to represent an expression with a syntax tree encoded as type T:

template <typename T>
struct Exp {
  template <typename M>
  Prod<T,M> operator * (M m) ;

  template <typename M>
  Sum<T,M> operator + (M m) ;
} ;

Note how the nested template member operator overloads for addition and multiplication preserve the structure of the computation in the resulting type. So, if we have two expressions e1 and e2 whose types represent compile-time expressions, then the type of e1 + e1 represents a compile-time expression that computes their sum.

For our very simple demonstration language, we'll restrict the domain-specific embedded language that describes anonymous functions to just arguments, sums, products and constants. If you want do anything serious with this technique, you'll probably want to enrich this language with more expression types first:

// Integer constant:
template <int i>
struct I : public Exp<I<i> > {
  static I<i> v ;
} ;

//  Function argument of type T:
template <typename T>
struct Arg : public Exp< Arg<T> > {} ;

// Sum of two expressions:
template <typename R1, typename R2> 
struct Sum : public Exp<Sum<R1,R2> > {} ;

// Product of two expressions:
template <typename R1, typename R2> 
struct Prod : public Exp<Prod<R1,R2> > {} ;


Clearly, we don't need to define any member functions, since we'll never actually be using instances of these types. These types are compile-time data structures for a template meta-program. Note that the base class of these expressions contains their own type as an argument. This technique is what lets the overloaded operations in the base class work.

Putting it all together, we can create a compile-time anonymous function that squares its argument like so:

 Arg<double> x ; // x is now a meta-parameter.
 ... lambda(x, x * x) ...
  // has type Prod<Arg<double>,Arg<double> >
and, the lambda function is just a syntactic placebo; for single-argument anonymous functions, its role is aesthetic:

template <typename T, typename A>
static inline T lambda (A arg, T body) {
  (void) arg ;
  (void) body ;
  return Dummy<T>::value ;
}

(If you allowed for multi-argument anonymous functions, the lambda would actually be necessary.)

Inlining anonymous functions

To unfold the definition of the function at its point of use requires a very basic C++ template meta-program. C++ template meta-programs are, for the most part, just purely functional pattern-matching and term-rewriting systems.

Assuming that each compile-time anonymous function F is invoked as Inline<F>::at(x), the following template meta-program inlines the function encoded in type F and then applies it to x;

template <typename E>
struct Inline {
} ;

// Inline constants:
template < int i >
struct Inline< I<i> > {
  static inline int at (int arg) {
    (void)arg ;
    return i ;
  } 
} ;

// Inline arguments:
template < typename X1 >
struct Inline<Arg<X1> > {
  static inline X1 at (X1 arg) {
    return arg ;
  } 
} ;

// Inline sums:
template < typename E1, typename E2 >
struct Inline<Sum<E1,E2> > {

  template <typename A>
  static inline A at (A arg) {
    return Inline<E1>::at(arg) +
           Inline<E2>::at(arg) ;
  }

} ;

// Inline products:
template < typename E1, typename E2 >
struct Inline<Prod<E1,E2> > {

  template <typename A>
  static inline A at (A arg) {
    return Inline<E1>::at(arg) *
           Inline<E2>::at(arg) ;
  }
} ;

Compiler notes

A key assumption in this demonstration is that the C++ compiler will obey the inline keyword. Of course, C++ compilers are not required to do this, so it's important to understand when they will. In my own tests, g++ started inlining functions marked static inline at optimization levels -O2 and higher.

Glancing at the the x86 code generated for

  double area = integrate(lambda(x, x * x), 0.0, 1.0, 10000) ;
  printf("integrate: %f\n", area) ; 
at -O2, I found this sitting inside main:
L11:
        movapd  %xmm1, %xmm0
        mulsd   %xmm1, %xmm0
        mulsd   %xmm2, %xmm0
        addsd   %xmm0, %xmm3
        incl    %eax
        addsd   %xmm2, %xmm1
        cmpl    $10000, %eax
        jne     L11
        movapd  %xmm3, %xmm0
        leaq    LC2(%rip), %rdi
        movl    $1, %eax
        call    _printf
As you can see, the only call is to printf, and the loop got inlined in main.

Code

Download the full demo code.

As an exercise, you could extend it with a richer set of expressions in the DSEL. A more exciting upgrade would be to thread the values of free variables through the run-time value of the anonymous function; at present, this is just is a dummy-value that gets eliminated by the optimizer. Then, you'd have self-inlining anonymous closures!