# Equational derivations of the Y combinator and Church encodings in Python

I love the Y combinator and Church encodings.

Every time I explain them, I feel like I’m using sorcery.

I’ve written posts on memoizing recursive functions with the Y combinator in JavaScript and on the Church encodings in Scheme and in JavaScript.

When I spoke at Hacker School, I used Python as the setting in which to derive Church encodings and the Y combinator for the first time.

In the process, Python seemed to hit a sweet spot for the explanation: it’s a popular language, and the syntax for lambda is concise and close to the original mathematics.

I’m distilling the technical parts of that lecture into this post, and in contrast to prior posts, I’m taking a purely equational reasoning route to Church encodings and the Y combinator – all within Python.

In the end, we’ll have constructed a programming language out of the lambda calculus, and we’ll arrive at the factorial of 5 in the lambda calculus, as embedded in Python:

(((lambda f: (((f)((lambda f: ((lambda z: (((f)(((f)(((f)(((f)(((f)
(z)))))))))))))))))))((((((lambda y: ((lambda F: (((F)((lambda x:
(((((((y)(y)))(F)))(x)))))))))))((lambda y: ((lambda F: (((F)((lambda x:
(((((((y)(y)))(F)))(x)))))))))))))((lambda f: ((lambda n: ((((((((((((
lambda n: (((((n)((lambda _: ((lambda t: ((lambda f: (((f)((lambda void:
(void)))))))))))))((lambda t: ((lambda f: (((t)((lambda void: (void)))))
))))))))((((((lambda n: ((lambda m: (((((m)((lambda n: ((lambda f:
((lambda z: (((((((n) ((lambda g: ((lambda h: (((h)(((g)(f)))))))))))
((lambda u: (z)))))((lambda u: (u)))))))))))))(n))))))) (n)))((lambda f:
((lambda z: (z)))))))))((lambda _: ((((lambda n: (((((n) ((lambda _: ((
lambda t: ((lambda f: (((f)((lambda void: (void))))))))))))) ((lambda t:
((lambda f: (((t)((lambda void: (void))))))))))))) ((((((lambda n:
((lambda m: (((((m)((lambda n: ((lambda f: ((lambda z: (((((((n) ((lambda
g: ((lambda h: (((h)(((g)(f)))))))))))((lambda u: (z)))))((lambda u:
(u)))))))))))))(n)))))))((lambda f: ((lambda z: (z)))))))(n)))))))))
((lambda _: ((lambda t: ((lambda f: (((f)((lambda void: (void)))))))))))
))((lambda _: ((lambda f: ((lambda z: (((f)(z)))))))))))((lambda _: (((
(((lambda n: ((lambda m: ((lambda f: ((lambda z: (((((m)(((n)(f)))))(z)
))))))))))(n)))(((f) ((((((lambda n: ((lambda m: (((((m)((lambda n:
((lambda f: ((lambda z: (((((((n) ((lambda g: ((lambda h: (((h)(((g)(f)
))))))))))((lambda u: (z)))))((lambda u: (u)))))))))))))(n)))))))(n)))
((lambda f: ((lambda z: (((f) (z))))))))))))))))))))))))(lambda x:x+1)(0)


Run the above in your Python interpreter. It’s equal to 120.

As a bonus, this post is a proof that the indentation-sensitive constructs in Python are strictly optional.

## What is lambda?

Lambda is a notation for expressing anonymous functions.

For instance, in Python, we define the function f using def:

def f(x):
return x + 1

print (f(10))    # prints 11


Or, we can assign f to lambda x: x + 1:

f = lambda x: x+1

print (f(10))   # prints 11


## The lambda calculus

Alonzo Church discovered the lambda calculus back in the 1920s, not as a programming language, but in an effort to find a functional foundation for logic.

The lambda calculus is what’s left of mathematics if you take away everything except for variables, anonymous functions (lambdas) and function application.

It’s also the world’s smallest universal programming language, and it lives inside many major programming languages these days, including Python, JavaScript, Ruby, PHP, Java and C++. (Of course, it also lives in the functional languages.)

The following grammar carves out a subset of Python that encodes the lambda calculus:

 <exp> ::= <var>
|  <exp>(<exp>)
|  lambda <var>: <exp>
|  (<exp>)

<var> is a legal variable name


In the lambda calculus, all values are functions.

## A programming language out of lambda

At first glance, the lambda calculus doesn’t appear to be a universal foundation for programming.

To convert it into a universal foundation, we will steadily construct the features expected of a modern programming language:

• void values;
• multi-argument functions;
• booleans and conditionals;
• numbers and arithmetic;
• pairs;
• lists; and
• recursive functions.

To do this, we’ll employ lambda-based programming techniques such as Currying, Church encodings, self-application, eta-expansion and the fixed-point combinators.

## A void value

Sometimes, we’ll need a “void” value that we can pass when we expect that argument to be ignored.

We’ll the identity function to accomplish this;

VOID = lambda void: void


If we wanted ‘debugging’ support for this development, you might want to temporarily use an alternate definition of VOID:

def VOID(_):
raise Exception('Cannot invoke VOID!')


In this case, if someone accidentially invokes VOID, it will blow up.

## Multi-argument functions

The lambda calculus requires that every function have exactly one argument.

It’s easy to see how to fake zero-argument functions: we can take every call with zero arguments:

f()


and convert it into a call that takes VOID:

f(VOID)


so that every lambda taking zero arguments:

lambda : body


becomes a lambda with an ignored argument:

lambda _ : body


If we want true multi-argument functions, we need to use Currying.

In the case of a function over two arguments, the Curried function becomes a function that accepts the first argument, but then returns a function accepting the second argument and returning the result.

For example, suppose we have a function sum:

sum = lambda x, y: x + y

print (sum(2,3))     # prints 5


we could Curry it to produce a function that completes the sum:

sum = lambda x: lambda y: x + y

print (sum(2)(3))    # prints 5


At this point, we can desugar multi-argument functions into strictly single-argument functions.

## Conditionals and Booleans

To develop conditionals, we want to create a function that simulates the behavior of the classic McCarthy-style if <condition> then <exp> else <exp> construct.

In C-like languages, this construct is the ternary operator:

  <condition> ? <exp> : <exp>


and in Python proper, this is

  <exp> if <condition> else <exp>


To start, we’re going to constrain the function IF and the values TRUE and FALSE so that:

IF (TRUE)  (true_value) (false_value) == true_value
IF (FALSE) (true_value) (false_value) == false_value


There are several ways to do this, but we can take a step in the right direction by remembering that all values in the lambda calculus are functions – including whatever TRUE and FALSE must be.

Then, we can constrain IF to be the function that applies the condition to its arguments:

IF (cond) (true_value) (false_value) == cond(true_value)(false_value)


which leads to a pure-lambda definition of IF:

IF = (lambda cond: lambda true_value: lambda false_value:
cond(true_value)(false_value))


At this point, we can equationally solve for the definition of TRUE and FALSE:

    TRUE (true_value) (false_value)
== IF (TRUE) (true_value) (false_value)
== true_value


And, as before, we can turn this into a valid Python definition by moving the arguments with lambdas:

TRUE = lambda true_value: lambda false_value: true_value


Following the same process for FALSE yields:

FALSE = lambda true_value: lambda false_value: false_value


### Making it work with eager languages

This encoding for TRUE and FALSE works when the computations for the true branch and the false branch are terminating and error-free.

But, if we tried IF (TRUE) (value) (nonterminate()) where nonterminate() never completes, then the result will not be equal to value, because the entire expression will never terminate.

To fix this, we must “thunk” the arguments to IF by wrapping them in lambdas to suspend their computation:

IF (TRUE)  (lambda: true_value)  (lambda: false_value) == true_value
IF (FALSE) (lambda: false_value) (lambda: false_value) == false_value


The definition of IF stays the same, but resolving for TRUE and FALSE yields:

TRUE  = lambda true_value: lambda false_value: true_value()
FALSE = lambda true_value: lambda false_value: false_value()


## Numbers

Church numerals are a means for encoding the natural numbers as functions.

In the spirit of encoding data as its use, Church numerals encode natural numbers as iterated application.

For example, the Church numeral for 3 is a function that applies a composes its argument with itself 3 times.

That means that ZERO, given any function, returns the identity function:

ZERO (f) (z) == z


so that:

ZERO = lambda f: lambda z: z


The Church numeral for ONE applies its argument once; TWO applies it twice; THREE applies it three times:

ONE   (f) (z) == f(z)
TWO   (f) (z) == f(f(z))
THREE (f) (z) == f(f(f(z)))


so that:

ONE   = lambda f: lambda z: f(z)
TWO   = lambda f: lambda z: f(f(z))
THREE = lambda f: lambda z: f(f(f(z)))


Alternatively, if we define function composition, compose:

compose = lambda f,g: lambda x: f(g(x))


we can represent these numbers in a point-free style:

ONE   = lambda f: f
TWO   = lambda f: compose(f,f)
THREE = lambda f: compose(f,compose(f,f))


We can write a function in Python that turns a Python number into a Church numeral:

def numeral(n):
return lambda f: lambda z: z if n == 0 else f(numeral(n-1)(f)(z))

print (numeral(0)(lambda x: x+1)(0))   # prints 0
print (numeral(7)(lambda x: x+1)(0))   # prints 7
print (numeral(3)(lambda x: 2*x)(1))   # prints 8


and a function that turns a Church numeral back into a number:

natify = lambda c: c(lambda x: x+1)(0)

print (natify (ONE))  # prints 1
print (natify (TWO))  # prints 2


### Successors

Given the representation of a Church numeral, $n$, its successor, $n+1$ should apply f one more time:

SUCC (n) (f) (z) == f(n(f)(z))


By pulling the arguments across with lambda, we get a definition for SUCC:

SUCC = lambda n: lambda f: lambda z: f(n(f)(z))


And, it works:

print (SUCC (ONE) (lambda x: x+1) (0)) # prints 2


To add $n$ and $m$ as Church numerals, we want to apply a function $n$ times and then apply it $m$ times.

So, we can define addition using composition:

SUM (n) (m) (f) (z) == compose(n(f),m(f))(z)


so that:

SUM (n) (m) (f) == compose(n(f),m(f))


and using lambdas to peel off the arguments:

SUM = lambda n: lambda m: lambda f: compose(n(f),m(f))


And, after inlining the compose:

SUM = lambda n: lambda m: lambda f: lambda z: n(f)(m(f)(z))


A few tests show that it works:

FOUR = SUM (TWO) (TWO)
FIVE = SUM (TWO) (THREE)

print (natify (FOUR))  # prints 4
print (natify (FIVE))  # prints 5


### Multiplication

To multiply $n$ and $m$ as Church numerals, we want to apply the encoding of $n$ exactly $m$ times:

MUL (n) (m) (f) (z) == m(n(f))(z)


so that peeling the arguments off with lambdas yields:

MUL = lambda n: lambda m: lambda f: lambda z: m(n(f))(z)


And, we can even derive a point-free version with compose:

MUL = lambda n: lambda m: lambda f: lambda z: m(n(f))(z)
= lambda n: lambda m: lambda f: m(n(f))
= lambda n: lambda m: lambda f: compose(m,n)(f)
= lambda n: lambda m: compose(m,n)


And, it works:

FOUR = MUL (TWO) (TWO)
SIX  = MUL (TWO) (THREE)

print (natify (FOUR))  # prints 4
print (natify (SIX))   # prints 6


## Pairs

To create pairs (and tuples), we want a constructor that can create a pair (PAIR) and accessors to pull components out (LEFT, RIGHT).

Together, they need to obey the following properties:

LEFT  (PAIR (a) (b)) == a
RIGHT (PAIR (a) (b)) == b


Let’s assume that a pair is a function that applies its first argument to both the left and the right value:

PAIR (a) (b) == lambda f: f(a)(b)


so that:

PAIR = lambda a: lambda b: lambda f: f(a)(b)


Then we can solve for LEFT:

    a
== LEFT (PAIR (a) (b))
== LEFT (lambda f: f(a)(b))


Assuming LEFT (p) == p(g) for some unknown function g lets us solve for g:

    LEFT (lambda f: f(a)(b))
== (lambda f: f(a)(b))(g)
== g(a)(b)


All together, a == g(a)(b), so we can solve for g by pulling the arguments across with lambda:

g = lambda a: lambda b: a


which means that:

LEFT = lambda p: p(lambda a: lambda b: a)


Repeating the process for RIGHT yields:

RIGHT = lambda p: p(lambda a: lambda b: b)


## Lists

To create lists, we need a NIL element, a list constructor CONS, a predicate for detecting empty lists NILP, a HEAD accessor and a TAIL accessor, so that all together, the following constraints hold:

NILP (NIL) == TRUE
NILP (CONS (hd) (tl)) == FALSE

HEAD (CONS (hd) (tl)) == hd
TAIL (CONS (hd) (tl)) == tl


To pull this off, we will allow a list to take two functions: one to call when the list is empty and the other to call when the list is non-empty.

Under this interpretation, NIL should call the empty function:

NIL = lambda onnil: lambda onlist: onnil()


And, the CONS constructor should call the non-empty function with both the head and the tail:

CONS (hd) (tl) == lambda onnil: lambda onlist: onlist(hd)(tl)


so that:

CONS = (lambda hd: lambda tl:
lambda onnil: lambda onlist: onlist(hd)(tl))


Under this convention, NILP can engineer the functions it passes in to return the appropriate value:

NILP (list) == list (lambda: TRUE) (lambda hd: lambda tl: FALSE)


so that:

NILP = lambda list: list (lambda: TRUE) (lambda hd: lambda tl: FALSE)


And, at the same time HEAD and TAIL can extract the appropriate value:

HEAD = lambda list: list (VOID) (lambda hd: lambda tl: hd)
TAIL = lambda list: list (VOID) (lambda hd: lambda tl: tl)


## Non-termination via the U Combinator

The first hint that the lambda calculus may be capable of general purpose computation comes from the U Combinator.

The U combinator is the function that applies its argument to itself:

U = lambda f: f(f)


When we apply the U Combinator to itself, something strange happens:

U(U)  # error: stack overflow!


Because Python lacks tail-call optimization, it blows out the call stack.

In languages with correct tail-call handling, it non-terminates.

In the context of U(U), the expression f ends up being bound to lambda f: f(f).

Through self-application, we achieved self-reference.

## Recursion via the U Combinator

Once non-termination enters the picture, it’s natural to wonder if one might do more than just non-terminate.

Can we achieve recursion through self-application?

We can.

Consider the recursive definition of factorial:

fact = lambda n: 1 if n <= 0 else n*fact(n-1)


We can try to pass a copy of “factorial” into itself:

fact = ((lambda n: 1 if n <= 0 else n*fact(n-1))
(lambda n: 1 if n <= 0 else n*fact(n-1)))


But, this doesn’t work: n ends up bound to (lambda n: 1 if n <= 0 else n*fact(n-1)), and the program breaks.

But, what if we add an extra parameter to represent the function itself?

We end up with:

fact = ((lambda f: lambda n: 1 if n <= 0 else n*fact(n-1))
(lambda f: lambda n: 1 if n <= 0 else n*fact(n-1)))


Now, it works, but we still have a recursive reference to fact.

Fortunately, f(f) (or U(f)) will produce a new reference to fact.

So, we can use:

fact = ((lambda f: lambda n: 1 if n <= 0 else n*(U(f))(n-1))
(lambda f: lambda n: 1 if n <= 0 else n*(U(f))(n-1)))


and clean the whole expression up with a call to the U combinator:

fact = U(lambda f: lambda n: 1 if n <= 0 else n*(U(f))(n-1))


Sure enough, it works:

U(lambda f: lambda n: 1 if n <= 0 else n*(U(f))(n-1))(5) # prints 120


Booleans. Numbers. Pairs. Lists. Recursion.

None are fundamental.

If we have anonymous functions, we have all of these.

## Expressing recursion via the Y Combinator

The U Combinator may be sufficient to demonstrate the universality of the lambda calculus, but for those that wish to use the lambda calculus as an everyday programming language, it is cumbersome.

The Y Combinator is an elegant way to express recursive functions.

To express recursion via the Y Combinator requires two steps:

1. express a recursive function as the fixed point of a non-recursive function; and

2. create a function – the Y Combinator – to find the fixed point of another function, without using recursion.

The fixed point of a function $f$ is the value $x$ such that $x = f(x)$.

A functional is a function that takes a function as its argument.

The fixed point of a functional (if it exists), by definition, would have to be a function.

To put this in more concrete terms, we can work on a specific example:

What is the functional for which factorial is a fixed point?

Let’s cheat at first, and assume we already have a recursive defintion of factorial available:

fact = lambda n: 1 if n <= 0 else n*fact(n-1)


A simple functional that returns factorial when given factorial is:

lambda f:fact


We can inline fact once to arrive at:

lambda f: lambda n: 1 if n <= 0 else n*fact(n-1)


Given factorial, the above will return factorial.

Under the assumption that factorial will be passed in for f, we can confidently state that the fixed point of the following functional is factorial:

lambda f: lambda n: 1 if n <= 0 else n*f(n-1)


What we need now is a function Y that will find the fixed point of a functional for us, so that:

fact = Y(lambda f: lambda n: 1 if n <= 0 else n*f(n-1))


## Deriving the Y Combinator

To derive the Y Combinator, let us first state the key property we seek; Y(F) should return the fixed point of F:

Y(F) == f   and   f == F(f)


Substituting F(f) for f gives:

Y(F) == F(f)


Because f == Y(F), we can substiute for f yet again to yield:

Y(F) == F(Y(F))


We can pull the argument off with lambda to get an operational definition:

Y = lambda F: F(Y(F))


Of course, this won’t work:

Y(lambda f: lambda n: 1 if n <= 0 else n*f(n-1))(5) # stack overflow!


The problem is that Y immediately invokes Y recursively.

Fortunately, we can $\eta$-expand the call to Y, under the observation that for any expression in the lambda calculus, e:

e == lambda x: e(x)


except that if e is non-terminating, then the expanded expression will terminate.

This expansion yields a new definition of Y:

Y = lambda F: F(lambda x:Y(F)(x))


Y(lambda f: lambda n: 1 if n <= 0 else n*f(n-1))(5) # prints 120


All that’s left now is to apply the U combinator to eliminate the explicit recursion in the definition of Y:

Y = U(lambda h: lambda F: F(lambda x:U(h)(F)(x)))


And, of course, we can also inline U to leave an expression in the pure lambda calculus:

Y = ((lambda h: lambda F: F(lambda x:h(h)(F)(x)))
(lambda h: lambda F: F(lambda x:h(h)(F)(x))))


Observe:

Y(lambda f: lambda n: 1 if n <= 0 else n*f(n-1))(5) # prints 120


## All together

Putting together all of the Church encodings and the Y Combinator allows the expression of factorial in pure lambda calculus, in which case fact(5) becomes:

(((lambda f: (((f)((lambda f: ((lambda z: (((f)(((f)(((f)(((f)(((f)
(z)))))))))))))))))))((((((lambda y: ((lambda F: (((F)((lambda x:
(((((((y)(y)))(F)))(x)))))))))))((lambda y: ((lambda F: (((F)((lambda x:
(((((((y)(y)))(F)))(x)))))))))))))((lambda f: ((lambda n: ((((((((((((
lambda n: (((((n)((lambda _: ((lambda t: ((lambda f: (((f)((lambda void:
(void)))))))))))))((lambda t: ((lambda f: (((t)((lambda void: (void)))))
))))))))((((((lambda n: ((lambda m: (((((m)((lambda n: ((lambda f:
((lambda z: (((((((n) ((lambda g: ((lambda h: (((h)(((g)(f)))))))))))
((lambda u: (z)))))((lambda u: (u)))))))))))))(n))))))) (n)))((lambda f:
((lambda z: (z)))))))))((lambda _: ((((lambda n: (((((n) ((lambda _: ((
lambda t: ((lambda f: (((f)((lambda void: (void))))))))))))) ((lambda t:
((lambda f: (((t)((lambda void: (void))))))))))))) ((((((lambda n:
((lambda m: (((((m)((lambda n: ((lambda f: ((lambda z: (((((((n) ((lambda
g: ((lambda h: (((h)(((g)(f)))))))))))((lambda u: (z)))))((lambda u:
(u)))))))))))))(n)))))))((lambda f: ((lambda z: (z)))))))(n)))))))))
((lambda _: ((lambda t: ((lambda f: (((f)((lambda void: (void)))))))))))
))((lambda _: ((lambda f: ((lambda z: (((f)(z)))))))))))((lambda _: (((
(((lambda n: ((lambda m: ((lambda f: ((lambda z: (((((m)(((n)(f)))))(z)
))))))))))(n)))(((f) ((((((lambda n: ((lambda m: (((((m)((lambda n:
((lambda f: ((lambda z: (((((((n) ((lambda g: ((lambda h: (((h)(((g)(f)
))))))))))((lambda u: (z)))))((lambda u: (u)))))))))))))(n)))))))(n)))
((lambda f: ((lambda z: (((f) (z))))))))))))))))))))))))(lambda x:x+1)(0)


## Code

A condensed version of the code in this post is available.

1. Write PRED` to return the predecessor of a Church numeral.