An introduction to QuickCheck by example: Number theory and Okasaki's red-black trees

[article index] [] [@mattmight] [rss]

Randomized testing exemplifies the 80/20 rule: it yields 80% of the benefit of formal verification for 20% of the effort.

(For a deep treatment of randomized testing, I highly encourage you to read my collegue John Regehr’s blog.)

Koen Claessen and John Hughes’s QuickCheck is a powerful randomized testing framework for Haskell, and it is an excellent argument in favor of using the language (among many other good arguments).

This post gently introduces QuickCheck to test number-theoretic conjectures, including the falsificaton of a conjecture about primes from antiquity that was once open for millenia.

This post then reconstructs, step by step, my use of QuickCheck to debug purely functional red-black trees.

Warning: Basic Haskell assumed

This post should be accessible to most functional programmers, but some knowledge of Haskell is assumed.

If you’re not familiar with Haskell, I recommend Real World Haskell:

Intro: QuickCheck with number theory

With QuickCheck, programmers can write specifications as Haskell code, and QuickCheck will attempt to falsify them by generating random inputs.

Consider a very simple property from number theory:

“For each integer \(n\), if \(n\) is even, then \(n+1\) is odd.”

Or, more formally:

\( \forall n . \mathit{even}(n) \to \mathit{odd}(n+1) \)

We can validate (not verify) this property using QuickCheck by asking it to generate random intgers and test the property.

After importing Test.QuickCheck, it’s a one-liner:

> quickCheck(\n -> even(n) ==> odd(n+1))
+++ OK, passed 100 tests.

Haskell inferred that the type of n is Int.

It then invoked the function on 100 random integers, ensuring that each invocation satisfied the test.

QuickCheck versus Mersenne conjecture of antiquity

The \(n\text{th}\) Mersenne number is given by the formula \(M_n = 2^n-1\).

The ancients were able to calculate Mersenne’s large enough to find that \(M_2\), \(M_3\), \(M_5\) and \(M_7\) were prime numbers.

Since 2, 3, 5 and 7 are primes, it seemed reasonable that for any prime \(p\), \(M_p\) must be prime:

\( \mathbf{Conjecture}: \forall n . \mathit{prime}(n) \to \mathit{prime}(2^n-1) \)

But, what does QuickCheck think of this?

First, we need to encode isPrime :: Integer -> Bool:

m `divides` n = n `mod` m == 0

isPrime :: Integer -> Bool
isPrime n | n <= 1 = False
isPrime 2 = True
isPrime n = not $ any (`divides` n) [2..n-1]

And, then we can ask QuickCheck:

> quickCheck(\n -> isPrime n ==> isPrime(2^n - 1))
*** Failed! Falsifiable (after 14 tests):                  

After 14 guesses and fractions of a second, QuickCheck found a counter-example to the ancient conjecture: 11.

It took humanity about two thousand years to do the same.

(For larger numbers, an efficient isPrime can be defined with methods from my post on primality testing.)

QuickCheck versus the Collatz conjecture

The Collatz conjecture is easy to state informally:

Given a natural number, if that number is even, divide it by two; otherwise, multiply it by 3 and add 1. The Collatz conjecture states that repeating this process eventually reaches 1.

Formally, we can define an indivdual step of Collatz as the function \(f\):

\( f(n) = n/2 \text{ if } n \text{ is even} \)

\( f(n) = 3n+1 \text{ if } n \text{ is odd} \)

Let the proposition \(\mathit{Collatz}(n)\) be true if and only if the Collatz conjecture is true for \(n\). Clearly, \(\mathit{Collatz}(1)\) is true.

By definition, it must be that: \( \mathit{Collatz}(n) \iff \mathit{Collatz}(f(n)) \)

Putting this all together, we can embed Collatz in Haskell:

f :: Integer -> Integer
f(n) | even(n) = n `div` 2
     | odd(n)  = 3*n + 1

collatz :: Integer -> Bool
collatz(1) = True
collatz(n) = collatz(f(n))

And, we can test it with QuickCheck:

> quickCheck (collatz)
[non-termination] [ctrl-c]
*** Failed! Exception: 'user interrupt' (after 1 test): 

Right off the bat, QuickCheck finds a non-terminating input: 0.

We need to insert a guard to prevent QuickCheck from diverging:

> quickCheck (\n -> n > 0 ==> collatz(n))
+++ OK, passed 100 tests.

If you want to see which values QuickCheck tried, you can use verbose:

> (quickCheck . verbose) (\n -> n > 0 ==> collatz(n))
[prints all test cases]
+++ OK, passed 100 tests.

Running this shows that QuickCheck is discarding nonpositive test cases.

Instead of allowing QuickCheck to generate test cases that are never used, we can instead write a generator, positives, that only yields random positive values:

positives :: Gen Integer
positives = 
 do -- Pick an arbitrary integer:
    x <- arbitrary 
    -- Make it positive, if necessary:
    if (x == 0) 
      then return 1
    else if (x < 0)
      then return (-x)
      return x

Gen is a monadic kind for describing generators of random values.

arbitrary is a member of the Arbitrary typeclass, and it yields a generator for the parameter:

class Arbitrary a where
  arbitrary :: Gen a

QuickCheck instantiates Arbitrary on many of the basic types like Integer.

The forAll quantifier will draw test cases only from a specified generator:

> verboseCheck(forAll positives collatz)
[shows only positive test cases]
+++ OK, passed 100 tests.

QuickCheck versus integer factorization

We can encode the notProduct relation (which says than some number is not the product of two others) to hijack QuickCheck for (rather inefficient) prime factorization:

notProduct :: Int -> Int -> Int -> Bool
notProduct n p q = n /= p * q

Running QuickCheck until it fails yields factors of 10.

> quickCheck(notProduct 10)
+++ OK, passed 100 tests.

> quickCheck(notProduct 10)
+++ OK, passed 100 tests.

> quickCheck(notProduct 10)
+++ OK, passed 100 tests.

> quickCheck(notProduct 10)
*** Failed! Falsifiable (after 12 tests):                  

QuickChecking red-black tree deletion

I recently re-implemented a deletion algorithm that I devised for Okasaki’s red-black trees in Haskell. (The original implementation was in Racket.)

The algorithm seems simple, but only because it relies on deductively eliminating the possibility of many cases from occurring.

QuickCheck found two small bugs in my Haskell port of the algorithm.

As an introduction to QuickCheck, I’ll retrace the process of using QuickCheck to hunt down these bugs.

The most time-consuming part of the process is creating generators for random but valid red-black trees.

The Haskell code is available:

A type for red-black trees

The datatype definitions for red-black trees are straightforward:

data Color = 
   R  -- red
 | B  -- black
 | BB -- double black
 | NB -- negative black
 deriving (Show)

data RBSet a =
   E  -- black leaf
 | EE -- double black leaf
 | T Color (RBSet a) a (RBSet a)
 deriving (Show)

The deletion algorithm uses both double black and negative black as extra colors during deletion.

Trees always return to true red-black trees once deletion completes.

A naive generator for trees

In order to QuickCheck routines like insertion or deletion, we need a way of generating arbitrary trees.

To instantiate the typeclass Arbitrary on type t, the instance needs to provide a generator of type Gen t for the method arbitrary.

QuickCheck defines combinators like oneof that make it straightforward to build a simple generator for red-black trees:

unboundedTree :: Arbitrary a => Gen (RBSet a)
unboundedTree = 
 oneof [return E, 
        liftM4 T (oneof [return R,return B])
        unboundedTree arbitrary unboundedTree]

A generator for bounded trees

Because oneof chooses among its branches with equal probability, there’s a chance that unboundedTree will create very large trees.

Moreover, a good generator should respect the test size requested by QuickCheck, since QuickCheck will start with small test cases and grow to larger ones.

The method sized :: (Int -> Gen a) -> Gen a allows a generator to be parameterized by the requested test size.

With minor modifications, we can construct a tree whose height is bounded by the requested size:

boundedTree :: Arbitrary a => Gen (RBSet a)
boundedTree = sized tree where

 tree :: Arbitrary a => Int -> Gen (RBSet a)
 tree 0 = return E
 tree n | n>0 = 
   oneof [return E,
          liftM4 T color subtree arbitrary subtree]
   where subtree = tree (n `div` 2)
         color = oneof [return R, return B]

A property for checking red-red violations

To generate proper red-black trees, we’ll need to obey coloring properties, balancing properties and ordering properties.

The coloring property is that a red node cannot have a red child:

prop_NoRedRed :: RBSet Int -> Bool
prop_NoRedRed E = True
prop_NoRedRed (T R (T R _ _ _) _ _) = False
prop_NoRedRed (T R _ _ (T R _ _ _)) = False
prop_NoRedRed (T _ l x r) = (prop_NoRedRed l) && (prop_NoRedRed r)

Of course, if we QuickCheck this property, it will fail:

> quickCheck (forAll boundedTree prop_NoRedRed)
*** Failed! Falsifiable (after 6 tests):  
T R (T R (T B E 1 E) (-4) E) (-3) E

A generator that avoids red-red violations

To avoid red-red violations, we’ll add the color of the parent to the generator, so that it won’t generate a red when the parent is red:

nrrTree :: Arbitrary a => Gen (RBSet a)
nrrTree = sized (tree R) where

 tree :: Arbitrary a => Color -> Int -> Gen (RBSet a)

  -- Assuming black parent:
 tree B 0 = return E
 tree B n | n>0 = 
   oneof [return E,
          liftM4 T (return B) subtree  arbitrary subtree,
          liftM4 T (return R) subtree' arbitrary subtree']
   where subtree  = tree B (n `div` 2)
         subtree' = tree R (n `div` 2)

  -- Assuming red parent:
 tree R 0 = return E
 tree R n | n>0 = 
   oneof [return E,
          liftM4 T (return B) subtree arbitrary subtree]
   where subtree = tree B (n `div` 2)

and this QuickChecks perfectly:

> quickCheck (forAll nrrTree prop_NoRedRed)
+++ OK, passed 100 tests.

A property for balanced black depth

A separate property can check that the path from the root to every leaf passes through the same number of blacks.

blackDepth :: RBSet a -> Maybe Int
blackDepth (E) = Just(1)
blackDepth (T R l _ r) = case (blackDepth(l),blackDepth(r)) of
  (Just(n),Just(m)) -> if n == m then Just(n) else Nothing
  (_,_) -> Nothing
blackDepth (T B l _ r) = case (blackDepth(l),blackDepth(r)) of
  (Just(n),Just(m)) -> if n == m then Just(1+n) else Nothing
  (_,_) -> Nothing

prop_BlackBalanced :: RBSet Int -> Bool
prop_BlackBalanced t =
 case blackDepth(t) of
  Just _ -> True
  Nothing -> False

But, when we QuickCheck this property, it fails:

> quickCheck (forAll nrrTree prop_BlackBalanced)
*** Failed! Falsifiable (after 5 tests):  
T B E (-4) (T R E (-1) (T B E (-3) E))

Generating black-balanced red-black trees

To generate black-balanced red-black trees without red-red violations, we will re-interpret the size parameter as the black depth of the tree.

This means that we’ll have to take the base–2 logarithm of the intended size to avoid generating exponentially bigger treers during testing:

balnrrTree :: Arbitrary a => Gen (RBSet a)
balnrrTree = sized (\n -> tree B (lg(n))) where

 tree :: Arbitrary a => Color -> Int -> Gen (RBSet a)

 tree B 0 = return E
 tree B 1 = 
   oneof [return E,
          liftM4 T (return R) (return E) arbitrary (return E)]

 tree B n | n>0 = 
   oneof [liftM4 T (return B) subtree arbitrary subtree,
          liftM4 T (return R) subtree' arbitrary subtree']
   where subtree  = tree B (n-1)
         subtree' = tree R n

 tree R 0 = return E
 tree R 1 = return E
 tree R n | n>0 = 
   oneof [liftM4 T (return B) subtree arbitrary subtree]
   where subtree = tree B (n-1)

Now, QuickCheck is happy with both properties:

> quickCheck(forAll balnrrTree (\t -> prop_NoRedRed t &&
                                      prop_BlackBalanced t))
+++ OK, passed 100 tests.

A property for ordered search trees

The ordering of keys in red-black trees must obey the usual ordering found in binary search trees.

We can create a new property to represent this requirement:

prop_OrderedList :: Ord a => [a] -> Bool
prop_OrderedList [] = True
prop_OrderedList [x] = True
prop_OrderedList (x:y:tl) = (x < y) && (prop_OrderedList(y:tl))

prop_Ordered :: RBSet Int -> Bool
prop_Ordered t = prop_OrderedList (toAscList t) 

And, we can create a compound property that combines all three:

prop_RBValid :: RBSet Int -> Bool
prop_RBValid t = prop_NoRedRed t
              && prop_BlackBalanced t
              && prop_Ordered t

Generating ordered, balanced red-black trees

To generate ordered, balanced red-black trees, we will provide a range of key values from which the generator may choose, a technique adapted from Stephanie Weirich and Benjamin Pierce’s course notes:

ordbalnrrTree :: (Arbitrary a, Random a, 
                  Bounded a, Ord a, Num a) => Gen (RBSet a)
ordbalnrrTree =
 sized (\n -> tree 0 100000000000000 B (lg n)) where

 tree min max _ _ | max < min = error "cannot generate"
 tree min max B 0 = return E
 tree min max B 1 =
   oneof [return E,
          liftM4 T (return R) (return E) (choose(min,max)) (return E)]
 tree min max B n | n>0 = 
   do key <- choose (min,max) 
      let subtree1 =  tree min (key-1) B (n-1)
      let subtree2 =  tree (key+1) max B (n-1)
      let subtree1' = tree min (key-1) R n
      let subtree2' = tree (key+1) max R n
      oneof [liftM4 T (return B) subtree1  (return key) subtree2,
             liftM4 T (return R) subtree1' (return key) subtree2']

 tree min max R 0 = return E
 tree min max R 1 = return E
 tree min max R n | n>0 = 
   do key <- choose (min, max)
      let subtree1 = tree min (key-1) B (n-1)
      let subtree2 = tree (key+1) max B (n-1)
      oneof [liftM4 T (return B) subtree1 (return key) subtree2]

If the requested tree is too large, the generator may “choose itself into a corner” (where max < min), so I left the initial range rather large.

In this generator, the monadic do notation allows the chosen key to be inspected and used in the generation of the subtrees.

QuickCheck agrees that these trees are valid:

> quickCheck (forAll ordbalnrrTree prop_RBValid)
+++ OK, passed 100 tests.

Generating trees from insertion

As it turns out, if we trust the insertion operation, then we can use it to create valid trees as well:

insertedTree :: (Arbitrary a, Ord a) => Gen (RBSet a)
insertedTree = liftM (Data.List.foldr insert empty) arbitrary

And, we can have QuickCheck validate them:

> quickCheck(forAll insertedTree prop_RBValid)
+++ OK, passed 100 tests.

Instantiating Arbitrary

We can have arbitrary pick between the two kinds of trees:

instance (Arbitrary a, Random a,
          Bounded a, Ord a, Num a) => Arbitrary (RBSet a) where
  arbitrary = oneof[ordbalnrrTree,
                    liftM (Data.List.foldr insert empty) arbitrary]

Testing insertion

A property can test that insertion creates a valid red-black tree:

prop_InsertValid :: RBSet Int -> Int -> Bool
prop_InsertValid t x = prop_RBValid(insert x t)

A property can test that an inserted value is a member:

prop_InsertMember :: RBSet Int -> Int -> Bool
prop_InsertMember t x = member x (insert x t)

A property can test that insertion doesn’t accidentally remove a member:

prop_InsertSafe :: RBSet Int -> Int -> Int -> Property
prop_InsertSafe t x y = member x t ==> (member x (insert y t))

A property can also check that insertion doesn’t add more than what it was supposed to add:

prop_NoInsertPhantom :: RBSet Int -> Int -> Int -> Property
prop_NoInsertPhantom t x y = 
 not (member x t) && x /= y ==> not (member x (insert y t))

Bugs in deletion

Before running tests for deletion, my deletion code contained errors in these two procedures.

delete :: (Ord a,Show a) => a -> RBSet a -> RBSet a
delete x s = T B a y b
 where del E = E
       del s@(T color a y b) | x < y     = bubble color (del a) y b
                             | x > y     = bubble color a y (del b)
                             | otherwise = remove s
       T _ a y b = del s

remove :: RBSet a -> RBSet a
remove E = E
remove (T R E _ E) = E
remove (T B E _ E) = EE
remove (T B E _ (T R a x b)) = T B a x b
remove (T B (T R a x b) _ E) = T B a x b
remove (T color l@(T R a x b) y r) = bubble color l' mx r 
 where mx = max l
       l' = removeMax l

OuickCheck found both errors easily.

In delete, the following fails for trees of size one:

       T _ a y b = del s

And, in remove, the general case for removal:

remove (T color l@(T R a x b) y r) = bubble color l' mx r 

It is too specific, since it forces the left child to be red. (It was the result of copying and pasting the code one line above it.)

Testing deletion

The first two properties check for validity after deletion:

prop_InsertDeleteValid :: RBSet Int -> Int -> Bool
prop_InsertDeleteValid t x = prop_RBValid(delete x (insert x t))

prop_DeleteValid :: RBSet Int -> Int -> Bool
prop_DeleteValid t x = prop_RBValid(delete x t)

The first test caught both bugs.

Other properties can be validated as well.

For instance, a property can check that deletion removes a key:

prop_MemberDelete :: RBSet Int -> Int -> Property
prop_MemberDelete t x = member x t ==> not (member x (delete x t))

while another can validate that other keys are unaffected by deletion:

prop_DeletePreserve :: RBSet Int -> Int -> Int -> Property
prop_DeletePreserve t x y =
  x /= y ==> (member y t) == (member y (delete x t))