The controlflow problem
In programming languages, and static analysis in particular, the higherorder controlflow problem refers to the fact that the precise target of a function call may not be obvious.
The problem afflicts functional languages in the form of firstclass functions, and it afflicts objectoriented languages in the form of dynamically dispatched methods.
For example, in an objectoriented language, the target of the
function call object.method()
depends on the value that flows to the expression object
.
Thus, the controlflow of this call depends on the
dataflow affecting the value of object
.
If the object object
were a function parameter, it's
clear that the controlflow of this function would also determine the
dataflow of its parameters.
As another example, the target of a function call like (g x)
in Lisp depends on where this call appears.
This call could appear in the term (lambda (g) (g x))
,
in which case the target of g
depends on where this procedure flows.
Consider, further, the controlflow possibilities resulting from
the term
(map (lambda (g) (g x)) myprocedures)
.
In summary, in higherorder languages, controlflow affects dataflow, even as dataflow affects controlflow. Any attempt to reason about one facet must grapple with the other.
The valueflow problem
In practice, the scope of the controlflow problem is always widened
to the more general valueflow problem.
To contrast, the controlflow problem asks, "Which procedures may the expression
f
be in the call (f x)
?"; the valueflow
problem asks "To which values may the expressions f
and x
evaluate?"
The challenge of the valueflow problem is that, in general, it is undecidable to determine the exact set of values to which an expression might evaluate.
Solution: Simulate the program
Controlflow analyses (CFAs) are the class of algorithms that
solve the valueflow problem.
Because a precise solution is impossible, CFAs compute conservative
overapproximations.
That is, if a CFA says that the procedure foo
is invoked
at some call site, then it may be invoked at that call site.
If foo
can't actually be invoked, then it is a false
positive.
CFAs attempt to minimize the number of false positives without
resorting to intractable techniques.
The simplest way to write a CFA is to use a "smallstep abstract interpretation." The code for an abstract interpretater can be quite similar to that of an ordinary interpreter; in practice, there are three major differences:
 Abstract interpreters use abstract values.
For instance, one might use
positive
ornegative
, instead of3
or4
. Another common strategy has all pointer addresses allocated at the same program location share the same "abstract" pointer address.  Abstract interpreters can be nondeterministic; often,
when an abstract interpreter encounters a conditional (e.g.
if
), it follows all branches. If a CFA thinks there could be two procedures called at a call site, it will nondeterministically invoke both of them during its analysis.  Abstract interpreters typically operate over a finite number of machine states, to ensure termination. In instances where abstract interpreters don't use a finite statespace, a special technique called widening can accelerate and guarantee termination of the interpreter.
In essence, an abstract interpreter simulates all possible (and, necessarily, some impossible) executions of the program. By looking over all possible executions, one can check whether there is any execution in which an expression gets bound to a number or a string. Similarly, one can look at every state containing a call site of interest, to see which procedures may actually get called there.
The first popular solution to the controlflow problem was kCFA, a hierarchy of increasingly precise (and increasingly slower) analyses. kCFA was originally specified as an abstract interpreter for Scheme that had been translated into continuationpassing style. (An implementation of kCFA for that kind of Scheme is included below.)
The core principles of CFAs translate easily between different languages. That is, once you understand how to implement a CFA for Scheme, it's not hard to implement a CFA for any other language.
How 0CFA thinks
0CFA is a special case of the kCFA framework, and it is one of the most popular CFAs. 0CFA is notable for the simplicity of its abstraction: values are abstracted to the syntax from which they came. In Scheme, a procedure is abstracted to the lambda term that created it; the environment associated with its closure is ignored entirely.
This coarse abstraction leads to considerable imprecision, but it has a major benefit in the form of a polynomialtime bound (cubic, in fact) for an efficient abstract interpreter.
For the pure lambda calculus, as described by the following grammar:
f,e ::= (lambda (v) e)  (f e)  v v is a variable
0CFA could be thought of as filling in a "flows to" relation according to three rules:

For each lambda term
(lambda (v) e)
:(lambda (v) e)
flows to(lambda (v) e)
. 
For each application
(f e)
, if the value(lambda (v) e')
flows to the functionf
and the value value flows to the argumente
, thenvalue flows tov
. 
For each application
(f e)
, if the value(lambda (v) e')
flows to the functionf
and the value value flows to bodye'
, thenvalue flows to(f e)
.
A word about kCFA
kCFA improves the precision of the flow analysis by
considering in what context a value flows to an expression.
For instance, 1CFA might say, "(lambda (v) e)
flows to
the variable x
, when the procedure (lambda (x)
e')
is called from the site (g h)
."
Where to learn more
If you'd like to get into the details of how CFAs work, there are a few good papers and books out there. Here's what I recommend:
 Principles of Program Analysis covers many topics in static program analysis, including abstract interpretation, CFAs and dataflow analysis. The book covers both imperative and functional languages. It describes techniques so standard you'll find them in GCC, but it also contains analyses so sophisticated that no one has ever implemented them in a commercial compiler.
 The kCFA algorithm is wellexplained in the first three chapters of Olin Shivers's Ph.D. dissertation. (Olin discovered kCFA.)
 A modern (simpler) formulation of kCFA is given in the first five chapters of my own Ph.D. dissertation.
 I recently published a paper describing kCFA for objectoriented languages. Email me if you'd like a preprint! (The cameraready version will be posted here soon.)
 I teach courses and seminars on program analysis. If you're near the University of Utah, please drop by, or come get your M.S. or Ph.D.!
A reference implementation of kCFA in R5RS Scheme
The implementation below of kCFA is a tutorial/reference implementation. That is, it's purely functional, it uses convenient data structures over efficient data structures, and it doesn't use any form of widening to accelerate convergence. (In fact, it is exponential time instead of cubic.)
The implementation is based on the mathematics found in my Ph.D. dissertation. It uses a smallstep abstract interpreter that transforms an abstract machine state into possible successor states. To compute the analysis, it crawls the graph of machine states (starting from an initial state) until all potentially reachable states have been found. The resulting set of reachable states is then summarized to yield a composite abstract heap. This abstract heap is further summarized to produce a map from variables to the lambda terms which may flow to them.
A reference implementation of kCFA in PLT Scheme
Translation courtesy of Jay McCarthy.
An implementation in Java
Courtesy of Jens Nicolay.