Programming language analysis

Instructor: Matt Might.

TA: .

Office hours: Any time my door is open, or set up a meeting by email.

Time: Tuesdays and Thursdays from 10:45am to 12:05pm.

Links: [google group] [rss]

Disclaimer: This syllabus is tentative and subject to change.

Course summary and objective

Static analysis is the field of study concerned with algorithms that extract information about the run-time behavior of a program from the source code (or machine code) of that program. Static analysis has been a vibrant field of research since the dawn of computing, and today, static analysis is the driving force behind compiler-based program optimization, tool-based software engineering, bottom-up information security, formal program verification and automatic program parallelization.

For example, in the following Java code:

  for (int i = 0; i < a.length; ++i) {
   a[i] = read() ;
  }
a good static analyzer can prove that the expression a[i] will never throw an out of bounds exception, and as a result, the implicit bounds check can be discarded in order to improve performance. (When a static analyzer can prove that an index i is in bounds for some array a for a language like C, then the analyzer has proved that the expression a[i] does not lead to a buffer overflow security vulnerability.)

Turing's halting problem prevents any given static analyzer from recovering perfect information for all programs. Consequently, techniques in static analysis rely on models that approximate program behavior in a computable fashion. The primary objective in the design of a static analysis is to model the desired run-time behavior as accurately as necessary, but to keep the analysis efficiently computable for the average case.

Upon completing this course, students should be comfortable reading a recently published research paper in the field of static analysis, and students will be familiar with the standard techniques for static analysis of both functional and imperative programs.

Prerequisites

There are no formal prerequisites for this course. Anything required in the course will be covered in the course.

Topics

During the course, students will be introduced to the following concepts, and possibly more:
  • Formal semantics.
  • Small-step semantics.
  • Big-step semantics.
  • Basic order theory up to lattices.
  • Basic fixed-point theory.
  • Static-single-assignment form (SSA).
  • Lambda calculus.
  • Administrative normal form lambda calculus (ANF).
  • Continuation-passing-style lambda calculus (CPS).
  • Intraprocedural data-flow analyses.
  • Constraint-based analyses.
  • Unification-based analyses.
  • Abstract interpretation.
  • Type-based analyses.
  • Higher-order control-flow analysis (k-CFA).
  • Alias and pointer analysis (Steensgard, Andersen).
  • Dependence analysis.
  • Escape analysis.
  • Array-bounds analysis.
  • Analysis-driven program transformations.
  • Proof techniques for demonstrating soundness.
  • Analysis of parallel programs.
Toward the end of the semester, we will also read and discuss a few research papers from static analysis.

Required textbook

The latest version of the class notes are available here. Updated weekly.

Any papers assigned for reading will be provided as pdfs.

Supplementary textbooks

The following textbooks are not required, but they are a useful secondary point of reference for the course material. Students considering doing research in static analysis or related areas will almost certainly want to grab a copy.

The Nielsons' book Semantics with Applications is one of the best introductions to formal semantics out there, and the application featured in the book is static analysis:

The same authors (plus Hankin) have also written a book on static analysis, Principles of Program Analysis:

This book is a widely used reference in the field.

I also maintain a list of useful books, papers and materials for graduate students.

Grading

Your final grade is the better of:

  • your grade on the comprehensive final exam; or
  • your average grade for the projects.
plus any bonus points earned throughout the semester.

In other words, if you get an A on the final, then you get an A in the class, no matter how badly you did on the projects. Or, if you get a B average on the projects, but an F on the final, then you still get a B for the class.

Projects

There will be roughly four or five miniprojects throughout the semester. Each miniproject will involve implementing a known static analysis for a toy language. The syntax of each toy language will always be S-Expression-based, so motivated students may complete the assigment in the programming language of their choosing by writing their own parser. However, a parser for the toy language will always be provided that works with both the Java and Scala programming languages. If you decide to use a language other than Java or Scala, contact me in advance to make sure I can run it on Mac OS X.

Jon Rafkind has kindly supplied an example that illustrates how to get ant and scala to cooperate.

Collaboration policy

For projects, share anything but code.

Bonus

There are bonus opportunities to add percentage points to your final grade:

  • Write or significantly clean and expand a wikipedia article on a topic related to the class. (+2%)
  • Write a detailed critique (what's good, what's bad, what's missing) of a chapter of the textbook. (+2%)
  • Write a peer-review-quality report of a research paper. (+5%)
  • Implement and empirically test an analysis from an unpublished research paper. Contact me to gain access to an archive of eligible unpublished papers. [Warning: This will require mastery of virtually all course material.] (+100%)
  • Propose your own project. (+x%)
  • Any other bonus projects announced in class.
Let me know if any of these bonus opportunities sound interesting to you, and we'll define the requirements before you get started.

There are also bonus opportunities for each project:

  • Implement the fastest correct analysis. (+5%)
  • Parallelize the code for a project. (Multiply the grade for that project by s/p, where s is the time taken by the fastest project running on a single core, and p is the time taken on two cores for a benchmark suite of my choosing.)
  • Write a benchmark that causes another student's analysis to fail. (+0.2% for each failure; maximum of +10% for all benchmarks.)

Late policy

Late assignments will have min(2d - 1, 100) percentage points subtracted from the final grade of the assignment, where d is the number of days late.

Tentative schedule

This schedule is subject to change. Once a pace for the course is established, the rest of the schedule will be fleshed out.

Date Topic

Assignments

Project 3: Classic data-flow analysis

Due: Sunday, April 11th 2009, 23:59:59 MT

Goals

  • Construct a constant-propagation analysis for a flat imperative language.

Description

Constant propagation is one of the most important static analyses from a software engineering perspective. In constant propagation, expressions which have a constant value during execution are converted into constants. Psychologically, this frees a programmer to structure code more naturally. For example, with constant propagation, a programmer could have a global variable, isDebugMode indicating whether or not the program is in debugging mode. Debugging-mode only tests can be wrapped in if (isDebugMode) { ... }. Programmers would probably shy away from such practices if they didn't trust the compiler to inline the value of isDebugMode and eliminate the test.

In this project, you will construct a constant-propagation analysis for a simple, flat abstract machine-code language: Macho. The S-Expression grammar for Macho is:

   prog in Prog ::=  (program stmt*)

   stmt in Stmt ::=  (:= lhs rhs)
                 |   (println rhs)
                 |   (if cond goto lab)
                 |   (label lab)
                 |   (skip)
                 |   (goto lab)
                 |   (return rv)

   lab  in Lab   =   an infinite set of labels

   lhs  in LHS  ::=  v

   rhs  in RHS  ::=  v                  [variable reference]
                 |   z                  [integer constant]
                 |   (op rhs1 ... rhsN) [primitive expression]

   rv   in Ret   =  RHS
   cond in Cond  =  RHS

   op   in Op    = {==,<=,+,*,-,/}
  

For example, the following program computes the factorial of 6:

 
 (program         (:= x 6)
                  (:= a 1)
  (label fact)    (if (<= x 0) goto end)
                  (skip)
                  (:= a (* x a))
                  (:= x (- x 1))
                  (goto fact)
  (label end)     (println a)
                  (return a))

Note that in Macho, labels are actually statements. In fact, they behave like skip instructions, except that they can be the target of a conditional branch or a goto.

In Macho, zero is false and non-zero is true with respect to branching conditionals. Conditional operations, like <= and == return either 0 or 1.

Useful(?) files

The files are provided without any warranty as to correctness. Your professor coded them with haste, tested them three times and assumed correctness. If you find what you believe to be bugs, please let me know!

  1. Macho.scala: A Macho interpreter.
  2. macho.zip: A reference implementation of the constant propagation analysis; to run:
     scala -cp . languages.macho.MachoCPA < machofile
    

Requirements

  1. Tarball and name the turnin file/directory like in project 1. If you want to use zip instead of tar+gzip, go ahead.
  2. In the README file, briefly note what's working, what's broken and which bonus features are implemented. Feel free to include anything else you think might interest the grader.
  3. Include a Makefile with your turnin even if you use ant.
  4. make all should compile everything.
  5. make run should execute your analysis on stdin and print the result on stdout.
  6. The output should generate an in/out entry for each statement of the program that looks like the following:
        ((in   map)
         (stmt stmt)
         (out  map))
       
    Each map is an associative list which maps variables into the constant-propagation lattice: {bot,...,-1,0,1,...,top}.

Examples

  1. Input:
    (program         (:= x 6)
                     (:= a 1)
     (label fact)    (if (<= x 0) goto end)
                     (skip)
                     (:= a (* x a))
                     (:= x (- x 1))
                     (goto fact)
     (label end)     (println a)
                     (return a))
      
    Output:
    ((in   ())
     (stmt (:= x 6))
     (out  ((x 6))))
    
    ((in   ((x 6)))
     (stmt (:= a 1))
     (out  ((a 1) (x 6))))
    
    ((in   ((a top) (x top)))
     (stmt (label fact))
     (out  ((a top) (x top))))
    
    ((in   ((a top) (x top)))
     (stmt (if (<= x 0) goto end))
     (out  ((a top) (x top))))
    
    ((in   ((a top) (x top)))
     (stmt (skip))
     (out  ((a top) (x top))))
    
    ((in   ((a top) (x top)))
     (stmt (:= a (* x a)))
     (out  ((a top) (x top))))
    
    ((in   ((a top) (x top)))
     (stmt (:= x (- x 1)))
     (out  ((a top) (x top))))
    
    ((in   ((a top) (x top)))
     (stmt (goto fact))
     (out  ((a top) (x top))))
    
    ((in   ((a top) (x top)))
     (stmt (label end))
     (out  ((a top) (x top))))
    
    ((in   ((a top) (x top)))
     (stmt (println a))
     (out  ((a top) (x top))))
    
    ((in   ((a top) (x top)))
     (stmt (return a))
     (out  ((a top) (x top))))
      
  2. Input:
    (program          (:= x 1)
                      (:= a 5)
                      (if x goto alt)
                      (:= a (- 0 a))
                      (:= y (* a a))
                      (goto end)
        (label alt)   (:= y (* a a))
                      (goto end)
        (label end)   (return y))
    
    Output:
    ((in   ())
     (stmt (:= x 1))
     (out  ((x 1))))
    
    ((in   ((x 1)))
     (stmt (:= a 5))
     (out  ((a 5) (x 1))))
    
    ((in   ((a 5) (x 1)))
     (stmt (if x goto alt))
     (out  ((a 5) (x 1))))
    
    ((in   ((a 5) (x 1)))
     (stmt (:= a (- 0 a)))
     (out  ((a -5) (x 1))))
    
    ((in   ((a -5) (x 1)))
     (stmt (:= y (* a a)))
     (out  ((a -5) (x 1) (y 25))))
    
    ((in   ((a -5) (x 1) (y 25)))
     (stmt (goto end))
     (out  ((a -5) (x 1) (y 25))))
    
    ((in   ((a 5) (x 1)))
     (stmt (label alt))
     (out  ((a 5) (x 1))))
    
    ((in   ((a 5) (x 1)))
     (stmt (:= y (* a a)))
     (out  ((a 5) (x 1) (y 25))))
    
    ((in   ((a 5) (x 1) (y 25)))
     (stmt (goto end))
     (out  ((a 5) (x 1) (y 25))))
    
    ((in   ((a top) (x 1) (y 25)))
     (stmt (label end))
     (out  ((a top) (x 1) (y 25))))
    
    ((in   ((a top) (x 1) (y 25)))
     (stmt (return y))
     (out  ((a top) (x 1) (y 25))))
    
  3. Input:
    (program         (:= debug 1)
                     (:= x 1)
                     (:= a 1)
    
     (label loop)    (if (<= x 1) goto end)
                     (if (== debug 0) goto skip1)
                      (println x)
     (label skip1)   (skip)
                     (:= x (+ x 1))
                     (goto loop)
    
     (label end)     (println x)
                     (return x))
    
    Output:
    ((in   ())
     (stmt (:= debug 1))
     (out  ((debug 1))))
    
    ((in   ((debug 1)))
     (stmt (:= x 1))
     (out  ((debug 1) (x 1))))
    
    ((in   ((debug 1) (x 1)))
     (stmt (:= a 1))
     (out  ((a 1) (debug 1) (x 1))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (label loop))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (if (<= x 1) goto end))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (if (== debug 0) goto skip1))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (println x))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (label skip1))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (skip))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (:= x (+ x 1)))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (goto loop))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (label end))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (println x))
     (out  ((a 1) (debug 1) (x top))))
    
    ((in   ((a 1) (debug 1) (x top)))
     (stmt (return x))
     (out  ((a 1) (debug 1) (x top))))
    

Project 2: CPS-conversion and 0CFA

[show]

Due: Friday, March 6th 2009, 23:59:59 MT

Goals

  1. Construct a CPS-converter for a small functional programming language.
  2. Construct an abstract-interpretation-based 0CFA for CPS.

Description

CPS is a powerful intermediate representation used in advanced compilers. 0CFA is an enabling analysis for optimizing and reasoning about higher-order programs. 0CFA, like other higher-order flow analyses, bounds the set of values to which an expression may evaluate. Specifically, 0CFA is focused on figuring out which λ-terms may flow to a given expression. In this assignment, you will learn how to convert a program into CPS, and you will learn how to run 0CFA on CPS.

The input language is the same as Project 1, but with a few additional features: primitive integers and conditionals. The grammar for this language is:

 f,e in Exp = v                         [Variable]
            | (lambda (v1 ... vn) e)    [Lambda term]
            | (let* ((v e) ...) e)      [Sequential let]
            | (f e1 ... en)             [Application]
            | (if-zero e1 e2 e3)        [Conditional]
            | z                         [Integer]
            | succ                      [Successor]
            | pred                      [Predecessor]
Here's an example program in this language that does a countdown:
(let* ((Y (lambda (X)
            ((lambda (procedure)
               (X (lambda (arg) ((procedure procedure) arg))))
             (lambda (procedure)
               (X (lambda (arg) ((procedure procedure) arg)))))))
       
       (countdown (Y (lambda (countdown) (lambda (n)
                                   (if-zero n
                                            0
                                            (countdown (-1 n))))))))
  (countdown 10))

The target grammar for CPS is:

 f,e  in Exp   =   Lam + Var + Int + Prim
 p    in Prim ::=  (cps succ)  |  (cps pred)  |  halt
 lam  in Lam  ::=  (lambda (v1 ... vn) call)
 call in Call ::=  (f e1 ... en)
               |   (if-zero e1 e2 e3)

Any CPS-transformation that is meaning-preserving will be accepted.

For 0CFA, implement a small-step abstract interpreter for 0CFA for CPS as described in the notes. Your are encouraged to use a subsumption-based termination test to achieve a modicum of efficiency. Using widening to achieve a polynomial-time algorithm is worth extra credit.

Useful(?) files

  • SExp.scala is an S-Expression parser in scala. (It's updated since the last assignment.)
  • LCPlus.scala is an interpreter for the updated language.
  • LambdaCalculusCPSTransformer.scala contains three CPS transformers for the pure lambda calculus.
  • cps-0cfa.zip contains a reference implementation of the CPS translator and 0CFA. To run them:
     scala -cp . languages.lcplus.CPSTranslator < input-file
     scala -cp . languages.lcplus.ZeroCFA < input-file
    
    If you want to run 0CFA on a CPS-converted program, run:
     scala -cp . languages.lcplus.ZeroCFA --lang=cps < input-file
    
    If you find bugs in any of these programs, report them to Matt.

    Note: Your implementation should accept CPS directly.

  • The first three chapters of Olin Shivers' dissertation are a great reference for k-CFA.

Requirements

  1. Tarball and name the turnin file/directory like in project 1. If you want to use zip instead of tar+gzip, go ahead.
  2. Include anything you'd like the grader to know in the README.
  3. Include a Makefile with your turnin even if you use ant. (Several of the ant submissions from Project 1 didn't work on my system.)
  4. make all should compile everything.
  5. make cps should run your CPS translator on stdin.
  6. make 0cfa should run 0CFA on the CPS program on stdin.
  7. The CPS-translator should supply the primitive halt as the top-level continuation. For example, if you were to CPS-convert just the expression 3, the resulting program would be (halt 3). So, a CPS-converted program will be a call term.
  8. The output of your 0CFA should obey the following grammar:
     OUT   ::= ( ENTRY* )
     ENTRY ::= ( VARNAME VALUE * )
     VALUE ::= Num
            |  succ
            |  pred
            |  halt
            |  (lambda (v1 ... vn) call)
    
    Each entry should list the "values" which may flow to a particular variable. For example, the program id-many.cps could produce this output file.
  9. Write one interesting benchmark. (More happily accepted.) Your benchmark should return an integer.

Turn in

Mechanism TBD.

Advice

  1. The CPS translator in the reference implementation is smarter than the naive translator, in that its code is in a tighter correspondence with the original code, and it introduces fewer spurious continuations. Your CPS translator doesn't have to be smart. It just has to be correct.
  2. Your assignment will not be graded for efficiency beyond termination; it will, however, be graded for soundness.
  3. You can run CPS-converted code in the provided interpreter. You should get the same answer from the original and converted programs when the result is an integer.

Bonus

  1. Implement some form of widening to achieve polynomial-time termination. (+10%) (I recommend you add a flag to turn this on, in case you screw up widening.)
  2. Implement call/cc in the CPS translator. Include a benchmark to test it out. (+5%)

Project 1: A λ-calculus interpreter and programming with Church encodings

[show]

Due: Friday, Feb 6th 2009, 23:59:59 MT

Goal

Construct a substitution-based interpreter for a call-by-value λ-calculus over closed terms.

Description

The syntax for this language in EBNF form is:

 <exp> ::= <app> | <var> | <lam> | <let*>
 <app> ::= '(' <exp> <exp> * ')'
 <let*> ::= '(' 'let*' '(' ( '(' <var> <exp> ')' ) * ')' <exp> ')'
 <lam> ::= '(' 'lambda' '(' <var> * ')' <exp> ')'
 <var> ::= [^)( \n\t\r]+ 

(Note that this is a legal subset of Scheme.)

First, desugar let* expressions into nested let expressions; then desugar each let expression into an immediate application of a λ-term. This will leave a program in the following grammar:

 <exp> ::= <app> | <var> | <lam> 
 <app> ::= '(' <exp> <exp> * ')'
 <lam> ::= '(' 'lambda' '(' <var> * ')' <exp> ')'
 <var> ::= [^)( \n\t\r]+ 

After transforming, run the call-by-value substitution semantics until exhaustion.

Finally, output the resulting term.

Call-by-value semantics for the λ-calculus are described in the notes.

Useful files

  • SExp.scala is an S-Expression parser in scala. It can be compiled to a .class file and used from Java, too.
  • churchmarks.scm contains some examples of Church encodings that might be useful in building benchmarks.
  • lcplus.zip contains a reference implementation. To run it, execute scala -cp . languages.lcplus.LCPlus and enter your program into stdin. If you supply the flag --print-intermediate, it will print out intermediate reductions.

Requirements

  1. When untarballed, the file you turn in should create the following file/directory structure:
        <lastname>-<firstname>-project1/
        <lastname>-<firstname>-project1/Makefile
        <lastname>-<firstname>-project1/README
        <lastname>-<firstname>-project1/benchmarks/
       
  2. Running Make run < filename should run the interpreter on the supplied filename.
  3. The output of your interpreter should be a λ-term in the simplified grammar that represents the complete call-by-value β-reduction of the original term. If the input term is non-terminating, your program should run forever and produce no output.
  4. Supply at least three benchmarks in the benchmarks folder. Each benchmark should consist of two files: lastname-firstname-benchmarkname.in (the input program) and lastname-firstname-benchmarkname.out (the output program). If the program is non-terminating, provide no .out file.
  5. Put anything you think the grader might like to know in the README. At the very least, include a description of your benchmarks.
  6. Comment your code appropriately.

Turn in

Email a lastname-firstname-project1.tar.gz file with your code. Include "[cs6969] project 1 turnin" in the subject.

Advice

  1. Projects that are easy to untarball and grade make the grader happy.
  2. "Interesting" benchmarks will receive more credit. "Interesting" means something like a SAT solver (brute force is OK), a particle simulation, a ray tracer (even if it's 1-dimensional), a lambda calculus interpreter, encryption algorithms, a CPU simulator, a virtual machine or anything else which you wouldn't expect someone to implement in what is nearly the pure λ-calculus.