Programming language analysis
Instructor: Matt Might.
TA: .
Office hours: Any time my door is open, or set up a meeting by email.
Time: Tuesdays and Thursdays from 10:45am to 12:05pm.
Links: [google group] [rss]
Disclaimer: This syllabus is tentative and subject to change.
Course summary and objective
Static analysis is the field of study concerned with algorithms that extract information about the run-time behavior of a program from the source code (or machine code) of that program. Static analysis has been a vibrant field of research since the dawn of computing, and today, static analysis is the driving force behind compiler-based program optimization, tool-based software engineering, bottom-up information security, formal program verification and automatic program parallelization.
For example, in the following Java code:
for (int i = 0; i < a.length; ++i) { a[i] = read() ; }a good static analyzer can prove that the expression
a[i]
will never throw an out of bounds exception, and as a result, the implicit bounds check can be discarded in order to improve performance.
(When a static analyzer can prove that an index i
is in
bounds for some array a
for a language like C, then the
analyzer has proved that the expression a[i]
does not
lead to a buffer overflow security vulnerability.)
Turing's halting problem prevents any given static analyzer from recovering perfect information for all programs. Consequently, techniques in static analysis rely on models that approximate program behavior in a computable fashion. The primary objective in the design of a static analysis is to model the desired run-time behavior as accurately as necessary, but to keep the analysis efficiently computable for the average case.
Upon completing this course, students should be comfortable reading a recently published research paper in the field of static analysis, and students will be familiar with the standard techniques for static analysis of both functional and imperative programs.
Prerequisites
There are no formal prerequisites for this course. Anything required in the course will be covered in the course.Topics
During the course, students will be introduced to the following concepts, and possibly more:- Formal semantics.
- Small-step semantics.
- Big-step semantics.
- Basic order theory up to lattices.
- Basic fixed-point theory.
- Static-single-assignment form (SSA).
- Lambda calculus.
- Administrative normal form lambda calculus (ANF).
- Continuation-passing-style lambda calculus (CPS).
- Intraprocedural data-flow analyses.
- Constraint-based analyses.
- Unification-based analyses.
- Abstract interpretation.
- Type-based analyses.
- Higher-order control-flow analysis (k-CFA).
- Alias and pointer analysis (Steensgard, Andersen).
- Dependence analysis.
- Escape analysis.
- Array-bounds analysis.
- Analysis-driven program transformations.
- Proof techniques for demonstrating soundness.
- Analysis of parallel programs.
Required textbook
The latest version of the class notes are available here. Updated weekly.
Any papers assigned for reading will be provided as pdfs.
Supplementary textbooks
The following textbooks are not required, but they are a useful secondary point of reference for the course material. Students considering doing research in static analysis or related areas will almost certainly want to grab a copy.
The Nielsons' book
Semantics with Applications
is one of the best introductions to formal semantics out there, and the application featured in the book is static analysis:
The same authors (plus Hankin) have also written a book on static
analysis, Principles of Program Analysis:
I also maintain a list of useful books, papers and materials for graduate students.
Grading
Your final grade is the better of:
- your grade on the comprehensive final exam; or
- your average grade for the projects.
In other words, if you get an A on the final, then you get an A in the class, no matter how badly you did on the projects. Or, if you get a B average on the projects, but an F on the final, then you still get a B for the class.
Projects
There will be roughly four or five miniprojects throughout the semester. Each miniproject will involve implementing a known static analysis for a toy language. The syntax of each toy language will always be S-Expression-based, so motivated students may complete the assigment in the programming language of their choosing by writing their own parser. However, a parser for the toy language will always be provided that works with both the Java and Scala programming languages. If you decide to use a language other than Java or Scala, contact me in advance to make sure I can run it on Mac OS X.
Jon Rafkind has kindly supplied an example that illustrates how to get ant and scala to cooperate.
Collaboration policy
For projects, share anything but code.
Bonus
There are bonus opportunities to add percentage points to your final grade:
- Write or significantly clean and expand a wikipedia article on a topic related to the class. (+2%)
- Write a detailed critique (what's good, what's bad, what's missing) of a chapter of the textbook. (+2%)
- Write a peer-review-quality report of a research paper. (+5%)
- Implement and empirically test an analysis from an unpublished research paper. Contact me to gain access to an archive of eligible unpublished papers. [Warning: This will require mastery of virtually all course material.] (+100%)
- Propose your own project. (+x%)
- Any other bonus projects announced in class.
There are also bonus opportunities for each project:
- Implement the fastest correct analysis. (+5%)
- Parallelize the code for a project. (Multiply the grade for that project by s/p, where s is the time taken by the fastest project running on a single core, and p is the time taken on two cores for a benchmark suite of my choosing.)
- Write a benchmark that causes another student's analysis to fail. (+0.2% for each failure; maximum of +10% for all benchmarks.)
Late policy
Late assignments will have min(2d - 1, 100) percentage points subtracted from the final grade of the assignment, where d is the number of days late.
Tentative schedule
This schedule is subject to change. Once a pace for the course is established, the rest of the schedule will be fleshed out.
Date | Topic |
Assignments
Project 3: Classic data-flow analysis
Due: Sunday, April 11th 2009, 23:59:59 MT
Goals
- Construct a constant-propagation analysis for a flat imperative language.
Description
Constant propagation is one of the most important static analyses from a software engineering perspective.
In constant propagation, expressions which have a constant value during execution are converted into constants.
Psychologically, this frees a programmer to structure code more naturally.
For example, with constant propagation, a programmer could have a global variable, isDebugMode
indicating whether or not the program is in debugging mode.
Debugging-mode only tests can be wrapped in if (isDebugMode) { ... }
.
Programmers would probably shy away from such practices if they didn't trust the compiler to inline the value of isDebugMode
and eliminate the test.
In this project, you will construct a constant-propagation analysis for a simple, flat abstract machine-code language: Macho. The S-Expression grammar for Macho is:
prog in Prog ::= (program stmt*) stmt in Stmt ::= (:= lhs rhs) | (println rhs) | (if cond goto lab) | (label lab) | (skip) | (goto lab) | (return rv) lab in Lab = an infinite set of labels lhs in LHS ::= v rhs in RHS ::= v [variable reference] | z [integer constant] | (op rhs1 ... rhsN) [primitive expression] rv in Ret = RHS cond in Cond = RHS op in Op = {==,<=,+,*,-,/}
For example, the following program computes the factorial of 6:
(program (:= x 6) (:= a 1) (label fact) (if (<= x 0) goto end) (skip) (:= a (* x a)) (:= x (- x 1)) (goto fact) (label end) (println a) (return a))
Note that in Macho, labels are actually statements. In fact, they behave like skip instructions, except that they can be the target of a conditional branch or a goto.
In Macho, zero is false and non-zero is true with respect to
branching conditionals. Conditional operations, like
<=
and ==
return either 0 or 1.
Useful(?) files
The files are provided without any warranty as to correctness. Your professor coded them with haste, tested them three times and assumed correctness. If you find what you believe to be bugs, please let me know!
- Macho.scala: A Macho interpreter.
- macho.zip: A reference implementation of the constant propagation analysis; to run:
scala -cp . languages.macho.MachoCPA < machofile
Requirements
- Tarball and name the turnin file/directory like in project 1. If you want to use zip instead of tar+gzip, go ahead.
- In the
README
file, briefly note what's working, what's broken and which bonus features are implemented. Feel free to include anything else you think might interest the grader. - Include a
Makefile
with your turnin even if you use ant. make all
should compile everything.-
make run
should execute your analysis on stdin and print the result on stdout. -
The output should generate an in/out entry for each statement of the
program that looks like the following:
((in map) (stmt stmt) (out map))
Each map is an associative list which maps variables into the constant-propagation lattice: {bot
,...,-1
,0
,1
,...,top
}.
Examples
-
Input:
(program (:= x 6) (:= a 1) (label fact) (if (<= x 0) goto end) (skip) (:= a (* x a)) (:= x (- x 1)) (goto fact) (label end) (println a) (return a))
Output:((in ()) (stmt (:= x 6)) (out ((x 6)))) ((in ((x 6))) (stmt (:= a 1)) (out ((a 1) (x 6)))) ((in ((a top) (x top))) (stmt (label fact)) (out ((a top) (x top)))) ((in ((a top) (x top))) (stmt (if (<= x 0) goto end)) (out ((a top) (x top)))) ((in ((a top) (x top))) (stmt (skip)) (out ((a top) (x top)))) ((in ((a top) (x top))) (stmt (:= a (* x a))) (out ((a top) (x top)))) ((in ((a top) (x top))) (stmt (:= x (- x 1))) (out ((a top) (x top)))) ((in ((a top) (x top))) (stmt (goto fact)) (out ((a top) (x top)))) ((in ((a top) (x top))) (stmt (label end)) (out ((a top) (x top)))) ((in ((a top) (x top))) (stmt (println a)) (out ((a top) (x top)))) ((in ((a top) (x top))) (stmt (return a)) (out ((a top) (x top))))
-
Input:
(program (:= x 1) (:= a 5) (if x goto alt) (:= a (- 0 a)) (:= y (* a a)) (goto end) (label alt) (:= y (* a a)) (goto end) (label end) (return y))
Output:((in ()) (stmt (:= x 1)) (out ((x 1)))) ((in ((x 1))) (stmt (:= a 5)) (out ((a 5) (x 1)))) ((in ((a 5) (x 1))) (stmt (if x goto alt)) (out ((a 5) (x 1)))) ((in ((a 5) (x 1))) (stmt (:= a (- 0 a))) (out ((a -5) (x 1)))) ((in ((a -5) (x 1))) (stmt (:= y (* a a))) (out ((a -5) (x 1) (y 25)))) ((in ((a -5) (x 1) (y 25))) (stmt (goto end)) (out ((a -5) (x 1) (y 25)))) ((in ((a 5) (x 1))) (stmt (label alt)) (out ((a 5) (x 1)))) ((in ((a 5) (x 1))) (stmt (:= y (* a a))) (out ((a 5) (x 1) (y 25)))) ((in ((a 5) (x 1) (y 25))) (stmt (goto end)) (out ((a 5) (x 1) (y 25)))) ((in ((a top) (x 1) (y 25))) (stmt (label end)) (out ((a top) (x 1) (y 25)))) ((in ((a top) (x 1) (y 25))) (stmt (return y)) (out ((a top) (x 1) (y 25))))
-
Input:
(program (:= debug 1) (:= x 1) (:= a 1) (label loop) (if (<= x 1) goto end) (if (== debug 0) goto skip1) (println x) (label skip1) (skip) (:= x (+ x 1)) (goto loop) (label end) (println x) (return x))
Output:((in ()) (stmt (:= debug 1)) (out ((debug 1)))) ((in ((debug 1))) (stmt (:= x 1)) (out ((debug 1) (x 1)))) ((in ((debug 1) (x 1))) (stmt (:= a 1)) (out ((a 1) (debug 1) (x 1)))) ((in ((a 1) (debug 1) (x top))) (stmt (label loop)) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (if (<= x 1) goto end)) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (if (== debug 0) goto skip1)) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (println x)) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (label skip1)) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (skip)) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (:= x (+ x 1))) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (goto loop)) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (label end)) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (println x)) (out ((a 1) (debug 1) (x top)))) ((in ((a 1) (debug 1) (x top))) (stmt (return x)) (out ((a 1) (debug 1) (x top))))
Project 2: CPS-conversion and 0CFA
Due: Friday, March 6th 2009, 23:59:59 MT
Goals
- Construct a CPS-converter for a small functional programming language.
- Construct an abstract-interpretation-based 0CFA for CPS.
Description
CPS is a powerful intermediate representation used in advanced compilers. 0CFA is an enabling analysis for optimizing and reasoning about higher-order programs. 0CFA, like other higher-order flow analyses, bounds the set of values to which an expression may evaluate. Specifically, 0CFA is focused on figuring out which λ-terms may flow to a given expression. In this assignment, you will learn how to convert a program into CPS, and you will learn how to run 0CFA on CPS.
The input language is the same as Project 1, but with a few additional features: primitive integers and conditionals. The grammar for this language is:
f,e in Exp = v [Variable] | (lambda (v1 ... vn) e) [Lambda term] | (let* ((v e) ...) e) [Sequential let] | (f e1 ... en) [Application] | (if-zero e1 e2 e3) [Conditional] | z [Integer] | succ [Successor] | pred [Predecessor]Here's an example program in this language that does a countdown:
(let* ((Y (lambda (X) ((lambda (procedure) (X (lambda (arg) ((procedure procedure) arg)))) (lambda (procedure) (X (lambda (arg) ((procedure procedure) arg))))))) (countdown (Y (lambda (countdown) (lambda (n) (if-zero n 0 (countdown (-1 n)))))))) (countdown 10))
The target grammar for CPS is:
f,e in Exp = Lam + Var + Int + Prim p in Prim ::= (cps succ) | (cps pred) | halt lam in Lam ::= (lambda (v1 ... vn) call) call in Call ::= (f e1 ... en) | (if-zero e1 e2 e3)
Any CPS-transformation that is meaning-preserving will be accepted.
For 0CFA, implement a small-step abstract interpreter for 0CFA for CPS as described in the notes. Your are encouraged to use a subsumption-based termination test to achieve a modicum of efficiency. Using widening to achieve a polynomial-time algorithm is worth extra credit.
Useful(?) files
- SExp.scala is an S-Expression parser in scala. (It's updated since the last assignment.)
- LCPlus.scala is an interpreter for the updated language.
- LambdaCalculusCPSTransformer.scala contains three CPS transformers for the pure lambda calculus.
- cps-0cfa.zip contains
a reference implementation of the CPS translator and 0CFA.
To run them:
scala -cp . languages.lcplus.CPSTranslator < input-file scala -cp . languages.lcplus.ZeroCFA < input-file
If you want to run 0CFA on a CPS-converted program, run:scala -cp . languages.lcplus.ZeroCFA --lang=cps < input-file
If you find bugs in any of these programs, report them to Matt.Note: Your implementation should accept CPS directly.
- The first three chapters of Olin Shivers' dissertation are a great reference for k-CFA.
Requirements
- Tarball and name the turnin file/directory like in project 1. If you want to use zip instead of tar+gzip, go ahead.
- Include anything you'd like the grader to know in the
README
. - Include a
Makefile
with your turnin even if you use ant. (Several of the ant submissions from Project 1 didn't work on my system.) make all
should compile everything.make cps
should run your CPS translator on stdin.make 0cfa
should run 0CFA on the CPS program on stdin.- The CPS-translator should supply the primitive
halt
as the top-level continuation. For example, if you were to CPS-convert just the expression3
, the resulting program would be(halt 3)
. So, a CPS-converted program will be a call term. - The output of your 0CFA should obey the following grammar:
OUT ::= ( ENTRY* ) ENTRY ::= ( VARNAME VALUE * ) VALUE ::= Num | succ | pred | halt | (lambda (v1 ... vn) call)
Each entry should list the "values" which may flow to a particular variable. For example, the program id-many.cps could produce this output file. - Write one interesting benchmark. (More happily accepted.) Your benchmark should return an integer.
Turn in
Mechanism TBD.
Advice
- The CPS translator in the reference implementation is smarter than the naive translator, in that its code is in a tighter correspondence with the original code, and it introduces fewer spurious continuations. Your CPS translator doesn't have to be smart. It just has to be correct.
- Your assignment will not be graded for efficiency beyond termination; it will, however, be graded for soundness.
- You can run CPS-converted code in the provided interpreter. You should get the same answer from the original and converted programs when the result is an integer.
Bonus
- Implement some form of widening to achieve polynomial-time termination. (+10%) (I recommend you add a flag to turn this on, in case you screw up widening.)
- Implement
call/cc
in the CPS translator. Include a benchmark to test it out. (+5%)
Project 1: A λ-calculus interpreter and programming with Church encodings
Due: Friday, Feb 6th 2009, 23:59:59 MT
Goal
Construct a substitution-based interpreter for a call-by-value λ-calculus over closed terms.
Description
The syntax for this language in EBNF form is:
<exp> ::= <app> | <var> | <lam> | <let*> <app> ::= '(' <exp> <exp> * ')' <let*> ::= '(' 'let*' '(' ( '(' <var> <exp> ')' ) * ')' <exp> ')' <lam> ::= '(' 'lambda' '(' <var> * ')' <exp> ')' <var> ::= [^)( \n\t\r]+
(Note that this is a legal subset of Scheme.)
First, desugar let*
expressions into nested
let
expressions; then desugar each let
expression into an immediate application of a λ-term.
This will leave a program in the following grammar:
<exp> ::= <app> | <var> | <lam> <app> ::= '(' <exp> <exp> * ')' <lam> ::= '(' 'lambda' '(' <var> * ')' <exp> ')' <var> ::= [^)( \n\t\r]+
After transforming, run the call-by-value substitution semantics until exhaustion.
Finally, output the resulting term.
Call-by-value semantics for the λ-calculus are described in the notes.
Useful files
- SExp.scala is an S-Expression parser in scala. It can be compiled to a .class file and used from Java, too.
- churchmarks.scm contains some examples of Church encodings that might be useful in building benchmarks.
- lcplus.zip contains a reference implementation.
To run it, execute
scala -cp . languages.lcplus.LCPlus
and enter your program into stdin. If you supply the flag--print-intermediate
, it will print out intermediate reductions.
Requirements
- When untarballed, the file you turn in should create the following file/directory structure:
<lastname>-<firstname>-project1/ <lastname>-<firstname>-project1/Makefile <lastname>-<firstname>-project1/README <lastname>-<firstname>-project1/benchmarks/
- Running
Make run < filename
should run the interpreter on the supplied filename. - The output of your interpreter should be a λ-term in the simplified grammar that represents the complete call-by-value β-reduction of the original term. If the input term is non-terminating, your program should run forever and produce no output.
-
Supply at least three benchmarks in the benchmarks folder.
Each benchmark should consist of two files:
lastname-firstname-benchmarkname.in
(the input program) andlastname-firstname-benchmarkname.out
(the output program). If the program is non-terminating, provide no .out file. - Put anything you think the grader might like to know in the README. At the very least, include a description of your benchmarks.
- Comment your code appropriately.
Turn in
Email a lastname-firstname-project1.tar.gz file with your code. Include "[cs6969] project 1 turnin" in the subject.
Advice
- Projects that are easy to untarball and grade make the grader happy.
- "Interesting" benchmarks will receive more credit. "Interesting" means something like a SAT solver (brute force is OK), a particle simulation, a ray tracer (even if it's 1-dimensional), a lambda calculus interpreter, encryption algorithms, a CPU simulator, a virtual machine or anything else which you wouldn't expect someone to implement in what is nearly the pure λ-calculus.