Programming language analysis
Instructor: Matt Might.
Time: Tuesdays and Thursdays from 10:45am to 12:05pm.
Disclaimer: This syllabus is tentative and subject to change.
Course summary and objective
Static analysis is the field of study concerned with algorithms that extract information about the run-time behavior of a program from the source code (or machine code) of that program. Static analysis has been a vibrant field of research since the dawn of computing, and today, static analysis is the driving force behind compiler-based program optimization, tool-based software engineering, bottom-up information security, formal program verification and automatic program parallelization.
For example, in the following Java code:
for (int i = 0; i < a.length; ++i) {
a[i] = read() ;
}
a good static analyzer can prove that the expression a[i] will never throw an out of bounds exception, and as a result, the implicit bounds check can be discarded in order to improve performance.
(When a static analyzer can prove that an index i is in
bounds for some array a for a language like C, then the
analyzer has proved that the expression a[i] does not
lead to a buffer overflow security vulnerability.)
Turing's halting problem prevents any given static analyzer from recovering perfect information for all programs. Consequently, techniques in static analysis rely on models that approximate program behavior in a computable fashion. The primary objective in the design of a static analysis is to model the desired run-time behavior as accurately as necessary, but to keep the analysis efficiently computable for the average case.
Upon completing this course, students should be comfortable reading a recently published research paper in the field of static analysis, and students will be familiar with the standard techniques for static analysis of both functional and imperative programs.
Prerequisites
There are no formal prerequisites for this course. Anything required in the course will be covered in the course.Topics
During the course, students will be introduced to the following concepts, and possibly more:- Formal semantics.
- Small-step semantics.
- Big-step semantics.
- Basic order theory up to lattices.
- Basic fixed-point theory.
- Static-single-assignment form (SSA).
- Lambda calculus.
- Administrative normal form lambda calculus (ANF).
- Continuation-passing-style lambda calculus (CPS).
- Intraprocedural data-flow analyses.
- Constraint-based analyses.
- Unification-based analyses.
- Abstract interpretation.
- Type-based analyses.
- Higher-order control-flow analysis (k-CFA).
- Alias and pointer analysis (Steensgard, Andersen).
- Dependence analysis.
- Escape analysis.
- Array-bounds analysis.
- Analysis-driven program transformations.
- Proof techniques for demonstrating soundness.
- Analysis of parallel programs.
Required textbook
The only required textbook for this class will be provided as a pdf to students attending the class free of charge.
Any papers assigned for reading will be provided as pdfs.
Supplementary textbooks
The following textbooks are not required, but they are a useful secondary point of reference for the course material. Students considering doing research in static analysis or related areas will almost certainly want to grab a copy.
The Nielsons' book
Semantics with Applications
is one of the best introductions to formal semantics out there, and the application featured in the book is static analysis:
The same authors (plus Hankin) have also written a book on static
analysis, Principles
of Program Analysis:
I also maintain a list of useful books, papers and materials for graduate students.
Grading
Your final grade is the better of:
- your grade on the comprehensive final exam; or
- your average grade for the projects.
In other words, if you get an A on the final, then you get an A in the class, no matter how badly you did on the projects. Or, if you get a B average on the projects, but an F on the final, then you still get a B for the class.
Projects
There will be roughly four or five miniprojects throughout the semester. Each miniproject will involve implementing a known static analysis for a toy language. The syntax of each toy language will always be S-Expression-based, so motivated students may complete the assigment in the programming language of their choosing by writing their own parser. However, a parser for the toy language will always be provided that works with both the Java and Scala programming languages. If you decide to use a language other than Java or Scala, contact me in advance to make sure I can run it on Mac OS X.Collaboration policy
For projects, share anything but code.
Bonus
There are bonus opportunities to add percentage points to your final grade:
- Write or significantly clean and expand a wikipedia article on a topic related to the class. (+2%)
- Write a detailed critique (what's good, what's bad, what's missing) of a chapter of the textbook. (+2%)
- Write a peer-review-quality report of a research paper. (+5%)
- Implement and empirically test an analysis from an unpublished research paper. Contact me to gain access to an archive of eligible unpublished papers. [Warning: This will require mastery of virtually all course material.] (+100%)
- Propose your own project. (+x%)
- Any other bonus projects announced in class.
There are also bonus opportunities for each project:
- Implement the fastest correct analysis. (+5%)
- Parallelize the code for a project. (Multiply the grade for that project by s/p, where s is the time taken by the fastest project running on a single core, and p is the time taken on two cores for a benchmark suite of my choosing.)
- Write a benchmark that causes another student's analysis to fail. (+0.2% for each failure; maximum of +10% for all benchmarks.)
Late policy
Late assignments will have min(2d - 1, 100) percentage points subtracted from the final grade of the assignment, where d is the number of days late.
Tentative schedule
This schedule is subject to change. Once a pace for the course is established, the rest of the schedule will be fleshed out.
| Date | Topic |

