Time | Tues & Thurs, 10:45 AM-12:05 PM |
Location | WEB 2230 |
Instructor | Matt Might (MEB 3450) |
Office hours | After lecture, by appointment, or whenever my door is open. |
TAs | Limou Wang (MEB 4152), Petey Aldous (MEB 3375) |
Limou hours | 2:00pm - 3:00pm (Mon, Wed), 1:00pm - 2:00pm (Fri) |
Petey hours | 9:30am - 10:30am (Mon, Wed), 12:30pm - 1:30pm (Mon, Wed) |
Group | Utah Compilers, Spring 2013 |
Blog | blog.might.net (I'll post lecture notes here.) |
The field of compilers concerns the automated translation of high-level programming languages into low-level code.
This course covers the design and implementation of compilers, and it explores related topics such as interpreters, virtual machines and runtime systems.
This is not an easy course, but it is a rewarding one for those who finish.
Few programmers will ever have to work on a production compiler, but knowing how compilers work is an asset.
It makes you a better programmer.
Knowing how a compiler thinks will change how you code. Once you know what compilers can and cannot optimize, you will target the intersection of human and mechanical comprehension.
Your code will look better. Your code will run better.
Many programming tasks can be framed in terms of interpretation or translation. Compilers teaches you how to solve tasks like these properly.
Many companies develop an internal programming or scripting language. Most of them get it wrong. If you've taken compilers, you can fix it.
Finally, if you want to develop your own programming language, take compilers in order to spare the world some accidental indecency.
There are two ways to earn a grade in this class: tests and projects.
Your grade is your tests grade or your projects grade, whichever is higher.
(You do not have to take the final if you are satisfied with your projects.)
Exams will be impossible to finish in the alloted time.
You should answer those questions which you know best.
The weight of each question is proportional to the percentage of students that failed to answer it or answered it incorrectly.
The result will be curved.
I'm morally opposed to the way textbooks are priced.
There is no required textbook for the class.
Instead of a textbook, I maintain a technical blog with articles on topics covered in the class.
I'll link to applicable online material from here, as well as the slides and any code we use in class.
Some students have requested recommended textbooks.
There is no "Python compiler project" textbook, but many books cover some of the necessary concepts.
A good resource on compiling a dynamic language like Python is Lisp in Small Pieces.
An older (yet remarkably up-to-date) resource on compiling with continuation passing style is Compiling with Continuations. (This is a very advanced book and not recommended for struggling students.)
The classic compiler textbook is "The Dragon Book": Compilers: Principles, Techniques and Tools. Some of the techniques in this book work for dynamic languages like Python. Unfortunately, most of the techniques only work well for C-like typed languages.
match
quasiquote
For the projects, you are allowed to pair with a partrner and submit a single, joint assignment.
You may also work alone.
I encourage the formation of study groups to discuss the theory and algorithms behind the projects.
You may not share code with anyone other than your partner.
If you want to learn it, you have to implement it.
Violators of the collaboration policy will be dealt with under the University's academic misconduct policy, and they will fail the course.
I've caught so many clueless attempts at cheating in my courses that I feel the need to provide a guide to students.
diff
a token stream of your assignment against others.
If you're going to cheat, you need to rewrite the assignment from scratch.
This semester, we'll be building a Python 3.1 compiler.
You can use any language you want to do the implementation.
I strongly encourage learning and using a functional language such as Racket, Scala, OCaml/SML, Clojure or Haskell.
Anything else is a knife in a gunfight.
Valid until the week of finals.
Due: Friday, Feb 1, 11:59 PM Utah time
Your task is to construct a lexical analyzer for the Python programming language.
Python is a relatively simple language to tokenize, with the notable exception of its significant whitespace.
For reference, the lexical specification for Python is available as part of the official documentation.
README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept input on STDIN
and send
output to STDOUT
.
make test
should run any test cases you used.(NEWLINE)
-- for a logical newline.
(INDENT)
-- for a logical increase in indentation.
(DEDENT)
-- for a logical decrease in indentation.
(ID name)
-- for
an identifier, where name
is a
Racket-compatible literal string.
(LIT value)
-- for a literal value,
where value
is a
Racket-compatible
literal value.
(If it's a string, it must be properly escaped.).
(KEYWORD symbol)
-- for
an keyword, where symbol
is a
Racket-compatible literal symbol.
(PUNCT text)
-- for operators and
delimiters, where text
is a Racket
string containing the text of the operator or delimiter.
(ENDMARKER)
-- for the end of the input.
(ERROR "explanation")
and quit.
[A-Za-z_][A-Za-z_0-9]*
^
regular expression pattern for matching the start of a line.
/^(re)/
.
The special pattern ^
matches the start of the input (or in some cases, the start of a line).
# Comment 1 # Comment 2 # Factorial: def fact( x\ ): if x == -1: return 1.j elif x ==0: return 1 else: return x* fact(x - 1) s = "foo\ \\ \n\'\"" fact(20)should produce:
(KEYWORD def) (ID "fact") (PUNCT "(") (ID "x") (PUNCT ")") (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD if) (ID "x") (PUNCT "==") (PUNCT "-") (LIT 1) (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD return) (LIT +1.i) (NEWLINE) (DEDENT) (KEYWORD elif) (ID "x") (PUNCT "==") (LIT 0) (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD return) (LIT 1) (NEWLINE) (DEDENT) (KEYWORD else) (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD return) (ID "x") (PUNCT "*") (ID "fact") (PUNCT "(") (ID "x") (PUNCT "-") (LIT 1) (PUNCT ")") (NEWLINE) (DEDENT) (DEDENT) (ID "s") (PUNCT "=") (LIT "foo\\ \n\'\"") (NEWLINE) (ID "fact") (PUNCT "(") (LIT 20) (PUNCT ")") (NEWLINE) (ENDMARKER)
You may find the sample lexers written in flex to be useful.
A reference implementation is available online: pylex.
Copy your project to /home/unid/pylex/
on caprica.ucombinator.org
.
It should be possible to run your code with:
cd /home/unid/pylex make make run < test1.py > test1.py.out make run < test2.py > test2.py.out make run < test3.py > test3.py.out ...
Due: Wed, Feb 27 11:59 PM Utah time
Your task is to construct a parser for the Python programming language.
Unusually among programming languages, Python is straightforward to parse. In fact, it is one of the few popular programming languages with an LL(1) grammar, making it amenable to hand-written recursive descent parser.
Of course, it is also possible to use a parsing tool.
pylex
is in the current PATH, and that it matches the behavior
of the web app by consuming a file on STDIN and printing its tokens as
S-Expressions line-by-line on STDOUT. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept input on STDIN
and send
output to STDOUT
.
make lex
should run the lexer on STDIN
.make parse
should run the parser on STDIN
.make test
should run any test cases you used.#f
on failure.
$ pylex | pyparse
python-ast.grm.sx
file provided with the stub code.
You just add derp reductions to the rules in this file.
-n
flag to pylex that suppresses the
ENDMARKER
token.
To target subsets, change file_input
in pyparse.rkt
to the name of the production you want to
target.
A reference web app is available.
You may find the sample parsers useful.
If you take the Racket route, stub code is available for parsing with derivatives.
Copy your project to /home/unid/pyparse/
on caprica.ucombinator.org
.
It should be possible to run your code with:
cd /home/unid/pyparse make make run < test1.py > test1.py.out make run < test2.py > test2.py.out make run < test3.py > test3.py.out pylex < test1.py | make parse > test1.py.out pylex < test2.py | make parse > test2.py.out pylex < test3.py | make parse > test3.py.out ...
Due: Wed March 27 11:59 PM Utah time
Your task is to construct a translator from the subset of Python from Project 2 into a high-level intermediate representation (HIR).
The high-level IR shares many features with Python, including dynamic typing and higher-order procedures.
To focus on key concepts, some additional parts of Python have been removed for this project.
pylex
is in the current PATH, and that it matches the behavior
of the web app by consuming a file on STDIN and printing its tokens as
S-Expressions line-by-line on STDOUT. pyparse
is in the current PATH, and that it matches the
behavior of the reference app by consuming a tokenized input on STDIN and
printing the AST on stdout. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make trans
should accept the output of project 2 on STDIN
and send
output to STDOUT
.make run
should accept a Python on STDIN
and send
output to STDOUT
.make test
should run any test cases you used.error
on failure.
print
.
.name
.
(Note: The code to interpret HIR fails immediately with an error message if these constructs are invoked.)
try
blocks have exactly one except
clause,
and no else
or finally
clauses, i.e.:
try: main code except: failure code
raise
can raise any value (not just exceptions).
except
clause for a try block does not specify which exception it catches, nor does it name it.del a[i]or
del a.fbut not:
del xOf course, you could also write:
del f(3)["foo"][i]
global
construct, talk to Matt.
That is, you have to handle:
y = 10 def f(x): global y y = xbut not:
def f(x): global y y = x
$ pylex | pyparse | ./pytrans
transform-program
transform an entire Python program into HIR;
let transform-stmt
transform an individual Python stmt into HIR;
let transform-exp
transform an individual Python expression into HIR.
print(3)
and work up from there.
Examine the output of the reference implementation on these small programs
to unravel its logic.
This code is provided without warranty or guarantee of correctness. If you find a discrepancy between this code and the project description, notify Matt.
A reference web application is available.
To execute HIR code, you can prepend hir-header.rkt to the result of your translation, and then execute it in racket.
(hir-header.rkt
provides macros and functions to define all of elements of
HIR in terms of Racket.)
If you want, you can start the project with stub code;
this file illustrates a common design pattern for compilers--a transformation function for each class of term.
In this case, it contains
transform-program
,
transform-stmt
and
transform-exp
.
You can grab a reference implementation from pytrans_rkt.zo.
It's compiled Racket bytecode for Racket 5.3.1.
To run, use racket pytrans_rkt.zo
and send input on STDIN.
Copy your project to /home/unid/pytrans/
on caprica.ucombinator.org
.
It should be possible to run your code with:
cd /home/unid/pytrans make make run < test1.py > test1.py.out make run < test2.py > test2.py.out make run < test3.py > test3.py.out pylex < test1.py | pyparse | make trans > test1.py.out pylex < test2.py | pyparse | make trans > test2.py.out pylex < test3.py | pyparse | make trans > test3.py.out ...
Due: Wed April 10 11:59 PM Utah time
Your task is to construct a translator from the subset of Python in Project 3 into continuation-passing style (CPS).
Of course, the expected way of accomplishing this is to translate HIR into CPS.
It is tedious to translate HIR directly to CPS. It is strongly recommended that you push HIR into a desugaring phase--dropping it down to LIR first.
pylex
is in the current PATH, and that it matches the behavior
of the web app by consuming a file on STDIN and printing its tokens as
S-Expressions line-by-line on STDOUT. pyparse
is in the current PATH, and that it matches the
behavior of the web app by consuming a tokenized input on STDIN and
printing the AST on stdout. pytrans
is in the current PATH, and that it matches the
behavior of the bytecode app by a Python AST on STDIN and
printing the HIR AST on stdout. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept Python code on STDIN
and send
CPS code to STDOUT
.
make test
should run any test cases you used.error
on failure.
define-syntax
form in detail,
you may learn a great deal about how to desugar from hir-header.rkt
.
This is particularly true for the desugaring of try
.
$ make test-cps $ make test-lir
Of the reading material for the course, the following are most helpful:
This code is provided without warranty or guarantee of correctness. If you find a discrepancy between this code an the project description, notify Matt.
To execute LIR code, you can prepend lir-header.rkt to the result of your translation, and then execute it in racket.
To execute CPS code, you can prepend cps-header.rkt to the result of your translation, and then execute it in racket; for example, you could do:
$ make run < myfile.py > myfile.cps $ cat cps-header.rkt myfile.cps > myfile.rkt $ racket myfile.rkt
You can grab a reference desugarer from pydesugar_rkt.zo.
You can grab a reference cps converter from pycps_rkt.zo.
These are compiled bytecode for Racket 5.3
Copy your project to /home/unid/pycps/
on caprica.ucombinator.org
.
It should be possible to run your code with:
cd /home/unid/pycps make make run < test1.py > test1.py.cps make run < test2.py > test2.py.cps make run < test3.py > test3.py.cps pylex < test1.py | pyparse | pytrans | make cps > test1.py.cps pylex < test2.py | pyparse | pytrans | make cps > test2.py.cps pylex < test3.py | pyparse | pytrans | make cps > test3.py.cps ...
Due: Wed 24 April 11:59 PM Utah time
Your task is to construct a translator from the subset of Python in Project 4 into C.
Of course, the expected way of accomplishing this is to translate from CPS.
To make this transformation as easy as possible, it is recommended that you perform mutable variable elmination, flat closure conversion and lambda lifting. If you do, then the result may be more easily transliterated into C.
In terms of restructuring the language for these transforms, it is recommended that you follow this spec.
pylex
is in the current PATH, and that it matches the behavior
of the web app by consuming a file on STDIN and printing its tokens as
S-Expressions line-by-line on STDOUT. pyparse
is in the current PATH, and that it matches the
behavior of the web app by consuming a tokenized input on STDIN and
printing the AST on stdout. pytrans
is in the current PATH, and that it matches the
behavior of the bytecode app by a Python AST on STDIN and
printing the HIR AST on stdout. pydesugar
is in the current PATH, and that it matches the
behavior of the bytecode app by a Python AST on STDIN and
printing the LIR AST on stdout. pycps
is in the current PATH, and that it matches the
behavior of the bytecode app by a Python AST on STDIN and
printing the LIR AST on stdout. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept Python code on STDIN
and send
C code to STDOUT
.
make test
should run any test cases you used.error
on failure.
This code is provided without warranty or guarantee of correctness. If you find a discrepancy between this code an the project description, notify Matt.
To execute MVE'd, closue-converted or lambda-lifted code, you can prepend c-header.rkt to the result of your translation, and then execute it in racket.
You may find pytoc-common.rkt useful for writing these transforms.
You can grab a reference MVE from pymve_rkt.zo.
You can grab a reference closure-converter from pycc_rkt.zo.
You can grab a reference lambda-lifter from pyll_rkt.zo.
You can grab a mostly-completed C translator from pytoc.rkt.
These are compiled bytecode for Racket 5.3
You can find tests cases here.
Place the project on caprica
in ~/pytoc/
.