Instructor | Matt Might |
TA | Youngrok Bahn |
Time | Tues & Thurs, 2:00pm - 3:20pm |
Location | MEB 3105 |
Appts. | Office hours after each lecture, or when my door is open. |
Group | Utah Compilers, Spring 2011 |
Blog | blog.might.net (I'll post lecture notes here.) |
The field of compilers concerns the automated translation of high-level programming languages into low-level code.
This course covers the design and implementation of compilers, and it explores related topics such as interpreters, virtual machines and runtime systems.
This is not an easy course, but it is a rewarding one for those who finish.
Few programmers will ever have to work on a production compiler, but knowing how compilers work is an asset. It makes you a better programmer.
Knowing how a compiler thinks will change how you code. Once you know what compilers can and cannot optimize, you will target the intersection of human and mechanical comprehension.
Your code will look better.
Many programming tasks can be framed in terms of interpretation or translation. Compilers teaches you how to solve tasks like these properly.
Many companies develop an internal programming or scripting language. Most of them get it wrong. If you've taken compilers, you can fix it.
Finally, if you want to develop your own programming language, take compilers and spare the world some accidental indecency.
Your final grade is the maximum of your average on the projects and your grade on the final.
You do not have to take the final if you are satisfied with your grade.
The final exam will be impossible to finish in the alloted time.
You should answer those questions which you know best.
The weight of each question is proportional to the percentage of students that failed to answer it or answered it incorrectly.
The result will be curved.
I'm morally opposed to the way textbooks are priced.
There is no required textbook for the class.
I'll link to applicable online material from here, as well as my notes, the slides and any code we use in class.
match
quasiquote
I encourage the formation of study groups to discuss the theory and algorithms behind the projects.
You may not share code.
If you want to learn it, you have to implement it.
Violators of this rule will be dealth with under the University's academic misconduct policy.
This semester, we'll be building a Python 3.1 compiler.
You can use any language you want to do the implementation.
I strongly encourage learning and using a functional language such as Racket, Scala, OCaml/SML, Clojure or Haskell.
Anything else is a knife in a gunfight.
Valid until the week of finals.
Due: Sunday, Feb 6th, 11:59 PM Utah time
Your task is to construct a lexical analyzer for the Python programming language.
Python is a relatively simple language to tokenize, with the notable exception of its significant whitespace.
For reference, the lexical specification for Python is available as part of the official documentation.
README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept input on STDIN
and send
output to STDOUT
.
make test
should run any test cases you used.(NEWLINE)
-- for a logical newline.
(INDENT)
-- for a logical increase in indentation.
(DEDENT)
-- for a logical decrease in indentation.
(ID name)
-- for
an identifier, where name
is a
Racket-compatible literal string.
(LIT value)
-- for a literal value,
where value
is a
Racket-compatible
literal value.
(If it's a string, it must be properly escaped.).
(KEYWORD symbol)
-- for
an keyword, where symbol
is a
Racket-compatible literal symbol.
(PUNCT text)
-- for operators and
delimiters, where text
is a Racket
string containing the text of the operator or delimiter.
(ENDMARKER)
-- for the end of the input.
(ERROR "explanation")
and quit.
[A-Za-z_][A-Za-z_0-9]*
^
regular expression pattern for matching the start of a line.
/^(re)/
.
The special pattern ^
matches the start of the input (or in some cases, the start of a line).
# Comment 1 # Comment 2 # Factorial: def fact( x\ ): if x == -1: return 1.j elif x ==0: return 1 else: return x* fact(x - 1) s = "foo\ \\ \n\'\"" fact(20)should produce:
(KEYWORD def) (ID "fact") (PUNCT "(") (ID "x") (PUNCT ")") (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD if) (ID "x") (PUNCT "==") (PUNCT "-") (LIT 1) (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD return) (LIT +1.i) (NEWLINE) (DEDENT) (KEYWORD elif) (ID "x") (PUNCT "==") (LIT 0) (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD return) (LIT 1) (NEWLINE) (DEDENT) (KEYWORD else) (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD return) (ID "x") (PUNCT "*") (ID "fact") (PUNCT "(") (ID "x") (PUNCT "-") (LIT 1) (PUNCT ")") (NEWLINE) (DEDENT) (DEDENT) (ID "s") (PUNCT "=") (LIT "foo\\ \n\'\"") (NEWLINE) (ID "fact") (PUNCT "(") (LIT 20) (PUNCT ")") (NEWLINE) (ENDMARKER)
You may find the sample lexers written in flex to be useful.
Due: March 14th, 11:59 PM Utah time
Your task is to construct a parser for the Python programming language.
Unusually among programming languages, Python is straightforward to parse. In fact, it is one of the few popular programming languages with an LL(1) grammar, making it amenable to hand-written recursive descent parser.
Of course, it is also possible to use a parsing tool.
pylex
is in the current PATH, and that it matches the behavior
of the web app by consuming a file on STDIN and printing its tokens as
S-Expressions line-by-line on STDOUT. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept input on STDIN
and send
output to STDOUT
.
make lex
should run the lexer on STDIN
.make parse
should run the parser on STDIN
.make test
should run any test cases you used.#f
on failure.
$ pylex | pyparse
A reference web app is available.
You may find the sample parsers useful.
If you take the Racket route, stub code is available for parsing with derivatives.
Due: March 28th, 11:59 PM Utah time (+ 1 week)
Your task is to construct a translator from the subset of Python from Project 2 into a high-level intermediate representation (HIR).
The high-level IR shares many features with Python, including dynamic typing and higher-order procedures.
To focus on key concepts, some additional parts of Python have been removed for this project.
pylex
is in the current PATH, and that it matches the behavior
of the web app by consuming a file on STDIN and printing its tokens as
S-Expressions line-by-line on STDOUT. pyparse
is in the current PATH, and that it matches the
behavior of the web app by consuming a tokenized input on STDIN and
printing the AST on stdout. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make trans
should accept input on STDIN
and send
output to STDOUT
.
make test
should run any test cases you used.error
on failure.
print
.
.name
.
(Note: The code to interpret HIR fails immediately with an error message if these constructs are invoked.)
try
blocks have exactly one except
clause,
and no else
or finally
clauses, i.e.:
try: main code except: failure code
raise
can raise any value (not just exceptions).
except
clause for a try block does not specify which exception it catches, nor does it name it.del a[i]or
del a.fbut not:
del xOf course, you could also write:
del f(3)["foo"][i]
global
construct, talk to Matt.
That is, you have to handle:
y = 10 def f(x): global y y = xbut not:
def f(x): global y y = x
$ pylex | pyparse | ./pytrans
This code is provided without warranty or guarantee of correctness. If you find a discrepancy between this code an the project description, notify Matt.
To execute HIR code, you can prepend hir-header.rkt to the result of your translation, and then execute it in racket.
(hir-header.rkt
provides macros and functions to define all of elements of
HIR in terms of Racket.)
You can grab a reference implementation from pytrans_rkt.zo.
It's compiled Racket bytecode for Racket 5.1.
To run, use racket pytrans_rkt.zo
and send input on STDIN.
Due: April 11th, 11:59 PM Utah time (+1 week)
Your task is to construct a translator from the subset of Python in Project 3 into continuation-passing style (CPS).
Of course, the expected way of accomplishing this is to translate HIR into CPS.
It is tedious to translate HIR directly to CPS. It is strongly recommended that you push HIR into a desugaring phase--dropping it down to LIR first.
pylex
is in the current PATH, and that it matches the behavior
of the web app by consuming a file on STDIN and printing its tokens as
S-Expressions line-by-line on STDOUT. pyparse
is in the current PATH, and that it matches the
behavior of the web app by consuming a tokenized input on STDIN and
printing the AST on stdout. pytrans
is in the current PATH, and that it matches the
behavior of the bytecode app by a Python AST on STDIN and
printing the HIR AST on stdout. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept Python code on STDIN
and send
CPS code to STDOUT
.
make test
should run any test cases you used.error
on failure.
define-syntax
form in detail,
you may learn a great deal about how to desugar from hir-header.rkt
.
This is particularly true for the desugaring of try
.
$ make test-cps $ make test-lir
Of the reading material for the course, the following are most helpful:
This code is provided without warranty or guarantee of correctness. If you find a discrepancy between this code an the project description, notify Matt.
To execute LIR code, you can prepend lir-header.rkt to the result of your translation, and then execute it in racket.
To execute CPS code, you can prepend cps-header.rkt to the result of your translation, and then execute it in racket; for example, you could do:
$ make run < myfile.py > myfile.cps $ cat cps-header.rkt myfile.cps > myfile.rkt $ racket myfile.rkt
You can grab a reference desugarer from pydesugar_rkt.zo.
You can grab a reference cps converter from pycps_rkt.zo.
These are compiled bytecode for Racket 5.1
Due: April 25th, 11:59 PM Utah time
Your task is to construct a translator from the subset of Python in Project 4 into C.
Of course, the expected way of accomplishing this is to translate from CPS.
To make this transformation as easy as possible, it is recommended that you perform mutable variable elmination, flat closure conversion and lambda lifting. If you do, then the result may be more easily transliterated into C.
In terms of restructuring the language for these transforms, it is recommended that you follow this spec.
pylex
is in the current PATH, and that it matches the behavior
of the web app by consuming a file on STDIN and printing its tokens as
S-Expressions line-by-line on STDOUT. pyparse
is in the current PATH, and that it matches the
behavior of the web app by consuming a tokenized input on STDIN and
printing the AST on stdout. pytrans
is in the current PATH, and that it matches the
behavior of the bytecode app by a Python AST on STDIN and
printing the HIR AST on stdout. pydesugar
is in the current PATH, and that it matches the
behavior of the bytecode app by a Python AST on STDIN and
printing the LIR AST on stdout. pycps
is in the current PATH, and that it matches the
behavior of the bytecode app by a Python AST on STDIN and
printing the LIR AST on stdout. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept Python code on STDIN
and send
C code to STDOUT
.
make test
should run any test cases you used.error
on failure.
This code is provided without warranty or guarantee of correctness. If you find a discrepancy between this code an the project description, notify Matt.
To execute MVE'd, closue-converted or lambda-lifted code, you can prepend c-header.rkt to the result of your translation, and then execute it in racket.
You may find pytoc-common.rkt useful for writing these transforms.
You can grab a reference MVE from pymve_rkt.zo.
You can grab a reference closure-converter from pycc_rkt.zo.
You can grab a reference lambda-lifter from pyll_rkt.zo.
You can grab a mostly-completed C translator from pytoc.rkt.
These are compiled bytecode for Racket 5.1
You can find tests cases here.