Catalog No. | 17541 |
Time | Tue & Thu, 2:00 PM - 3:20 PM |
Location | WEB L102 |
Instructor | Matt Might (MEB 3450) |
Office hours | After lecture, by appointment, or whenever my door is open. |
TAs | Sarah Spall (MEB 3465) |
TA hours | 10:00 AM - 11:00 AM |
Group | Google Group |
Blog | blog.might.net (I'll post lecture notes here.) |
Live code | Code written in lecture |
The field of compilers concerns the automated translation of high-level programming languages into low-level (virtual or hardware) machine code.
This course covers the design and implementation of compilers, and it explores related topics such as interpreters, virtual machines and runtime systems.
This is not an easy course, but it is a rewarding one for those who finish.
Few programmers will ever work on a production compiler, but knowing how compilers think is an asset.
It makes you a better programmer.
Knowing how a compiler thinks changes how you code. When you know what compilers can and cannot optimize, you target the intersection of human and mechanical comprehension.
Your code will look better. Your code will run better.
Many programming tasks can be framed in terms of interpretation or translation. Compilers teaches you how to solve tasks like these properly.
Many companies develop an internal programming or scripting language. Most of them get it wrong. If you've taken compilers, you can fix it.
Finally, if you want to develop your own programming language, take compilers in order to spare the world some accidental indecency.
There are two ways to earn a grade in this class: tests and projects.
Your grade is your tests grade or your projects grade, whichever is higher.
You do not have to take the final if you are satisfied with your project grade. In fact, you are encouraged to achieve your grade through the projects.
Exams will be impossible to finish in the alloted time.
You should answer those questions which you know best.
The weight of each question is proportional to the percentage of students that failed to answer it or answered it incorrectly.
The result will be curved.
I'm morally opposed to the way textbooks are priced.
There is no required textbook for the class.
Instead of a textbook, I maintain a technical blog with articles on topics covered in the class.
I'll link to applicable online material from here, as well as the slides and any code we use in class.
Some students request recommended textbooks.
There is no "Python compiler project" textbook, but many books cover some of the necessary concepts.
A good resource on compiling a dynamic language like Python is Lisp in Small Pieces.
An older (yet remarkably up-to-date) resource on compiling with continuation passing style is Compiling with Continuations. (This is a very advanced book and not recommended for struggling students.)
The classic compiler textbook is "The Dragon Book": Compilers: Principles, Techniques and Tools. Some of the techniques in this book work for dynamic languages like Python. Unfortunately, most of the techniques only work well for C-like typed languages.
match
quasiquote
For the projects, you are allowed to pair with a partrner and submit a single, joint assignment.
You may also work alone.
I encourage the formation of study groups to discuss the theory and algorithms behind the projects.
You may not share code with anyone other than your partner.
If you want to learn it, you have to implement it.
Violators of the collaboration policy will be dealt with under the University's academic misconduct policy, and they will fail the course.
I've caught so many attempts at cheating in my courses that I feel the need to provide a guide to students.
diff
a token stream of your assignment against others.
If you're going to cheat, you need to rewrite the assignment from scratch.
This semester, we'll be building a Python 3.4 compiler.
You can use any language you want to do the implementation.
If you want to receive a passing grade, Racket is recommended.
If you're bright and motivated, you might be able to get away with another functional language such as Scala, OCaml/SML, Clojure or Haskell.
A major theme in this course is the that functional programming is a multiplier: it makes you a significantly more productive programmer.
Anything else is a knife in a gunfight.
Projects will likely be along the following lines:
vulcan
to gain root access: +10% for a local user exploit; +15% for a remote exploit (e.g. breaking in via apache).
You must exploit a vulnerability (e.g. buffer overflow) for Ubuntu on vulcan
to gain root; that is, you can't steal my laptop while it has an open ssh
connection to vulcan
to claim the prize.
You must write up a short summary of the vulnerability and how you exploited it.
(You may use a prepackaged tool for exploitation.)
Mail the summary to me for approval and then to the class.
Each individual exploit may only be claimed once, and the first to exploit wins.
To signal that you have claimed root, modify the message of the day.
Released: Thursday, Jan 22.
Due: Friday, Feb 13, 5:00 PM Utah time
Your task is to construct a lexical analyzer for the Python programming language.
Python is a relatively simple language to tokenize, with the notable exception of its significant whitespace.
For reference, the lexical specification for Python is available as part of the official documentation.
vulcan
. pylex
,
as provided on vulcan
.
Other_ID_Start
and Other_ID_Continue
, and it says these properties are available in
PropList.txt
.
Please use this version of PropList.txt
.
\N{name}
.
Please use UnicodeData.txt as the canonical
list of names for unicode characters.
README.txt
file that contains:
make
should compile the program (if compilation is necessary).
It should produce an executable file called pylex
in the current directory.
make run
should accept input on STDIN
and send
output to STDOUT
.
make test
should run any test cases you used.(NEWLINE)
-- for a logical newline.
(INDENT)
-- for a logical increase in indentation.
(DEDENT)
-- for a logical decrease in indentation.
(ID name)
-- for
an identifier, where name
is a
Racket-compatible literal string that has been NFKC-normalized.
foo
becomes (ID "foo")
(LIT value)
-- for a literal value,
where value
is a
Racket-compatible
literal value.
(If it's a string, it must be properly escaped.).
10
becomes (LIT 10)
"foo ÿ bar"
may become (LIT "foo ÿ bar")
or (LIT "foo \xFF bar")
.
(KEYWORD symbol)
-- for
a keyword, where symbol
is a
Racket-compatible literal symbol.
def
becomes (KEYWORD def)
(PUNCT text)
-- for operators and
delimiters, where text
is a Racket
string containing the text of the operator or delimiter.
+
becomes (PUNCT "+")
(ENDMARKER)
-- for the end of the input.
"\uD800"
is allowed to crash the lexer. For
clarity, the range of surrogates is D800 to DFFF. (Racket should crash
automatically on these characters, since Racket disallows unpaired
surrogates in strings.) ...
as a delimiter and <>
as an operator.
/^(re)/
.
The special pattern ^
matches the start of the input (or in some cases, the start of a line).
# Comment 1 # Comment 2 # Factorial: def fact( x\ ): if x == -1: return 1.j elif x ==0: return 1 else: return x* fact(x - 1) s = "foo\ \\ \n\'\"" fact(20)should produce:
(KEYWORD def) (ID "fact") (PUNCT "(") (ID "x") (PUNCT ")") (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD if) (ID "x") (PUNCT "==") (PUNCT "-") (LIT 1) (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD return) (LIT +1.i) (NEWLINE) (DEDENT) (KEYWORD elif) (ID "x") (PUNCT "==") (LIT 0) (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD return) (LIT 1) (NEWLINE) (DEDENT) (KEYWORD else) (PUNCT ":") (NEWLINE) (INDENT) (KEYWORD return) (ID "x") (PUNCT "*") (ID "fact") (PUNCT "(") (ID "x") (PUNCT "-") (LIT 1) (PUNCT ")") (NEWLINE) (DEDENT) (DEDENT) (ID "s") (PUNCT "=") (LIT "foo\\ \n\'\"") (NEWLINE) (ID "fact") (PUNCT "(") (LIT 20) (PUNCT ")") (NEWLINE) (ENDMARKER)
A reference implementation is available on
vulcan
as pylex
.
Copy your project to /home/unid/pylex/
on vulcan.ucombinator.org
.
It should be possible to run your code with:
cd /home/unid/pylex make make run < test1.py > test1.py.out make run < test2.py > test2.py.out make run < test3.py > test3.py.out ...
or with:
cd /home/unid/pylex make ./pylex < test1.py > test1.py.out ./pylex < test2.py > test2.py.out ./pylex < test3.py > test3.py.out ...
Due: Friday, March 6, 5:00 PM Utah time
Your task is to construct a parser for the Python programming language.
Compared to other programming languages, Python is straightforward to parse.
In fact, it is one of the few popular programming languages with an LL(1) grammar, making it amenable to hand-written recursive descent parser.
Of course, it is also possible (and recommended) to use a parsing tool.
file_input
.
pylex
is in the current PATH. pysx
.README.txt
file that contains:
make
should compile the program to produce an executable called pyparse
.make test
should run any test cases you used.None.
$ pylex | pyparse
my-python-grammar.sx
file generated by the
stub code.
(You just add derp-style reductions to the rules in this file.)
pylex
accepts a --exp
flag
that does not emit the NEWLINE
or ENDMARKER
.
To target subsets, change file_input
in
pyparse.rkt
to the name of the production you want to target.
Copy your project to /home/unid/pyparse/
on vulcan.ucombinator.org
.
It should be possible to run your code with:
cd /home/unid/pyparse make pylex < test1.py | ./pyparse > test1.py.out pylex < test2.py | ./pyparse > test2.py.out pylex < test3.py | ./pyparse > test3.py.out ...
Due: Tue March 31 11:59 PM Utah time
You will implement a set of desugarings from Python into Python.
The two major desugarings will be *-elimination in assignment forms and class elimination.
$HOME/project-python-desugar-1
.
pylex
is in the current PATH. pyparse
is in the current PATH, and that it matches the
behavior of the reference app by consuming a tokenized input on STDIN and
printing the AST on stdout. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept an S-Expressified Python AST on
STDIN
and send output to STDOUT
.make test
should run any test cases you used.print
.
print(3)
and work up from there.
Examine the output of the reference implementation on these small programs
to unravel its logic.
Due: Fri May 3 11:59 PM Utah time (Extended)
Normalization fixes the order of execution and simplifies every statement down to one "action," bringing it closer to the feel of the eventual assembly code.
Normalization is also required in Python in order to access a larger class of desugarings.
$HOME/project-python-pydesugar-2
.
pylex
is in the current PATH. pyparse
is in the current PATH, and that it matches the
behavior of the reference app by consuming a tokenized input on STDIN and
printing the AST on stdout. README.txt
file that contains:
make
should compile the program (if compilation is necessary).make run
should accept an S-Expressified Python AST on
STDIN
and send output to STDOUT
.make test
should run any test cases you used.print
. print(3)
and work up from there.
Examine the output of the reference implementation on these small programs
to unravel its logic.