Scripting Language Design and Implementation

Instructor    Matt Might
Time Tues & Thurs, 2:00pm - 3:20pm
Location WEB 1250
Appts. Office hours after each lecture, or when my door is open.
Group utah-sldi-spring-2012
Blog blog.might.net (I'll post lecture notes here.)

Navigation

Requesting permission

I'm happy to grant permission to take the course if you read the course description and send me an email.

(I didn't want anyone to sign up by accident, since this course requires more work than normal.)

Description

This course has two major co-themes:

The design theme will include a hands-on survey of scripting languages.

New languages will be introduced at a rate of one to two weeks, and assignments will allow students to experience their strengths and weaknesses by programming in them.

For the implementation theme, students will be required to implement an interpreter for a scripting language from scratch.

An understanding of models of computation is a critical prerequisite.

This course is much more difficult than the average computer science course, and may require double or triple the effort.

The time invested will be well worth it: the course will impart a deep, unified understanding of programming languages, and a fundamental appreciation for the nature of computation.

Students will learn how to quickly teach themselves and exploit new programming lanuages.

Languages covered

Topics

Schedule

Grading

Your final grade is the maximum of your projects grade and your grade on the exams.

You do not have to take the final if you are satisfied with your grade on the projects.

Projects

The first five projects count for 10% each and the final project counts for 50%.

Midterm exam

There will be a midterm exam covering concepts thus far in the course.

Everyone is required to take it.

It counts for 50% of the exam grade.

Final exam

The final exam will be impossible to finish in the alloted time.

It will be divided into two portions, each scored separately.

The first portion will cover the same material as the midterm. If you were satisfied with your midterm grade, skip this portion. If you score higher on this portion than on your midterm, it replaces your midterm grade.

The second portion will be on post-midterm material, and it will account for 50% of your final grade.

You should answer those questions which you know best.

The weight of each question is proportional to the percentage of students that failed to answer it or answered it incorrectly.

The result will be curved.

Material

I'm morally opposed to the way textbooks are priced.

There is no required textbook for the class.

A textbook also cuts against the spirit of this class, which is ultimately about teaching you how to teach yourself a new programming language.

When it comes to learning a programming language, there is a wealth of material available through Google. I will teach you how to find it!

If you'd like recommended (but not required) textbooks, try the following:

I'll link to applicable online material from here, as well as my notes, the slides and any code we use in class.

Preliminaries and prerequisites

Unix

JavaScript

Lambda calculus

Racket

Python

Lexical analysis

Parsing

Transformation

Collaboration policy

I encourage the formation of study groups to discuss the theory and algorithms behind the projects.

You may not share code.

If you want to learn it, you have to implement it.

Violators of this rule will automatically fail the course and be dealt with under the School and University academic misconduct policy.

Projects

These projects will be fleshed out in more detail before they're formally assigned. Take them as draft "project ideas" for now.

Depending on how the course evolves and student interest, I may change a later project to a different one entirely.

Project 1: Unix scripting

Due: Friday, Feb 3 2012, before midnight Utah time

In this project, you will construct scripts to do the following:

  1. Find the five folders in the current directory consuming the most space.
  2. Report duplicate MP3s (by file contents, not file name) on a computer.
  3. Take a list of names whose first and last names have been lower-cased, and properly recapitalize them.
  4. Find all words in English that have x as their second letter, and n as their second-to-last.
  5. Apply rot13 encoding/decoding to standard in.
  6. Replace all spaces in a filename with underscore for a given directory.
  7. Computes the lines common to two files.
  8. Generates a password hash file from a dictionary.
  9. Attempts to crack a password database whose passwords were hashed but not salted using the output of the previous program.
  10. Validates a username and password against a database whose passwords have been properly salted and hashed.

Recommended tools: bash or zsh, md5sum, diff, sort, uniq, cut, sed, awk, du.

To get the assignment spec, ssh into winterfell.ucombinator.org.

Click here request an account.

Once logged in, run get-p1 to git clone the stub project into $HOME/p1.

Complete the scripts provided in the scripts.

Run make test to run a simple batch of tests.

To turn in your project, have the most recent version of your project in $HOME/p1 when the due date hits.

My turn-in script will pull all repositories at that time.

Project 2: Implementing regex

Due: Friday, 2 March 2012, 11:59:59pm.

In this project, you will parse and evaluate regular expressions.

The recommended technique for this is Brzozowski's derivative.

You will be implementing this project in JavaScript.

There is a template/demo for this project available.

The EBNF grammar of regular expressions is:

    
    regex  ::= term { termop term }
    termop ::= '|' | '&'
    term   ::= { factor }
    factor ::= base { '*' }
    base   ::= [ '~' ] atom
    atom   ::= '\' char
            |  char
            |  '(' regex ')'
   

In this dialect, the operator ~ is complement (or negation) and the operator & is intersection (or and).

Requirements

To complete the assignment, modify the template to use your own regular expression implementation instead of the internal JavaScript implementation.

Matt will grade the assignment with a manual read of your code plus a suite of test inputs.

To aid the suite testing, your page should export the function DoesMatch(regex,string), returning true for an exact match and false otherwise.

Turnin

To turn in this project, submit a URL from your public html directory on winterfell to this file.

To prevent your assignment from being copied, place it it the subdirectory ~/public_html/p2/

Then, disable public reads of this directory, while allowing public execution:

    chmod o-r $HOME/public_html/p2
    chmod o+x $HOME/public_html
   

Directories with incorrect permissions set will receive an automatic 0.

Rename template.html to a random filename (ending in .html), e.g., 1701.html

The filename must have at least 20 characters.

Then, email Matt the link:

    http://winterfell.ucombinator.org/~username/p2/num.html
   

Pointers

For an implementation of regular expressions with derivatives in Java, see the slides on lexing.

For an implementation in Scheme, with an explanation of the theory, see this post.

The recursive descent parsing examples from class are also available.

Project 3: Computer algebra

Due: Monday, 9 April 11:59:59 PM Utah time.

In this project, you will implement basic computer algebra operations in Racket over the following expression language:

 <exp> ::= <num>
        |  <var>
        |  (+ <exp> ...)             [addition]
        |  (* <exp> ...)             [multiplication]
        |  (- <exp> <exp>)           [subtraction]
        |  (- <exp>)                 [negation]
        |  (/ <exp> <exp>)           [division]
        |  (^ <exp> <exp>)           [exponentiation]
   

Your task is to complete the stubbed out file alg-stub.rkt so that it meets its specification in the comments.

A reference implementation is also available on winterfell as the command alg.

To run it, use:

    $ alg command [args]
   

And then enter your expression on stdin.

The available commands are:

    eliminate-sub 
    binary-normal-form
    expand
    collapse
    sort-args
    poly-normalize
    simplify
    crush
    derive var
   

To turn in the assignment, leave it in $HOME/project3/alg-turnin.rkt on winterfell.

Running the stub file without any arguments will run 20 test cases.

(Set permissions appropriately to prevent classmates from seeing the file.)

You may want to reference lambda.rkt, a reduction-based interpreter for three languages. In particular, the way it desugars let using three separate functions is a good design pattern for this project.

Pay attention to the provided code. In particular, the function map-exp-bu can spare you having to write recursive tree walks. You can supply it a function that applies local rewrites, and it will apply them to the entire tree!

Also, you can phrase many operations much more simply in terms of repeatedly applying an operation to the whole tree until it reaches a stable value.

Final Project: Implementing an interpreter

Due: April 26 AoE

On winterfell, implement an interpreter for MicroScheme.

MicroScheme has the following grammar:

<prog> ::= <defs> <exp>

<defs> ::= <def> <defs>
        |  

<def> ::= (define (<var> <formals>) <exp>)

<formals> ::= <var> <formals>
           |  

<exp> ::= <integer>
       |  <var>
       |  #t | #f
       |  (lambda (<formals>) <exp>)
       |  (let ([<var> <exp>] ...) <exp>)
       |  (if <exp> <exp> <exp>)
       |  (set! <var> <exp>)
       |  (begin <exp> ...)
       |  (<prim> <exp> ...)
       |  (<exp> <exp> ...)
       |  (call/cc <exp>)

<prim> ::= +  |  -  |  *  |  = 
        |  void   

Your interpreter should execute the script on stdin and print the final result.

On winterfell, running /home/username/final/bin/microscheme should execute your interpreter.

Your source code should be in /home/username/final/src/

Tips

There are a variety of implementation strategies available.

You could construct a CESK machine directly for the language itself.

You could also desugar and/or a-normalize the language, which will allow you to construct a much simpler CESK machine.

You could also CPS transform the language, which allows you to build a substantially simpler CES machine.

I strongly recommend that you use Racket (or some dialect of Scheme) for this project, since the (read) command handle lexing and parsing for you.

You may find the helper code provided in microscheme.rkt useful.

Restrictions

You may not use the eval command in any Scheme distribution, including Racket.