derp documentation

Download and use

To use this module in Racket, download the code, place it in the same directory as your code, and then require it:

    (require "derivative-parsers.rkt")

Warnings

This version of the code still uses fixed points internally. David Darais found a way to eliminate fixed points, which substantially speeds up parsing. derp will eliminate fixed points internally soon, but the performance-conscious may want to wait until then.

Documentation

A language is a first-class value with derp.

The macro lang creates a language.

Atomic languages

Within lang there are several atomic subforms:

(eps): the language containing only the null (empty) string.
(empty): the empty language.
(? pred class): the language { t : pred(t) }.

To make it easier to write terminals (token classes) in the grammar, several literal forms are available:

'c: expands to (literal->language 'c)
string: expands to (literal->language string)
number: expands to (literal->language number)
boolean: expands to (literal->language boolean)

The procedure literal->language can be set with set-literal->language!.
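As a rough sketch of how this could be used, suppose the lexer emits tokens as pairs of a class and the matched text rather than bare values. The pair representation and the helper token-class are hypothetical, and the sketch assumes that set-literal->language! accepts a one-argument procedure mapping a literal to a language, and that it is called before any grammar containing literals is defined:

```racket
#lang racket
;; Assumes derivative-parsers.rkt is in the same directory as this file.
(require "derivative-parsers.rkt")

;; Hypothetical token representation: (cons class text), e.g. '("," . ",").
(define (token-class tok) (car tok))

;; Assumption: set-literal->language! takes a procedure from a literal
;; to a language. Here a literal matches any token whose class equals it.
(set-literal->language!
 (λ (lit)
   (lang (? (λ (tok) (and (pair? tok)
                          (equal? (token-class tok) lit)))
            lit))))
```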

Compound languages

There are a few core compound language subforms:

(or L1 ...): the union of several languages.
(seq L1 ...): the concatenation of several languages.
(rep L): the possibly empty repetition of a language.
(red L proc): applies proc to each parse of L.

When the (seq ...) subform produces a parse tree, it conses the results of its sub-languages together.

The (rep ...) subform parses into a proper list of matches.
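The core forms compose in the usual way. The following is a minimal sketch, not taken from the derp sources: the names NUM, SYM, item and total are made up for illustration, and the example assumes token streams are built with stream from srfi/41.

```racket
#lang racket
;; Assumes derivative-parsers.rkt is in the same directory as this file.
(require "derivative-parsers.rkt")
;; Assumption: streams come from srfi/41; drop this require if
;; derivative-parsers.rkt already provides `stream`.
(require srfi/41)

;; Atomic token classes.
(define NUM (lang (? number? 'number)))
(define SYM (lang (? symbol? 'symbol)))

;; Union: an item is either a number or a symbol.
(define item (lang (or NUM SYM)))

;; Concatenation: a symbol followed by a number; seq conses the two results.
(define binding (lang (seq SYM NUM)))

;; Repetition plus reduction: rep yields a proper list of numbers,
;; and red sums that list.
(define total (lang (red (rep NUM) (λ (ns) (apply + ns)))))

(parse total (stream 1 2 3))
```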

Meta-syntactic sugar

There are several non-core forms that expand into core forms; these are convenient for parsing:

(eps value): a null language that produces value as a parse tree if matched.
(seq* L1 ...): produces a proper list of sub-parses.
(seq! qq-lang): like seq*, but only saves sub-languages marked with a quasiquote.
(rep+ L): like rep, but for non-empty repetition.
(opt L default): optionally parses the language L or returns default.
(car L): same as (red L (λ (t) (car t))).
(--> L f): same as (red L f).
(@--> L f): same as (red L (λ (t) (apply f t))).
(>--> L c ...): same as (red L (λ (t) (match t c ...))).
($--> L e ...): parses into the value of (begin e ...), with $$ bound to the match of L and ($ n) returning the nth member of $$.
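To see what the reduction shorthands compute, it can help to apply the underlying procedures by hand. The sketch below uses no derp forms at all: it takes a sample parse tree t, assumed to be the kind of proper list that seq* is described as producing, and runs the lambdas from the equivalences above on it. The names via-car, via-apply and via-match are illustrative only.

```racket
#lang racket
;; Sample parse tree: a proper list, as (seq* A B C) is described as producing.
(define t '(1 2 3))

;; (car L) reduces with (λ (t) (car t)).
(define via-car ((λ (t) (car t)) t))

;; (@--> L f) reduces with (λ (t) (apply f t)); here f is +.
(define via-apply ((λ (t) (apply + t)) t))

;; (>--> L c ...) reduces with (λ (t) (match t c ...)).
(define via-match
  ((λ (t) (match t
            [(list a b c) (list c b a)]))
   t))
```

Here via-car is 1, via-apply is 6, and via-match is '(3 2 1).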

Parsing interface

To parse a stream with respect to a language L, use (parse L stream). Streams are available in srfi/41.
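For readers new to srfi/41, here is a small sketch of two ways to build a token stream. It uses only srfi/41 itself, and the names tokens-a, tokens-b and same? are illustrative:

```racket
#lang racket/base
(require srfi/41)

;; `stream` builds a stream directly from its arguments.
(define tokens-a (stream 'foo "," 'bar))

;; `list->stream` converts an existing list of tokens.
(define tokens-b (list->stream (list 'foo "," 'bar)))

;; Both streams hold the same tokens in the same order.
(define same?
  (equal? (stream->list tokens-a) (stream->list tokens-b)))
```

list->stream is convenient when a lexer has already produced a list of tokens.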

Example

To parse (lexed) comma-separated values:

    (define SYMBOL (lang (? symbol? 'symbol)))

    (define csv (lang (or (@--> (seq! `SYMBOL "," `csv) cons)
                          ($--> SYMBOL (list $$)))))

    (parse csv (stream 'foo "," 'bar "," 'baz))
    ; => (set '(foo bar baz))
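The example starts from input that has already been lexed. As a minimal sketch of one way to get there, the hypothetical helper lex-csv-line below (not part of derp) turns a raw line into a list of symbols separated by "," strings. It does not handle quoting or escaped commas.

```racket
#lang racket/base
(require racket/list racket/string)

;; Hypothetical helper: split the line on commas, turn each field into a
;; symbol, and put a "," token back between neighbouring fields.
(define (lex-csv-line line)
  (add-between (map string->symbol (string-split line ","))
               ","))

(lex-csv-line "foo,bar,baz")
; => '(foo "," bar "," baz)
```

The resulting list could then be converted with list->stream from srfi/41 before being passed to parse.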