To use this module in Racket, download the code, place it in the same directory as your code, and then require it:
(require "derivative-parsers.rkt")
This version of the code still uses fixed points internally. David Darais found a way to eliminate fixed points, which substantially speeds up parsing. derp will eliminate fixed points internally soon, but the performance-conscious may want to wait until then.
In derp, a language is a first-class value. The macro lang creates a language, and within lang there are several atomic subforms:
(eps) : the language containing only the null string.
(empty) : the empty language.
(? pred class) : the language { t : pred(t) }, that is, every single token t that satisfies pred.
To make it easier to write terminals (token classes) in the grammar, several literal forms are available:
'c : expands to (literal->language 'c)
string : expands to (literal->language string)
number : expands to (literal->language number)
boolean : expands to (literal->language boolean)
The procedure literal->language can be set with set-literal->language!.
There are a few core compound language subforms:
(or L1 ...) : the union of several languages.
(seq L1 ...) : the concatenation of several languages.
(rep L) : the possibly empty repetition of a language.
(red L proc) : applies proc to each parse of L.
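To make the meaning of the atomic and core forms concrete, here is a minimal, self-contained Racket sketch of a recognizer based on Brzozowski derivatives, the technique the module's name (derivative parsers) refers to. The derivative of a language L with respect to a token t is the language of all suffixes w such that t followed by w is in L; a token sequence is accepted if, after deriving by each token in turn, the remaining language contains the null string. This is not derp's implementation: the struct names (eps, empty-lang, tok, alt, cat, star) and recognizes? are hypothetical, the sketch only answers yes or no instead of building parse trees (so it has no analogue of red), and it cannot handle recursive grammars, which derp supports.

```racket
#lang racket
;; Hypothetical model of the atomic and core forms (not derp's own structs).
(struct eps () #:transparent)          ; the null string, like (eps)
(struct empty-lang () #:transparent)   ; the empty language, like (empty)
(struct tok (pred) #:transparent)      ; { t : pred(t) }, like (? pred class)
(struct alt (l r) #:transparent)       ; union, like (or L1 L2)
(struct cat (l r) #:transparent)       ; concatenation, like (seq L1 L2)
(struct star (l) #:transparent)        ; repetition, like (rep L)

;; Does the language L contain the null string?
(define (nullable? L)
  (match L
    [(eps) #t]
    [(empty-lang) #f]
    [(tok _) #f]
    [(alt l r) (or (nullable? l) (nullable? r))]
    [(cat l r) (and (nullable? l) (nullable? r))]
    [(star _) #t]))

;; Brzozowski derivative: the language { w : t followed by w is in L }.
(define (derive L t)
  (match L
    [(eps) (empty-lang)]
    [(empty-lang) (empty-lang)]
    [(tok pred) (if (pred t) (eps) (empty-lang))]
    [(alt l r) (alt (derive l t) (derive r t))]
    [(cat l r)
     (if (nullable? l)
         ;; t may start l, or l may match nothing and t may start r
         (alt (cat (derive l t) r) (derive r t))
         (cat (derive l t) r))]
    [(star l) (cat (derive l t) (star l))]))

;; Accept a token list if deriving by each token in turn leaves a
;; language that contains the null string.
(define (recognizes? L tokens)
  (nullable? (for/fold ([L L]) ([t (in-list tokens)])
               (derive L t))))

;; Example: one 'a followed by any number of 'b tokens.
(define a-tok (tok (λ (t) (eq? t 'a))))
(define b-tok (tok (λ (t) (eq? t 'b))))
(define ab* (cat a-tok (star b-tok)))

(recognizes? ab* '(a b b)) ; #t
(recognizes? ab* '(b))     ; #f
(recognizes? ab* '())      ; #f
```

The sketch is a recognizer only; derp's parser additionally tracks parse trees, which is what red and the sugar forms operate on.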
When the (seq ...) subform produces a parse tree, it conses together the parses of its sub-languages.
The (rep ...) subform parses into a proper list of matches.
There are several non-core forms that expand into core forms; these are convenient for parsing:
(eps value) : a null language that produces value as a parse tree if matched.
(seq* L1 ...) : produces a proper list of sub-parses.
(seq! qq-lang ...) : like seq*, but only saves sub-languages marked with a quasiquote.
(rep+ L) : like rep, but for non-empty repetition.
(opt L default) : optionally parses the language L, or returns default.
(car L) : same as (red L (λ (t) (car t))).
(--> L f) : same as (red L f).
(@--> L f) : same as (red L (λ (t) (apply f t))).
(>--> L c ...) : same as (red L (λ (t) (match t c ...))).
($--> L e ...) : parses into the value of (begin e ...), with $$ bound to the match of L and ($ n) returning the nth member of $$.
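Several of the sugar forms differ only in the procedure they hand to red. The following stand-alone Racket snippet, which does not need derp, applies those procedures by hand to a sample parse tree. The value tree and the names car-reducer, apply-reducer and match-reducer are hypothetical; tree is shaped like the two-element list that (seq! `SYMBOL "," `csv) in the CSV example would pass on, since seq! keeps only the quasiquoted sub-parses.

```racket
#lang racket
;; Hypothetical parse tree: the two sub-parses that (seq! `SYMBOL "," `csv)
;; keeps when matching foo , bar , baz (the "," sub-parse is dropped).
(define tree (list 'foo '(bar baz)))

;; (car L) is (red L (λ (t) (car t))): keep the first sub-parse.
(define car-reducer (λ (t) (car t)))

;; (@--> L f) is (red L (λ (t) (apply f t))): spread the list over f.
(define (apply-reducer f) (λ (t) (apply f t)))

;; (>--> L c ...) is (red L (λ (t) (match t c ...))), here with one clause.
(define match-reducer
  (λ (t) (match t [(list head tail) (cons head tail)])))

(car-reducer tree)           ; 'foo
((apply-reducer cons) tree)  ; '(foo bar baz)
(match-reducer tree)         ; '(foo bar baz)
```

This is why (@--> (seq! `SYMBOL "," `csv) cons) in the CSV example rebuilds a flat list: apply spreads the two kept sub-parses over cons.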
To parse a stream with respect to a language L, use (parse L stream). Streams are available in srfi/41.
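The token streams that parse consumes can be built with srfi/41. The following minimal sketch, which runs without derp, builds the token stream used in the CSV example in two ways and peeks at it; the names tokens-a and tokens-b are illustrative only. Requiring srfi/41 in a #lang racket module shadows the racket/stream bindings that share names with it (such as stream and stream->list), which is the intent here.

```racket
#lang racket
;; srfi/41 streams; this require shadows racket/stream's same-named bindings.
(require srfi/41)

;; Two equivalent ways to build a stream of already-lexed tokens.
(define tokens-a (stream 'foo "," 'bar "," 'baz))
(define tokens-b (list->stream (list 'foo "," 'bar "," 'baz)))

;; Peek at the first token and at the rest of the stream.
(stream-car tokens-a)                ; 'foo
(stream->list (stream-cdr tokens-b)) ; '("," bar "," baz)
```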
To parse (lexed) comma-separated values:
(require srfi/41) ; provides stream
(define SYMBOL (lang (? symbol? 'symbol)))
(define csv
  (lang (or (@--> (seq! `SYMBOL "," `csv) cons)
            ($--> SYMBOL (list $$)))))
(parse csv (stream 'foo "," 'bar "," 'baz))
; => (set '(foo bar baz))
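The CSV example starts from an already-lexed stream. As a hedged sketch of where such a stream might come from, lex-csv below is a hypothetical helper (not part of derp) that splits a raw string on commas, trims each field and turns it into a symbol, and inserts a "," token between fields, producing tokens of the shape that SYMBOL and the "," literal expect. It only handles this simple input format (no quoting or escaping).

```racket
#lang racket
(require srfi/41) ; list->stream, stream->list

;; Hypothetical lexer: "foo, bar,baz" becomes the stream foo "," bar "," baz.
(define (lex-csv str)
  (define fields
    (map (λ (s) (string->symbol (string-trim s)))   ; trim spaces, then symbolize
         (string-split str "," #:trim? #f)))         ; split on every comma
  ;; Put a "," token between consecutive fields.
  (list->stream (add-between fields ",")))

(stream->list (lex-csv "foo, bar,baz")) ; '(foo "," bar "," baz)
```

With derp loaded, (parse csv (lex-csv "foo, bar,baz")) should then receive the same tokens as the hand-written stream in the example.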