To use this module in Racket, download the code, place it in the same directory as your code, and then require it:
(require "derivative-parsers.rkt")
This version of the code still uses fixed points internally. David Darais found a way to eliminate fixed points, which substantially speeds up parsing. derp will eliminate fixed points internally soon, but the performance-conscious may want to wait until then.
In derp, a language is a first-class value.
The macro lang creates a language.
Within lang there are several atomic subforms:
(eps): the null string.
(empty): the empty language.
(? pred class): the language { t : pred(t) }.
To make it easier to write terminals (token classes) in the grammar, several literal forms are available:
'c: expands to (literal->language 'c).
string: expands to (literal->language string).
number: expands to (literal->language number).
boolean: expands to (literal->language boolean).
The procedure literal->language can be set with set-literal->language!.
There are a few core compound language subforms:
(or L1 ...): the union of several languages.
(seq L1 ...): the concatenation of several languages.
(rep L): the possibly empty repetition of a language.
(red L proc): applies proc to each parse of L.
When the (seq ...) subform produces a parse tree, it conses together the results of its sub-languages.
The (rep ...) subform parses into a proper list of matches.
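The difference in shape between seq and rep results can be illustrated with plain Racket data. This is only a sketch: it makes no derp calls, the symbols a, b, and c stand in for hypothetical sub-parses, and the three-language case assumes that seq nests to the right, which this page does not state.

```racket
#lang racket
;; Plain Racket data only; nothing here calls derp.

;; (seq A B): the two sub-parses are consed into a pair.
(define seq-two (cons 'a 'b))              ; '(a . b)

;; (seq A B C): ASSUMPTION that seq nests to the right,
;; which would give an improper list rather than a proper one.
(define seq-three (cons 'a (cons 'b 'c)))  ; '(a b . c)

;; (rep A) after three matches: a proper list.
(define rep-three (list 'a 'a 'a))         ; '(a a a)

(list? seq-two)    ; #f
(list? seq-three)  ; #f
(list? rep-three)  ; #t
```

If the assumption holds, a reduction attached to a seq has to destructure pairs, while one attached to a rep can use ordinary list operations.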
There are several non-core forms that expand into core forms; these are convenient for parsing:
(eps value): a null language that produces value as a parse tree if matched.
(seq* L1 ...): produces a proper list of sub-parses.
(seq! qq-lang): like seq*, but keeps only the sub-parses of sub-languages marked with a quasiquote.
(rep+ L): like rep, but for non-empty repetition.
(opt L default): optionally parses the language L or returns default.
(car L): same as (red L (λ (t) (car t))).
(--> L f): same as (red L f).
(@--> L f): same as (red L (λ (t) (apply f t))).
(>--> L c ...): same as (red L (λ (t) (match t c ...))).
($--> L e ...): parses into the value of (begin e ...), with $$ bound to the match of L and ($ n) returning the nth member of $$.
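The reduction shorthands are easiest to read as the procedures they stand for. The sketch below applies those procedures by hand to a hypothetical parse tree in plain Racket. It does not call derp, and the tree '(1 2 3) is an invented stand-in for what a form such as seq* might produce.

```racket
#lang racket
;; A hypothetical parse tree; no derp calls are made here.
(define t '(1 2 3))

;; (car L) reduces with (λ (t) (car t)).
(define car-result ((λ (t) (car t)) t))        ; 1

;; (@--> L +) reduces with (λ (t) (apply + t)).
(define apply-result ((λ (t) (apply + t)) t))  ; 6

;; (>--> L [(list a b c) (* a b c)]) reduces with
;; (λ (t) (match t [(list a b c) (* a b c)])).
(define match-result
  ((λ (t) (match t [(list a b c) (* a b c)])) t))  ; 6
```

The @--> form suits parse trees that are proper lists of arguments, while >--> is more convenient when the tree needs to be destructured by pattern.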
To parse a stream with respect to a language L, use (parse L stream).
Streams are available in srfi/41.
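As a minimal sketch of building the input, a token stream can be constructed with srfi/41 either directly or from an existing list. The token values are illustrative, and the names tokens and same-tokens are hypothetical. Either stream could then be passed as the second argument to parse.

```racket
#lang racket/base
(require srfi/41)

;; Build a stream of already-lexed tokens directly.
(define tokens (stream 'foo "," 'bar "," 'baz))

;; Or convert an existing list of tokens into a stream.
(define same-tokens (list->stream (list 'foo "," 'bar "," 'baz)))

(stream-car tokens)         ; 'foo
(stream->list same-tokens)  ; '(foo "," bar "," baz)
```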
To parse (lexed) comma-separated values:
(define SYMBOL (lang (? symbol? 'symbol)))
(define csv (lang (or (@--> (seq! `SYMBOL "," `csv) cons)
                      ($--> SYMBOL (list $$)))))
(parse csv (stream 'foo "," 'bar "," 'baz))
; => (set '(foo bar baz))