Related pages:
The library works by building the syntax for regular expressions out of Java syntax.
A regular expression pattern r can be:
- A literal string:
"string" - A literal character:
'char' - A sequence of patterns:
s(r1,...,r2) - A choice of patterns:
or(r1,...,r2) - A zero-or-more repetition of a pattern:
rep(r)
So a regular expression like
(foo|bar|baz)*xoo
would be
s(rep(or("foo","bar","baz")),"xoo")
in the embedded syntax.
The library works by converting each regular expression into an NFA on the fly.
For whatever reason, I didn’t implement it in a functional fashion, so you can’t re-use a sub-expression in more than one regular expression.
That is, the following breaks:
NFA foo = s("foo") ;
NFA pattern = s(foo,foo) ;
but
NFA pattern = s(s("foo"),s("foo")) ; // or s("foo","foo")
is just fine.
I don’t code much in Java these days, but if I still did, I’d make it purely functional, so that sub-expressions could be re-used.
The matching algorithm works by doing a straightforward graph search over a string.
The complexity is exponential in theory, but very fast in practice.
Code
NFA library: NFA.java
Twitter: @mattmight
Instagram: @mattmight
LinkedIn: matthewmight
Mastodon: @mattmight@mathstodon.xyz
Sub-reddit: /r/mattmight