Implementing regular expression pattern-matching and NFAs in Java

Back when I was a grad student, I gave a guest lecture on regular expressions.

I gave out a toy Java library that showed how to implement regular expression pattern-matching in Java using NFA-conversion and back-tracking search.

Related pages:

The library works by building the syntax for regular expressions out of Java syntax.

A regular expression pattern r can be:

So a regular expression like

(foo|bar|baz)*xoo

would be

s(rep(or("foo","bar","baz")),"xoo")

in the embedded syntax.

The library works by converting each regular expression into an NFA on the fly.

For whatever reason, I didn’t implement it in a functional fashion, so you can’t re-use a sub-expression in more than one regular expression.

That is, the following breaks:

 NFA foo = s("foo") ;
 NFA pattern = s(foo,foo) ;

but

 NFA pattern = s(s("foo"),s("foo")) ; // or s("foo","foo")

is just fine.

I don’t code much in Java these days, but if I still did, I’d make it purely functional, so that sub-expressions could be re-used.

The matching algorithm works by doing a straightforward graph search over a string.

The complexity is exponential in theory, but very fast in practice.

Code

NFA library: NFA.java