More resources
- My recommended reading for compilers and programming languages.
- My ongoing tutorial series for JavaScript.
- JavaScript: The Good Parts. Exactly that.
- JavaScript: The Definitive Guide. The best reference on JavaScript.
McLexer: Lexers for JavaScript
McLexer is a lexical analysis framework for JavaScript. It's not a lexer-generator like lex; rather it's a domain-specific language embedded in JavaScript that produces lexical analyzers dynamically. For speed, simplicity and flexibility, it exploits JavaScript's built-in regular-expression utilities. For performance and scalability, the framework enables stackless, continuation-passing lexers, since JavaScript doesn't support tail-call optimization.
Quick comparison to lex
With a tool like lex, you might have rules that look like:
<INIT> [_A-Za-z]+ { return(ID) ; } <INIT> [0-9]+ { return(NUM) ; }
In McLexer, these might look like:
INIT (/[_A-Za-z]+/) (function () { return ID ; }) ; INIT (/[0-9]+/) (function () { return NUM ; }) ;
Specifying lexers
A specification for a lexer starts by declaring the lexical
analysis states, each an instance of McLexer.State
.
For example:
var INIT = new McLexer.State() ; var IN_COMMENT = new McLexer.State() ;
Then, each lexical analysis state is associated with a list of rules, where each rule is a regular expression paired with an action.
Each rule declaration has the form:
state (regex) (action) ;
Informally, such a rule means, "While in the state state, if a prefix of the input matches the pattern regex, then call action."
The parameter action is a function of the form:
function (match,rest,state) { // match is an array produced from matching the prefix // rest is the remainder of the input // state is the current lexical analysis state }
To run a lexer against an input, each state has a run
method.
The run
method makes a single match against the input its given,
and then performs the action.
In most cases, lexers should not stop after a single match. To achieve this behavior, functional lexers tail-call the lexer after each match. JavaScript has no tail-call optimzation, so a lexer which uses tail calls would run out of stack space on large inputs. To get around this restriction, McLexer actions can return a continuation to continue lexing.
To support this continuation-based protocol, one can invoke the
lex
method on a state instead of the run
method.
The lex
method will repeatedly execute the
continuations returned by actions until null
comes back.
Every state provides a continuation
method that,
given the remainder of the input, produces a continuation that will
continue lexing once invoked.
As a result, a common pattern for actions is:
function (match,rest,state) { // Do something with match[0]. return state.continuation(rest) ; }
If the only behavior desired is to switch states, then the McCONTINUE
function
abbreviates this action:
McCONTINUE(newState) == function (match,rest,state) { return newState.continuation(rest) ; }
McLexer tokenizer example
McLexer source code
McHighlight: Syntax-highlighting
McHighlight is a framework for building syntax-highlighters on top
of McLexer.
A new McHighligter
object takes an initial lexical analysis
state, and it expects to have its methods called by the actions in the
state.
A new McHighligter
object provides convenience methods that produce actions.
These actions abbreviate the code required to, for example, wrap the
matched input in a span with a given class, or "harden" whitespace by
replacing each space with  
and each newline with
<br />
.
The highlight(element[,code])
method inserts highlighted
code into the body of the supplied element; if no code is supplied,
then the value of element.innerHTML
is used as the code to highlight.
To use a highlighter on your site:
- Make a copy of
mchighlight.js
,mchighlight.js
andmchighlight-language.js
on your site. - Include these files on your web page.
- Include CSS style tag or file that defines how to style the classes for the language.
- Call
Highlighter.highlight(element)
on apre
element that has code inside of it. - Report bugs to me!