-------------- Reviews on submission 104 -------------

-------------------- review 1 --------------------
PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 4 (OK paper, but I will not champion it.)
REVIEWER'S CONFIDENCE: 2 (medium)

----------------------- REVIEW --------------------
MAIN PART

The paper describes an original approach to the derivation of parsers from grammars. The approach is based on the observation that context-free grammars are closed under Brzozowski derivatives, and that such derivatives can be computed easily. The parser described here therefore parses a string by lazily computing the derivatives of the grammar for each character it meets. Quite surprisingly, such a parser seems very easy to encode, especially in a language that supports streams and lazy computation; it can parse every grammar, and it is quite efficient.

I found the paper very interesting. The problem studied is very important, and it is surprising to meet original results in a field that has been explored so much. I enjoyed the first pages of the paper very much; they are extremely crisp and clear. Unfortunately, around page 8 the paper starts getting more and more obscure; Section 7 ends abruptly, and Section 8, which is probably the most important one, is quite obscure. In the end I still think that the paper should be accepted, but I really hope that the authors will be able to beef Section 8 up to the point of making it readable for me.

Detailed remarks for the authors only

Example 1: "the parse tree is:" => "the parse tree of '()' is:"

In general: you should write \epsilon rather than |\epsilon> (this is not quantum theory, after all...).

The non-deterministic state machine: the language in \Sigma is a language over A', not over A, right? You should say so explicitly.

In the rules for bras and kets, the consequences should look like (L, w) =>^{\bra{n}} (D_{\bra{n}}(L), w). I think you forgot the derivative, right?

The parse-string machine pushes a bra when it meets one, but never pops it. You can easily adjust the last rule of Section 7 to pop it, but the real question is: why do you push it at all?

The end of Section 7 is too abrupt. You have described a nice pair of abstract machines, but now you should tell me something a bit more concrete about the amount of nondeterminism involved, and how much it would hurt me in practice.

Page 10: the monotonicity property for parser combinators should look like: for all p in \tau_X(w) there exists p' in \tau_X(w') such that (first p) is a prefix of (first p').

Section 8 is obscure. It is too terse. The equations look right and look convincing, but I was totally unable to form a mental picture of how this machinery actually works. I understand that such a combination of non-determinism and laziness creates code that tends to work in a mysterious way, but the section is still extremely unsatisfactory.

Section 9: a bit unreadable, at least for a non-Scala programmer.

PROS AND CONS

+ interesting solution for an interesting problem
+ elegant
+ innovative
+ first 8 pages are a pleasure to read
- crucial sections are almost unreadable

REBUTTAL

-------------------- review 2 --------------------
PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 4 (expert)

----------------------- REVIEW --------------------
MAIN PART

The authors describe how one can use Brzozowski-style derivatives to implement parser combinators. They claim that their technique leads to simple implementations, and this seems to be the case.
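For concreteness, recognition by repeated derivatives can be sketched in a few lines of Scala. The following is a hypothetical, regular-expression-only sketch (the names DerivativeSketch, Lang, nullable, derive, and matches are invented here), not the paper's context-free version:

  object DerivativeSketch {
    // Minimal sketch of Brzozowski-derivative recognition for regular languages.
    sealed trait Lang
    case object Empty                extends Lang   // the empty language
    case object Eps                  extends Lang   // the language { "" }
    case class Chr(c: Char)          extends Lang   // a single character
    case class Alt(l: Lang, r: Lang) extends Lang   // union
    case class Cat(l: Lang, r: Lang) extends Lang   // concatenation
    case class Star(l: Lang)         extends Lang   // Kleene star

    // Does the language contain the empty string?
    def nullable(l: Lang): Boolean = l match {
      case Empty     => false
      case Eps       => true
      case Chr(_)    => false
      case Alt(a, b) => nullable(a) || nullable(b)
      case Cat(a, b) => nullable(a) && nullable(b)
      case Star(_)   => true
    }

    // The derivative of l with respect to the character c.
    def derive(c: Char, l: Lang): Lang = l match {
      case Empty     => Empty
      case Eps       => Empty
      case Chr(d)    => if (c == d) Eps else Empty
      case Alt(a, b) => Alt(derive(c, a), derive(c, b))
      case Cat(a, b) => if (nullable(a)) Alt(Cat(derive(c, a), b), derive(c, b))
                        else Cat(derive(c, a), b)
      case Star(a)   => Cat(derive(c, a), Star(a))
    }

    // Membership: take one derivative per input character, then test nullability.
    def matches(l: Lang, s: String): Boolean =
      nullable(s.foldLeft(l)((acc, ch) => derive(ch, acc)))
  }

The paper's contribution is to extend this idea from regular expressions to arbitrary context-free grammars, and from recognition to the construction of parse trees and semantic values.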
They also claim that their implementation is efficient, but they give no solid evidence for this (they added a special combinator to get decent numbers from their single reported experiment). Another claim is that their parser combinators can handle left recursion, but they give no evidence for this either.

The authors seem to be unaware that Danielsson presented essentially the same technique at TYPES 2009 ("Mixing Induction and Coinduction"). However, Danielsson's draft paper does not try to sell the technique, only using it as a vehicle to implement provably total parser combinators, and he does not address efficiency of parsing, so there could be a place for both papers.

The authors seem to claim that their implementation is a lot more efficient than other parser combinator libraries: "[...] they are easy to implement, and easy to implement as libraries (like top-down parsers), yet they are efficient and expressive (like bottom-up parsers)." However, they have not made any head-to-head comparisons, and they have not cited a single paper discussing efficient parser combinators. Generally, there is a serious lack of references to related work on parser combinators.

Section 7 discusses another parsing technique (also based on derivatives). However, I found this section to be poorly written and hard to follow. I suggest that it be removed.

If the claims about efficiency had been backed up with substantial evidence, then I would have recommended that the paper be published. As it stands, I do not make such a recommendation.

Detailed comments:

Section 2: "This technique was lost to the “sands of time” until [paper from 2009]" A quick web search will tell you that a large number of papers published before 2009 have used Brzozowski derivatives (or generalisations of them), so this comment is incorrect. See for instance the work of Rutten, including "Automata and Coinduction; an exercise in coalgebra" and "Automata, Power Series, and Coinduction: Taking Input Derivatives Seriously".

Section 6: The expansion of D_x(List) would be easier to follow if you added a step between the second and third lines. "The null string terminal" and "the empty nonterminal" may not exist in N, so they may have to be added to N'. One could be led to believe that the last "and:" only applies when m = 0 (a precise statement of the rule, as I read it, is sketched below).

"The empty nonterminal arises frequently with derivatives, and if this reduction is applied aggressively, the grammar will remain a manageable size." You give no proper motivation for this statement. In fact, your discussion of the "special closure construct" in Section 9 seems to imply that this statement is false. I suggest that you remove it.

Section 7: The machine in this section does not make sense to me. First of all, you do not state what the machine should accomplish; my guess is that you intend ((L, s), <>) ->* ((L', <>), <t>) to hold iff t is a parse tree which witnesses the membership of s in L. Furthermore, you give no reason (such as a proof) to believe that the machine satisfies this specification. For instance, you push nonterminals onto the stack, but you never remove them (I suspect that the last rule should remove a nonterminal), and it seems strange that the state is unchanged in the rules involving "first". I think the paper would be better if Section 7, which is not central to the paper, were omitted.
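As an aside on the Section 6 comments above: the rule I take to be intended is that, for each production n -> s_1 s_2 ... s_m, the derived grammar contains

  D_c(n) -> D_c(s_{i+1}) s_{i+2} ... s_m    for every i with 0 <= i < m such that s_1 ... s_i is nullable,

where the case i = 0 carries no nullability condition, and where the case m = 0 generates no production at all, so that D_c(n) is then the empty nonterminal. The bound on i is my own assumption; it is exactly the kind of detail the paper should state.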
"This approach is not well suited to typed languages, since a literal implementation of the parsing machine must use a generic parse-tree type, or else violate type-safety with casts for reductions." I suspect that you could do better with a dependently typed language, so I suggest that you remove this sentence. Example 1: "the parse tree is:" I think you mean "one parse tree is:". "The parsing machine demonstrates that, even without reduction rules associated with the grammar, we can use derivatives to construct a concrete syntax tree." I do not follow this sentence. What kind of reduction rules are you referring to? Section 8: The parser combinators that you give seem quite outdated to me. The functional programming community moved away from this set of combinators more than a decade ago, in favour of combinators based on monads or applicative functors. Please motivate this choice. You state two properties that parser combinators must satisfy, but you never really make any use of these properties. Are these properties necessary for your development? Is your definition of a parser combinator a function which satisfies these properties? "A parser combinator is monotonic iff..." I assume that the second instance of \sqsubseteq should be read pointwise. "the meaning of an input expression w <- A^∗ is contained within tau_N(w)" In fact it is the one and only (complete) result, right? "D_c(c) = eps -> \w.{(c, w)}" This does not make sense, the second argument of -> is not a parser. Section 9: Your parsers accept infinite streams as input. However, parseFull clearly fails to terminate for infinite input. Why do you not restrict the input to finite lists? Your parsers return potentially infinite streams of results. However, your parse function uses "append", which (essentially) throws away its second argument if the first is infinite. Wouldn't it be better to return a fair interleaving of the two streams? In internalDerive you seem to have renamed a and b to first and second, respectively. "the longest possible parse first" Why is this relevant for performance? If you return all possible parses, then the order in which they are returned should not matter, and for unambiguous parsers at most one result is returned anyway. I can imagine that your optimisation matters in the (atypical?) case of an ambiguous parse, if you are not interested in inspecting all the results, but only want to see that there is more than one. The sentence before the table seems to be unfinished. I assume that "43 m" should be "43 ms". "These results suggest that derivative-based parser combinators scale well enough to be useful in practice." To me this result suggests that I may have to repeatedly add new special-case combinators (like your closure operation) to ensure decent performance. Section 10: "derivative-based parsing defies the classical taxonomies", "Top-down methods", "back-tracking"... Are you not aware of parser combinator libraries which use breadth-first techniques (which your combinators also do)? See, for instance, Claessen's "Parallel Parsing Processes". "these methods have not been able to handle left-recursive grammars" You should check out Lickman's MSc thesis "Parsing With Fixed Points" and "Parser Combinators for Ambiguous Left-Recursive Grammars" by Frost et al. "Philosophically, neither derivative-based method “feels” like a top-down parsing method." To me your combinator implementation does feel like a (breadth-first) top-down method. 
You claim (implicitly) that your parser combinators can handle left-recursive grammars. However, I doubt that you can handle grammars of the form "n ::= n". If you can handle such grammars, please explain how you do it. Otherwise it would be interesting to know exactly what kind of left recursion you support.

"they are efficient and expressive (like bottom-up parsers)" This seems to imply that there is a language which can be parsed using bottom-up techniques, but not using top-down techniques. Can you verify this claim?

You may want to cite Frost and Szydlowski, who claim worst-case cubic time for parsing ("Memoizing purely functional top-down backtracking language processors"). Swierstra's work on efficient parser combinators is also relevant (many papers).

References: improve the formatting.

PROS AND CONS

+ Simple technique for implementing parser combinators.
- The technique has been presented before.
- No substantial evidence for claims of efficiency.
- Section 7 does not make sense to me (but could be removed).

REBUTTAL

-------------------- review 3 --------------------
PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 2 (Weak paper, though I will not fight strongly against it.)
REVIEWER'S CONFIDENCE: 2 (medium)

----------------------- REVIEW --------------------
MAIN PART

The paper proposes a new parsing technique based on "derivatives", inspired by the derivatives of regular expressions and extended to the derivatives of context-free grammars. Because context-free grammars may be ambiguous, the authors propose to compute the set of all parse trees, using lazy evaluation to share and delay the computation. They present their framework and an implementation in Scala. They claim that their technique is efficient in practice.

EVALUATION

Parsing techniques are old and well-established, although the current status of the theory is still unsatisfactory: grammars are still a burden to write, and any advance on this topic is welcome. However, the topic must be addressed very seriously. Given this, the work is potentially interesting as a possible way to improve the state of the art, but not in its current presentation: the paper should not just _claim_ that using derivatives for parsing is feasible, but _demonstrate_ that writing grammars with combinators is at least as convenient and as efficient as using existing technology. The first aspect is not discussed at all in the paper. As for efficiency, which is claimed, the benchmarks are really small and probably not representative of existing grammars, and the results are not compared with other parsing techniques.

Other questions arise: what happens when the grammar is highly ambiguous? Do you intend to reject such grammars, or to pick the meaning of the things you are parsing at random?

The overall structure of the paper is not very clear either. You should give us a road map at the beginning and make it clear how the different sections fit together.

OTHER COMMENTS

Sections 7 and 8 are the key sections for understanding your proposal. However, I have problems with both. I do not understand why your parsing is decomposed into the production of an intermediate parse string that is then consumed by a machine that produces a syntax tree. Since the parse string is a linear representation of the parse tree, why can't you work with the parse tree itself, which is more structured than the parse string? Why can't you directly extract a syntax tree from the parse tree?
Could you run the algorithm in Section 7 on an example?

In Section 8 you compute the derivatives of parser combinators, but you could instead compute the grammar defined by the parser combinators and then compute its derivative, in one step. Why is this worse than computing it directly?

- One line above Section 2: "could [to<-] modify"
- Section 2, Line 10: "derivative---[they<-??] critical property"
- One line above Section 7: "remain [<-of] a manageable size"
- Computation of the derivative, above Theorem 1: you could explain the derivatives, especially for rules n -> s_1 ... s_m, and perhaps give an example.
- Last two lines above Section 8: "The parsing machine demonstrates... a concrete syntax tree". I don't really understand what you (intend to) mean.
- Section 8.1, Line 1: "ability [the<-to] combine them"
- End of Section 8.1: you should point out and comment on the fact that these definitions are recursive.
- First line of Section 8.4: "to be able [<-to] parse"
- Section 8.4: How do you use these inequations? Why do you give this list of formulas without any comment?
- Section 8.5: what are the lambda-terms on the right of --> ?

-------------------- review 4 --------------------
PAPER: 104
TITLE: Parsing with derivatives

OVERALL RATING: 1 (Serious problems. I will argue to reject this paper.)
REVIEWER'S CONFIDENCE: 3 (high)

----------------------- REVIEW --------------------
MAIN PART

SUMMARY

The paper proposes a basic algorithm for determining whether a sentence s is a member of a context-free language L: (i) if the sentence is empty, determine whether L contains the empty string (a decidable problem); (ii) if the sentence is of the form c.s, determine whether the suffix s is a member of the derivative c^{-1}.L. Because the derivative of a context-free language can be effectively constructed, this is an algorithm. The contribution of the paper is in generalizing this basic algorithm in two ways: (i) instead of just determining membership, the parsing algorithm can be made to return a concrete syntax tree, or a semantic value; (ii) instead of viewing the derivative as an operation that accepts a monolithic grammar and produces a monolithic grammar, one can view it as an operation that accepts a parser combinator and produces a parser combinator; this means that the parsing algorithm is amenable to a modular, combinator-based implementation.

EVALUATION

The introduction of the paper is somewhat irritating, for two reasons. First, it promises a lot ("simplicity, feasibility, and ubiquity"), which the paper does not clearly deliver. Second, it criticizes the most popular parsing methods (LL and LR) on the grounds that the constraints they impose, in order to rule out ambiguous grammars, are difficult to grasp. I can accept this criticism. But the authors do not offer any solution to the ambiguity problem: their algorithm produces a parse forest. This leaves the programmer faced with the task of writing and debugging disambiguation filters, an error-prone task. At least an LR parser generator is able (in principle, and some implementations actually do this) to accurately explain why the grammar seems ambiguous: it can produce a sentence that is a prefix of the fringes of two distinct, valid parse trees.

The first claimed contribution of the paper is the remark that the derivative of a context-free language can be effectively constructed. (A proof appears in Section 6.) This is in fact not a contribution.
This property of context-free languages is well-known, at least in the formal-language community (perhaps not in the programming-language community). The derivative of a formal language is also known as its left quotient, and written c^{-1}L. See, for instance, the following paper, which exploits this property intensively:

Jean Berstel and Luc Boasson. Towards an algebraic theory of context-free languages. Fundamenta Informaticae, 25(3):217-239, 1996.

The remainder of the paper contains two distinct developments. In Section 7, a version of the derivative-based parsing algorithm is developed which produces concrete syntax trees. Then, in Sections 8 and 9, another version is developed which produces programmer-defined semantic values and is combinator-based. It is not clear to me why these two distinct developments co-exist. In my eyes, the approach of Sections 8 and 9 seems superior, because it is more flexible (it is not limited to producing concrete syntax trees) and modular. The idea that the notion of a derivative can be generalized to parser combinators is interesting, and worthy of thought. To some extent, Section 8.2 is where the paper really begins!

Section 8 provides a mathematical view of the problem, while Section 9 describes part of a Scala implementation. This material is interesting, but it can be criticized on several grounds:

1. The complexity analysis of the algorithm is absent. The algorithm seems very expensive in the worst case: indeed, the effective construction of a derivative in Section 6 causes a quadratic blow-up of the size of the grammar, and this operation is iterated at every character of the input stream! In fact, Section 9.1 mentions that a naive implementation of the algorithm is unusable (exponentially slow), and that special support for the Kleene star had to be added. However, there is no argument that this patch is sufficient for the algorithm to become usable. A worst-case analysis of the space and time complexity is needed, and, if this worst-case complexity is bad (say, exponential), then a formal or semi-formal argument is needed as to why the behavior of the algorithm remains tolerable in practice. In light of the efficiency claim made in the introduction, this is indispensable!

2. The description of the implementation could be improved. In particular, memoization and sharing seem to play a crucial role. It seems that two attempts to compute the derivative of a parser combinator should return the same object. This includes the particular case where the second attempt is in fact nested within the first one (i.e., it is a recursive call). For instance, if the grammar contains the single production L -> LM, a call to L.derive(t) will (indirectly) cause another call to L.derive(t). What happens then? By which mechanism is non-termination avoided? This is not clearly described. The paper contains four lines of text that seem to concern this topic (p.15, "It is important to note [...] immediately suspended."), but I find them obscure. I note that there seems to be a distinction between derive and internalDerive, which is not explained in the paper; my guess is that the former is a memoizing version of the latter, using a table indexed by tokens. Is this the case? (See the sketch below.)

3. The introduction claims "ubiquity"; that is, if I understand correctly, it claims that the method is very easy to implement as a library. This may be true, but the paper lacks a convincing argument why this is so. In particular, how is this approach superior to the other parser combinator libraries in existence? The paper lacks a good comparison with other combinator-based parsing techniques.
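To make point 2 concrete, here is a purely speculative sketch (not the authors' code; the names derive and internalDerive merely echo those mentioned above) of one mechanism that would avoid non-termination: the memo-table entry is installed as a suspension before the derivative is computed, so a nested recursive call finds the suspension instead of looping.

  import scala.collection.mutable

  abstract class P {
    private val cache = mutable.Map.empty[Char, P]

    def internalDerive(t: Char): P          // the real derivative computation

    // One memoized derivative per token; the cache entry is a suspension,
    // so a nested this.derive(t) (e.g. arising from L -> L M) finds it
    // immediately instead of recursing forever.
    final def derive(t: Char): P =
      cache.getOrElseUpdate(t, new Delayed(this, t))
  }

  // A suspended derivative: the underlying computation runs only when forced,
  // by which time the cache entry above is already in place.
  final class Delayed(p: P, t: Char) extends P {
    lazy val forced: P = p.internalDerive(t)
    def internalDerive(c: Char): P = forced.derive(c)
  }

Whether something like this is what the paper's remark on p.15 ("immediately suspended") alludes to is exactly what the text should spell out.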
It seems that there might be a connection between the technique presented in the paper and intersection parsing (see e.g. chapter 13 of the excellent book "Parsing Techniques", by Grune and Jacobs). Computing the derivative of a context-free language with respect to a terminal symbol c is closely related to computing its intersection with the regular language c(.*): indeed, L \cap c(.*) is exactly c concatenated with D_c(L). Intersection parsing, as described by Grune and Jacobs, computes the intersection of the grammar with the input (which is a string, so a regular language!) and simplifies the resulting grammar, which can then be viewed as a parse tree. Could one view the approach presented in the paper as an incremental (and modular) way of doing intersection parsing?

In conclusion, although the paper contains an interesting idea, its investigation is not sufficiently thorough, and a clear case in favor of this parsing technique is missing.

DETAILED COMMENTS

The authors state in Section 7 that "a literal implementation [...] must use a generic parse-tree type, or else violate type safety with casts for reductions". Strictly speaking, this depends on which type system one has in mind. If the type system is ML's, for instance, the claim is probably true. If the type system has GADTs, however, it is quite possible that the discipline of the parsing machine can be encoded in the types. This has been done, for an LR parsing machine and for an arbitrary but fixed grammar, in the following paper:

François Pottier and Yann Régis-Gianas. Towards efficient, typed LR parsers. In ACM Workshop on ML, volume 148(2) of Electronic Notes in Theoretical Computer Science, pages 155-180, 2006.

To the best of my knowledge, both Haskell and Scala have GADTs, so the authors might wish to check this out.

p.5, "for any class of decidable languages closed under the derivative, derivatives may be used to determine membership": I am not sure what is meant by "decidable" here. If it means that membership in the language is decidable, then this is a tautology. Perhaps one should understand that membership *of the empty word* in the language is decidable, since effectively that is all that is needed?

p.5, "the derivative is closed under regular operations": unclear formulation.

p.7, before Theorem 1: please indicate the interval over which the index i ranges. My guess is 0 \leq i < m. The particular case where m = 0 seems redundant: when m = 0, there is no suitable index i, so no production is generated, which means D_c(n) is empty. In fact, the notation \emptyset in the right-hand side of a production does not make sense, since the right-hand side of a production is supposed to be a sequence of symbols.

p.7, "the empty nonterminal": this has not been defined. I assume this explains the use of \emptyset above. Define it.

p.8, Example 1: "for the grammar [...] of balanced parentheses [...] the parse tree is": I assume this should be "the parse tree for the string '()' is".

p.8, "A parse string is [...] containing both nonterminal markers and reduction markers": please indicate at this point that a parse string also contains elements of the alphabet A. In fact, the definition of the alphabet A' should be moved to this point.

p.8, "we construct a new grammar [...] over parse strings": please change this to "a new grammar over the alphabet A'".
p.8, the definition of \Sigma mentions \mathbb{L}: I guess this should be \mathbb{L}_{A'}.

p.9, "a second rule for producing bras": is the rule correct? In the right-hand side of the conclusion, instead of L, I would have expected the derivative of L with respect to the bra.

"43 m" should be "43 ms".

p.16, "surpising".

p.17, "they extent".

PROS AND CONS

+ interesting and possibly novel idea
- no time/space complexity analysis
- implementation not clearly described
- case in favor of the idea not clearly made
- comparison with related work should be more thorough

REBUTTAL

No specific questions.