Niko Matsakis: Nice errors in LALRPOP |
For the last couple of weeks, my mornings have been occupied with a pretty serious revamping of LALRPOP’s error message output. I will probably wind up doing a series of blog posts about the internal details of how it works, but I wanted to write a little post to advertise this work.
Typically when you use an LR(1) parser generator, error messages tend
to be written in terms of the LR(1) state generation algorithm. They
use phrases like shift/reduce conflict
and talk about LR(1)
items. Ultimately, you have to do some clever thinking to relate the
error to your grammar, and then a bit more clever thinking to figure
out how you should adjust your grammar to make the problem go away.
While working on adapting the Rust grammar to LALRPOP, I
found I was wasting a lot of time trying to decrypt the error
messages, and I wanted to do something about it. This work
is the result.
An aside: It’s definitely worth citing Menhir as an inspiration, which is an awesome parser generator for OCaml. Menhir offers a lot of the same features that LALRPOP does, and in particular generates errors very similar to those I am talking about here.
What I’ve tried to do now in LALRPOP is to do that clever thinking for you, and instead present the error message in terms of your grammar. Perhaps even more importantly, I’ve also tried to identify common beginner problems and suggest solutions. Naturally this is a work-in-progress, but I’m already pretty excited with the current status, so I wanted to write up some examples of it in action.
Let’s start with an example of a truly ambiguous grammar. Imagine that I have this grammar for a simple calculator (in LALRPOP syntax, which I hope will be mostly self explanatory):
1 2 3 4 5 6 7 8 9 |
|
This grammar evaluates expressions like 1 + 2 * 3
and yields a
32-bit integer as the result. The problem is that this grammar is
quite ambiguous: it does not encode the precedence of the various
operators in any particular way. The older versions of LALRPOP gave
you a rather opaque error concerning shift/reduce conflicts. As of version
0.10, though, you get this (the actual output even uses ANSI colors,
if available):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Much clearer, I’d say! And note, if you look at the last sentence, that LALRPOP is even able to diagnose that this an ambiguity specifically about precedence and refer you to the manual – now, if only I’d written the LALRPOP manual, we’d be all set.
I should mention that LALRPOP also reports several other errors, all of which are related to the precedence. For example, it will also report:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
The code for detecting precedence errors however doesn’t consider
errors between two distinct tokens (here, *
and +
), so you don’t
get a specific message, just a general note about ambiguity. This
seems like an area that would be nice to improve.
That last example was a case where the grammar was fundamentally ambiguous. But sometimes there are problems that have to do with how LR(1) parsing works; diagnosing these nicely is even more important, because they are less intuitive to the end user. Also, LALRPOP has several tools that can help make dealing with these problems easier, so where possible we’d really like to suggest these tools to users.
Let’s start with a grammar for parsing Java import declarations. Java’s import declarations have this form:
1 2 |
|
A first attempt at writing a grammar for them might look like this (in
this grammar, I gave all of the nonterminals the type ()
, so there
is no need for action code; this means that this grammar does not
build a parse tree, and so it can only be used to decide if the input
is legal Java or not):
1 2 3 4 5 6 7 8 9 10 |
|
Now, unlike before, this grammar is unambiguous. Nonetheless, if we try to run it through LALRPOP, we will get the following error:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
|
http://smallcultfollowing.com/babysteps/blog/2016/03/02/nice-errors-in-lalrpop/
Комментировать | « Пред. запись — К дневнику — След. запись » | Страницы: [1] [Новые] |