
What advantages do LL parsers have over LR parsers?

Source: https://www.devze.com 2023-01-23 04:19 (from the web)
What advantages do LL parsers have over LR parsers to warrant their relative popularity in today's parser generator tools?

According to Wikipedia, LR parsing appears to have advantages over LL:

LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting, i.e. it detects syntactic errors when the input does not conform to the grammar as soon as possible. This is in contrast to an LL(k) (or even worse, an LL(*) parser) which may defer error detection to a different branch of the grammar due to backtracking, often making errors harder to localize across disjunctions with long common prefixes.

Note: This is not homework. I was just surprised when I found out that Antlr is an LL parser generator (despite having "LR" in its name!).


GLR is great if you want a parse tree/forest and don't mind black boxes. It lets you type in whatever CFG you want, at the cost of checking for ambiguities at parse time via exhaustive testing instead of resolving LR/LALR conflicts statically. Some say that's a good trade-off. Ira Baxter's DMS tool and Elkhound, which has a free C++ grammar, are useful for this class of problem.

ANTLR is useful for a large class of language applications too, but it uses a top-down approach, generating recursive-descent parsers called LL(*) that allow semantic predicates. I will state without proof here that predicates let you parse context-sensitive languages beyond CFGs. Programmers like to insert actions into grammars, like good error handling, and they like to single-step debug; LL is good at all three. LL is also what we do by hand, so it's easier to understand. Don't believe the Wikipedia nonsense about LR being better at error handling. That said, if you backtrack a lot with ANTLR, errors are indeed worse with LL(*) (PEGs have this problem too).

Re backtracking. GLR speculates (i.e. backtracks) too, just like PEGs, ANTLR, and any other non-deterministic strategy. At any non-deterministic LR state, GLR "forks" sub-parsers to try out any viable path. Anyway, LL has good context for error handling. Where LR knows it's matching an expression, LL knows it's an expression in an assignment or IF-conditional; LR knows it could be in either but isn't sure - and that uncertainty is where it gets its power.
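The "LL knows it's an expression in an assignment or IF-conditional" point can be made concrete with a small sketch. This is a hypothetical hand-written recursive-descent parser, not ANTLR output; the grammar, token names, and `Parser` helpers are all invented for illustration. Because each rule is an ordinary function, the call stack *is* the parsing context, so a rule can pass "why it wants this token" down into the error message:

```python
# Hypothetical sketch: an LL / recursive-descent parser carries its
# calling context (the chain of rule invocations) into error messages.

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok, context):
        if self.peek() != tok:
            # The parser knows *why* it wants this token: its caller told it.
            raise ParseError(f"expected {tok!r} in {context}, got {self.peek()!r}")
        self.pos += 1

    def parse_if(self):
        self.expect("if", "if-statement")
        self.expect("(", "if-condition")
        self.parse_expr("if-condition")   # context flows down the call stack
        self.expect(")", "if-condition")

    def parse_expr(self, context):
        # Toy expression rule: a single identifier token.
        if self.peek() not in ("x", "y"):
            raise ParseError(f"expected expression in {context}")
        self.pos += 1

try:
    Parser(["if", "(", ")"]).parse_if()
except ParseError as e:
    print(e)   # expected expression in if-condition
```

An LR parser at the equivalent state only knows it needs an expression; it has not yet committed to which enclosing construct that expression belongs to, so a message this specific takes extra machinery.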

GLR is O(n^3) worst case. Packrat/PEG is O(n) worst case. ANTLR's parsers are O(n^2) due to the cyclic lookahead DFA, but O(n) in practice. It doesn't really matter; GLR is fast enough.

ANTLR is ANother Tool for Lang Recognition not anti-LR, but I like that one too ;)

Frankly, like a lot of young coders in the 80s, I didn't understand LALR and didn't like black boxes (now I dig the beauty of the GLR engine, but I still prefer LL). I built a commercial LL(k)-based compiler and then decided to build a tool to generate what I had built by hand. ANTLR isn't for everyone, and edge cases like C++ might be better handled with GLR, but a lot of people find that ANTLR fits into their comfort zone. Since Jan 2008, there have been 134,000 downloads of ANTLR's binary jar, ANTLRWorks, and source zips in total (according to Google Analytics). See our paper on LL(*) with lots of empirical data.


If you have to hand-code one, recursive descent (LL) is something you can do realistically; people cannot practically build LR or LALR parsers by hand.

Given that modern parser generators will handle all the parser construction for you, and that space is not much of an issue, I prefer LR parsers because you don't have to fight with the grammars as much to make them valid for your particular parser generator (no "remove all the left recursion" silliness).

In fact, I prefer GLR parsers, which will pretty much parse anything with a context free grammar. No left-recursion worries. No shift/reduce conflict worries. No lookahead limits.

If you want to see the range of languages that one GLR parsing engine can handle (including the famously hard-to-parse-using-LL/LALR language, C++), you can look here.


From my personal experience (I have used both in various situations), the most practical difference is that, with an LL(k), you can define the grammar in an easier way (since it's top-down) without caring about the many possible reduce-reduce or shift-reduce conflicts which often occur with LR parsers. The only thing you have to care about is left recursion, which must be transformed into right recursion.
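The left-recursion transformation mentioned above is mechanical for the common case. A rule like `expr -> expr '+' NUM | NUM` cannot be parsed top-down (the parser would recurse forever), but the equivalent `expr -> NUM ('+' NUM)*` can, and the repetition becomes a simple loop. A minimal sketch, with tokenization simplified to a pre-split list (an assumption for brevity):

```python
# Left-recursive rule (fine for LR, fatal for LL):
#     expr -> expr '+' NUM | NUM
# Equivalent iterative form an LL parser can handle:
#     expr -> NUM ('+' NUM)*

def parse_expr(tokens):
    pos = 0
    value = tokens[pos]          # first NUM
    pos += 1
    # The ('+' NUM)* loop replaces the left-recursive call; folding the
    # result left-to-right preserves the left associativity of the
    # original left-recursive rule.
    while pos < len(tokens) and tokens[pos] == '+':
        pos += 1
        value = value + tokens[pos]
        pos += 1
    return value

print(parse_expr([3, '+', 4, '+', 5]))   # 12
```

The textbook formulation (`expr -> NUM expr'; expr' -> '+' NUM expr' | ε`) is the same transformation written with a helper nonterminal instead of a loop.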

Another thing is that the top-down approach usually implies higher complexity (in space or time), because it has to store the tree built so far while parsing, and that tree can grow a lot until ambiguities are resolved.


The only advantage I've ever been familiar with is that you can easily code LL parsers by hand. LR parsers are MUCH harder to code by hand (you usually use a parser generator).
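To illustrate how little code a hand-written LL parser needs, here is a hypothetical sketch for a toy grammar (the grammar and function names are invented, not from any of the tools discussed above). Each grammar rule becomes one function, and recursion in the grammar becomes recursion in the code; there are no tables to construct:

```python
# Toy grammar:  atom -> '(' atom ')' | 'x'
# One function per rule; the host-language call stack is the parse stack.

def parse_atom(s, pos=0):
    """Parse one atom starting at s[pos]; return the position after it."""
    if pos < len(s) and s[pos] == '(':
        pos = parse_atom(s, pos + 1)          # nested atom
        if pos >= len(s) or s[pos] != ')':
            raise SyntaxError("expected ')'")
        return pos + 1
    if pos < len(s) and s[pos] == 'x':
        return pos + 1
    raise SyntaxError("expected '(' or 'x'")

def accepts(s):
    try:
        return parse_atom(s) == len(s)        # must consume all input
    except SyntaxError:
        return False

print(accepts("((x))"))   # True
print(accepts("((x)"))    # False
```

The equivalent hand-built LR parser would require constructing item sets and action/goto tables, which is why LR in practice means using a generator.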


LL parsing's worst-case complexity is O(n^4), whereas LR parsing's worst case is better: O(n^3).

(But no one would ever write an O(n^4) grammar.)

https://en.wikipedia.org/wiki/Top-down_parsing


According to Laurence Tratt, LL parsers have a small but important niche, which is if you need:

the highest possible performance and/or the best possible error messages.

In that case you hand-code a recursive descent parser to get them.

For a realistic programming language grammar, it will typically take many person months of effort to beat an automatically generated parser.

However:

LL parsers are largely unappealing, because the lack of left recursion makes expressing many standard programming language constructs awkward.

and thus his conclusion is that LR parsing, which handles left recursion just fine, is the way to go.

For a more thorough review of this I recommend Laurence Tratt's excellent essay Which Parsing Approach?


One reason that comes to mind is that it's much easier to handle a language that needs arbitrary backtracking (cough, C++) in an LL paradigm.
