Lemon power or not?

Developer (https://www.devze.com), 2023-01-31 21:50. Source: the web
For grammar parsing, I used to "play" with Bison, which has its pros and cons.

Last week, I noticed on the SQLite site that the engine is built with another parser generator: Lemon.

Sounds great after reading the thin documentation.

Do you have some feedback about this parser?

I cannot really find pertinent information on Google or Wikipedia (just a few examples and the same tutorials). It doesn't seem very popular (there is no lemon tag on Stack Overflow [ed: there is now :P]).


The reasons we use Lemon in our firmware project are:

  • Small generated code and memory footprint. It produced the smallest parser I found (I compared parsers of similar complexity generated by flex, bison, ANTLR, and Lemon).
  • Excellent support for embedded systems: Lemon doesn't depend on the standard library, you can specify external memory management functions, and debug logging is removable.
  • Public domain license. There is a separate fork of Lemon licensed under GPLv2, which is not suitable for our needs because of its viral license. So we take the latest SQLite sources and compile Lemon from them (it consists of only two files).
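  • Push parsing: the application feeds tokens to the parser one at a time instead of the parser calling the tokenizer. This makes the code more straightforward to understand and maintain than Flex/Bison parsing code. Thread safety is an additional bonus I admire.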
  • Simple integration with tokenizers. The nature of our project requires tokenizing a binary stream with variable token sizes. It was quite easy to implement a tokenizer and integrate it with the parser API, which has only 3 functions and one feedback context variable. We investigated ways of integrating Lemon with re2c and Ragel and found them also quite easy to implement.
  • Very simple syntax that is fast to learn.
  • Lemon explicitly separates development of the tokenizer (lexical analyzer) from development of the parser. My development flow starts with designing the parser grammar. At this first stage I can check complex rules against a hand-written token sequence by means of several Parser(...) calls. The tokenizer is implemented afterwards.
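
To make the shape of that three-function interface concrete, here is a minimal sketch in C. It is a hand-written stand-in, not Lemon output: the names SumAlloc, Sum, SumFree, SumContext and the token codes are hypothetical, chosen only to mirror the allocate / feed-one-token / free pattern, the caller-supplied allocator, and the context variable described above. The function names and signatures that Lemon actually generates should be taken from the Lemon documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token codes; 0 marks end of input. */
enum { TK_END = 0, TK_NUM = 1, TK_PLUS = 2 };

/* Caller-owned context: the "feedback context variable" of this sketch. */
typedef struct {
    long result;  /* value of the finished expression */
    int  error;   /* set to 1 on a syntax error */
    int  done;    /* set to 1 once TK_END has been accepted */
} SumContext;

/* Hand-written stand-in for generated parser state (NUM (PLUS NUM)* END). */
typedef struct {
    int  expect_num;  /* 1: next token must be NUM; 0: PLUS or END */
    long acc;         /* running sum */
} SumParser;

/* Function 1: allocate the parser with a caller-supplied allocator. */
void *SumAlloc(void *(*alloc_fn)(size_t)) {
    SumParser *p = alloc_fn(sizeof *p);
    if (p != NULL) { p->expect_num = 1; p->acc = 0; }
    return p;
}

/* Function 2: push exactly one token into the parser. */
void Sum(void *parser, int code, long value, SumContext *ctx) {
    SumParser *p = parser;
    assert(p != NULL && ctx != NULL);
    if (ctx->error || ctx->done) return;   /* ignore input after the end */
    if (p->expect_num) {
        if (code == TK_NUM) { p->acc += value; p->expect_num = 0; }
        else ctx->error = 1;
    } else if (code == TK_PLUS) {
        p->expect_num = 1;
    } else if (code == TK_END) {
        ctx->result = p->acc;
        ctx->done = 1;
    } else {
        ctx->error = 1;
    }
}

/* Function 3: release the parser with a caller-supplied deallocator. */
void SumFree(void *parser, void (*free_fn)(void *)) { free_fn(parser); }

/* Grammar-first workflow: push a fixed token sequence, no tokenizer yet. */
long sum_demo(int *error_out) {
    SumContext ctx = {0, 0, 0};
    void *p = SumAlloc(malloc);
    if (p == NULL) { *error_out = 1; return 0; }
    Sum(p, TK_NUM, 1, &ctx);   /* 1     */
    Sum(p, TK_PLUS, 0, &ctx);  /* +     */
    Sum(p, TK_NUM, 2, &ctx);   /* 2     */
    Sum(p, TK_PLUS, 0, &ctx);  /* +     */
    Sum(p, TK_NUM, 3, &ctx);   /* 3     */
    Sum(p, TK_END, 0, &ctx);   /* <end> */
    SumFree(p, free);
    *error_out = ctx.error || !ctx.done;
    return ctx.result;
}
```

Because the caller drives every call, the token sequence in sum_demo can later be replaced by a real tokenizer loop without touching the parser side, and malloc/free can be swapped for a project-specific allocator.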

Of course, Lemon is not a silver bullet; it has a limited area of application. Among its disadvantages:

  • Lemon requires writing more rules than Bison because of its simplified syntax: no repetitions or optionals, one action per rule, etc.
  • The complete set of LALR(1) parser limitations.
  • Only the C language.

Weigh the pros and cons before making your choice. I've done mine ;-)


Interesting find! I haven't actually used it, so the commentary is based on reading the documentation.

The redesign so that the lexical analysis is done separately from the parsing immediately seems to have merit. In particular, it has the potential to simplify operations such as handling multiple or nested source files. The Lex-based yywrap() mechanism is less than ideal. That it avoids all global variables and has careful memory allocation and deallocation control should count in its favour (that it allows the choice of allocator and deallocator greatly helps too - at least for the environments where I work, where memory allocation is always an issue).
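
As a rough illustration of why a separately driven tokenizer can simplify nested source files, here is a small C sketch. Everything in it is hypothetical and hand-written for this example (the Source table, the "@name" include syntax, feed_source, push_number, and the Sink that stands in for a parser); it is not Lemon code. The point is only that when the tokenizer feeds the parser, an include becomes a plain recursive call that keeps feeding the same parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8  /* guards against include cycles */

/* In-memory "files" so the sketch needs no real file I/O. */
typedef struct { const char *name; const char *text; } Source;

/* Stand-in for a push parser: it just sums the numbers it is fed. */
typedef struct { long total; int count; } Sink;

static void push_number(Sink *s, long value) { s->total += value; s->count++; }

static const Source *find_source(const Source *table, size_t n,
                                 const char *name, size_t len) {
    for (size_t i = 0; i < n; i++)
        if (strlen(table[i].name) == len &&
            memcmp(table[i].name, name, len) == 0)
            return &table[i];
    return NULL;
}

/* Tokenize one source and push its tokens; "@name" recurses into another
 * source while feeding the SAME sink. Returns 0 on success, -1 on error. */
int feed_source(const Source *table, size_t n,
                const char *name, size_t len, Sink *sink, int depth) {
    const Source *src = find_source(table, n, name, len);
    if (src == NULL || depth > MAX_DEPTH) return -1;
    const char *c = src->text;
    while (*c != '\0') {
        if (isspace((unsigned char)*c)) {
            c++;
        } else if (isdigit((unsigned char)*c)) {
            long v = 0;
            while (isdigit((unsigned char)*c)) v = v * 10 + (*c++ - '0');
            push_number(sink, v);
        } else if (*c == '@') {
            const char *start = ++c;
            while (isalnum((unsigned char)*c)) c++;
            if (feed_source(table, n, start, (size_t)(c - start),
                            sink, depth + 1) != 0)
                return -1;
        } else {
            return -1;  /* unknown character */
        }
    }
    return 0;
}

/* "main" includes "inc" in the middle of its own token stream. */
long include_demo(void) {
    static const Source files[] = {
        { "main", "1 @inc 4" },
        { "inc",  "2 3" },
    };
    Sink sink = {0, 0};
    if (feed_source(files, 2, "main", 4, &sink, 0) != 0) return -1;
    return sink.total;
}
```

No parser state has to be saved or restored around the include: the tokens 1, 2, 3, 4 simply arrive at the sink in order, and the depth limit turns a self-including source into an error instead of unbounded recursion.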

The rethinking on how the rules are organized and how the terminals are identified is a good idea.

All in all, it looks like a well thought out redesign of Bison.

It is in the public domain according to the referenced web pages.
