I'm using GNU Bison 2.4.2 to write a grammar for a new language I'm working on and I have a question. When I specify a rule, let's say:
statement : T_CLASS T_IDENT '{' T_CLASS_MEMBERS '}' {
// create a node for the statement ...
}
If I have a variation on the rule, for instance
statement : T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST '{' T_CLASS_MEMBERS '}' {
// create a node for the statement ...
}
Where (from flex scanner rules):
"class" return T_CLASS;
"extends" return T_EXTENDS;
[a-zA-Z\_][a-zA-Z0-9\_]* return T_IDENT;
(and T_IDENT_LIST is a rule for comma separated identifiers).
Is there any way to specify all of this in a single rule, somehow marking the "T_EXTENDS T_IDENT_LIST" part as optional? I've already tried with
T_CLASS T_IDENT (T_EXTENDS T_IDENT_LIST)? '{' T_CLASS_MEMBERS '}' {
// create a node for the statement ...
}
But Bison gave me an error.
Thanks
To make a long story short, no. Bison's grammar syntax is plain BNF and has no optional (?) operator, and by default it generates LALR(1) parsers, which means it only uses one symbol of lookahead. What you need is something like this:
statement: T_CLASS T_IDENT extension_list '{' ...
extension_list: /* empty */
| T_EXTENDS T_IDENT_LIST
;
There are other parser generators that work with more general grammars though. If memory serves, some of them support optional elements relatively directly like you're asking for.
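A slightly fuller sketch of the optional-nonterminal approach, for illustration only. It assumes the default int semantic value (no %union), uses the value of extension_list as a simple present/absent flag, and supplies placeholder rules for T_IDENT_LIST and T_CLASS_MEMBERS, which the question does not show. The scanner and yyerror are assumed to be defined elsewhere.

```yacc
%token T_CLASS T_EXTENDS T_IDENT

%%

statement
    : T_CLASS T_IDENT extension_list '{' T_CLASS_MEMBERS '}'
        {
            /* extension_list yields 1 when an extends clause was parsed */
            if ($3) { /* create a node with a base list */ }
            else    { /* create a node without one */ }
        }
    ;

extension_list
    : /* empty */               { $$ = 0; }
    | T_EXTENDS T_IDENT_LIST    { $$ = 1; }
    ;

/* Placeholder rules (assumptions, not from the question). */
T_IDENT_LIST
    : T_IDENT
    | T_IDENT_LIST ',' T_IDENT
    ;

T_CLASS_MEMBERS
    : /* empty */
    ;
```

In a real grammar the flag would probably be replaced by a pointer to the parsed identifier list, with the empty alternative yielding NULL; that needs a %union and %type declarations, which are left out here to keep the sketch short.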
Why don't you just split them using the choice (|) operator?
statement:
T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST '{' T_CLASS_MEMBERS '}'
| T_CLASS T_IDENT '{' T_CLASS_MEMBERS '}'
I don't think you can do it, because Bison generates an LALR(1) bottom-up parser; you would need something different, such as an LL(k) tool (ANTLR?), to do what you want.
I think the most you can do is
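statement : T_CLASS T_IDENT '{' T_CLASS_MEMBERS '}' {
// create a node for a class without an extends clause
}
| T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST '{' T_CLASS_MEMBERS '}' {
// create a node for a class with an extends clause
}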