开发者

What is the equivalent for epsilon in ANTLR BNF grammar notation?

开发者 https://www.devze.com 2023-02-21 06:27 出处:网络
During taking advantage of ANTLR 3.3, I\'m changing the current grammar to support inputs without parenthesis too. Here\'s the first version of my grammar :

During taking advantage of ANTLR 3.3, I'm changing the current grammar to support inputs without parenthesis too. Here's the first version of my grammar :

grammar PropLogic;

        NOT : '!' ;
        OR  : '+' ;
        AND : '.' ;
        IMPLIES : '->' ;
        SYMBOLS : ('a'..'z') | '~' ;
        OP : '(' ;
        CP : ')' ;

    prog    : formula EOF ;


    formula : NOT formula
        | OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
        | SYMBOLS ;


    WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;

Then I changed it this way to support the appropriate features :

grammar PropLogic;

    NOT : '!' ;
    OR  : '+' ;
    AND : '.' ;
    IMPLIES : '->' ;
    SYMBOL : ('a'..'z') | '~' 开发者_开发百科;
    OP : '(' ;
    CP : ')' ;
    EM : '' ;

prog    : formula EOF ;


formula : OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
    | ( NOT formula | SYMBOL )( AND formula | OR formula | IMPLIES formula | EM ) ;


WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;

But I've been faced with following error :

error<100>:  syntax error: invalid char literal: ''
error<100>:  syntax error: invalid char literal: ''

Does anybody know that how can I overcome this error?


Your EM token:

EM : '' ;

is invalid: you can't match an empty string in lexer rules.

To match epsilon (nothing), you should do:

rule 
  :  A 
  |  B 
  |  /* epsilon */ 
  ;

Of course, the comment /* epsilon */ can safely be removed.

Note that when you do it like that in your current grammar, ANTLR will complain that there can be rules matched using multiple alternatives. This is because your grammar is ambiguous.


I'm not an ANTLR expert, but you might try:

formula : term ((AND | OR | IMPLIES ) term )*;
term :  OP formula CP | NOT term | SYMBOL ;

If you want traditional precedence of operators this won't do the trick, but that's another issue.

EDIT: OP raised the ante; he wants precedence too. I'll meet him halfway, since it wasn't part of the orginal question. I've added precedence to the grammar that makes IMPLIES the lower precedence than other operators, and leave it to OP to figure out how to do the rest.

 formula:  disjunction ( IMPLIES disjunction )* ;
 disjunction:  term (( AND | OR ) term )* ;
 term:  OP formula CP | NOT term | SYMBOL ;

OP additionally asked, "how to convert (!p or q ) into p -> q". I think he should have asked this as a separate question. However, I'm already here. What he needs to do is walk the tree, looking for the pattern he doesn't like, and change the tree into one he does, and then prettyprint the answer. It is possible to do all this with ANTLR, which is part of the reason it is popular.

As a practical matter, procedurally walking the tree and checking the node types, and splicing out old nodes and splicing in new is doable, but a royal PitA. Especially if you want to do this for lots of transformations.

A more effective way to do this is to use a program transformation system, which allows surface syntax patterns to be expressed for matching and replacement. Program transformation systems of course include parsing machinery and more powerful ones let you (and indeed insist) that you define a grammar up front much as you for ANTLR.

Our DMS Software Reengineering Toolkit is such a program transformation tool, and with a suitably defined grammar for propositions, the following DMS transformation rule would carry out OP's additional request:

domain proplogic; // tell DMS to use OP's definition of logic as a grammar

rule normalize_implies_from_or( p: term, q: term): formula -> formula
  " NOT \p OR \q " -> " \p IMPLIES \q ";

The " ... " is "domain notation", e.g, surface syntax from the proplogic domain, the "\" are meta-escapes, so "\p" and "\q" represent any arbitrary term from the proplogic grammar. Notice the rule has to reach "across" precedence levels when being applied, as "NOT \p OR \q" isn't a formula and "\p IMPLIES \q" is; DMS takes care of all this (the "formula -> formula" notation is how DMS knows what to do). This rule does a tree-to-tree rewrite. The resulting tree can be prettyprinted by DMS.

You can see a complete example of something very similar, e.g., a grammar for conventional algebra and rewrite rule to simplify algebraic equations.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号