ANTLR treats part of string as a keyword

Developer https://www.devze.com 2023-01-27 05:44 Source: web
I'm currently teaching myself ANTLR. First off, I decided to write the simplest possible grammar. There is a plain text file with directives:

pid = something.pid
log = something.log

The grammar I wrote is:

grammar TestGrammar;

options {
  language = Java;
}

@header {
  package test.antlr;
}

@lexer::header {
  package test.antlr;
}

program
  : directive+
  ;

directive
  : pid
  | log
  ;

pid
  : PID EQ (WORD|POINT)+
  ;

log
  : LOG EQ (WORD|POINT)+
  ;

WS: ( ' '
    | '\t'
    | '\r'
    | '\n'
    ) {$channel=HIDDEN;}
    ;

PID
  : 'pid'
  ;

LOG
  : 'log'
  ;

EQ
  : '='
  ;

POINT
  : '.'
  ;

WORD
  : ('a'..'z'|'A'..'Z'|'_')+
  ;

I feel I made a mistake somewhere, and ANTLR confirms it by throwing a MismatchedTokenException. It seems to treat the pid in something.pid as a directive and throws an exception.

However, I don't understand what I am doing wrong. Any help would be appreciated.

Thanks.


The lexer is a very simple object: it tokenizes the input source on its own, without any guidance from the parser about which token is expected at a given point. So, the input:

pid = something.pid

is not tokenized as:

PID EQ WORD POINT WORD

but as:

PID EQ WORD POINT PID
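
To see this in isolation, here is a minimal Java sketch of a hand-written tokenizer that approximates the lexer rules from the question. It is not the code ANTLR generates, and the class name LexerSketch and its methods are hypothetical; the point is only that the token type is decided from the characters alone, with no knowledge of where the parser currently is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, hand-written approximation of the question's lexer rules
// (assumption: this is NOT ANTLR-generated code).
final class LexerSketch {

    // Turns the input into token type names without consulting any parser rule.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // WS goes to the hidden channel, so it is skipped here
            } else if (c == '=') {
                tokens.add("EQ");
                i++;
            } else if (c == '.') {
                tokens.add("POINT");
                i++;
            } else if (isWordChar(c)) {
                int start = i;
                while (i < input.length() && isWordChar(input.charAt(i))) {
                    i++;
                }
                String text = input.substring(start, i);
                // The keyword rules apply to the exact texts "pid" and "log",
                // wherever they occur in the input.
                if (text.equals("pid")) {
                    tokens.add("PID");
                } else if (text.equals("log")) {
                    tokens.add("LOG");
                } else {
                    tokens.add("WORD");
                }
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    public static void main(String[] args) {
        // prints [PID, EQ, WORD, POINT, PID]
        System.out.println(tokenize("pid = something.pid"));
    }
}
```

The trailing "pid" comes out as PID rather than WORD, even though it sits in a position where only a value makes sense.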

That's why your rule:

pid
  : PID EQ (WORD|POINT)+
  ;

matches "pid = something." and leaves the second "pid" in the token stream. The parser then tries to start a new directive with that PID token and expects an EQ after it (hence the exception).

One possible fix is to let the parser accept the keyword tokens wherever a word is allowed:

pid
  : PID EQ (word|POINT)+
  ;

log
  : LOG EQ (word|POINT)+
  ;

word
  : WORD
  | PID
  | LOG 
  ;
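
The difference between (WORD|POINT)+ and (word|POINT)+ can be sketched in plain Java over a list of token type names. This is a hypothetical hand-written matcher, not what ANTLR generates; the class name DirectiveSketch and the flag keywordsAsWords are made up for illustration. It also assumes one extra token of lookahead: a keyword followed by EQ is taken as the start of the next directive rather than as part of the value.

```java
import java.util.List;

// Hypothetical matcher for "program : directive+" over token type names
// (assumption: a simplified stand-in for the generated parser).
final class DirectiveSketch {

    // keywordsAsWords = false mimics (WORD|POINT)+, true mimics (word|POINT)+.
    static boolean accepts(List<String> tokens, boolean keywordsAsWords) {
        int pos = 0;
        int directives = 0;
        while (pos < tokens.size()) {
            int consumed = matchDirective(tokens, pos, keywordsAsWords);
            if (consumed < 0) {
                return false; // mismatched token
            }
            pos += consumed;
            directives++;
        }
        return directives > 0;
    }

    // Returns the number of tokens one directive consumes, or -1 on a mismatch.
    static int matchDirective(List<String> tokens, int pos, boolean keywordsAsWords) {
        int i = pos;
        if (i >= tokens.size() || !isKeyword(tokens.get(i))) {
            return -1;
        }
        i++;
        if (i >= tokens.size() || !tokens.get(i).equals("EQ")) {
            return -1;
        }
        i++;
        int valueStart = i;
        while (i < tokens.size() && isValueToken(tokens, i, keywordsAsWords)) {
            i++;
        }
        return i > valueStart ? i - pos : -1; // the value needs at least one token
    }

    static boolean isValueToken(List<String> tokens, int i, boolean keywordsAsWords) {
        String type = tokens.get(i);
        if (type.equals("WORD") || type.equals("POINT")) {
            return true;
        }
        if (keywordsAsWords && isKeyword(type)) {
            // A keyword directly followed by EQ starts the next directive.
            boolean followedByEq = i + 1 < tokens.size() && tokens.get(i + 1).equals("EQ");
            return !followedByEq;
        }
        return false;
    }

    static boolean isKeyword(String type) {
        return type.equals("PID") || type.equals("LOG");
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("PID", "EQ", "WORD", "POINT", "PID");
        System.out.println(accepts(tokens, false)); // prints false
        System.out.println(accepts(tokens, true));  // prints true
    }
}
```

With the flag off, the trailing PID is left over and the next directive fails for lack of an EQ; with it on, the same token stream is accepted as a single directive.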

Alternatively, let the lexer match the whole dotted name as a single token:

pid
  : PID EQ FULL_WORD
  ;

log
  : LOG EQ FULL_WORD
  ;

// ...

FULL_WORD
  : WORD (POINT WORD)*
  ;

// ...
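
The intended effect of a FULL_WORD-style rule can be approximated in Java with a regular expression that swallows the dots between words, so a dotted name reaches the parser as one token. This is only a sketch under assumptions: the class name FullWordSketch is hypothetical, and ANTLR's generated lexer may resolve overlapping rules differently from this hand-written version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical approximation of a lexer with a FULL_WORD-style token
// (assumption: not ANTLR-generated code).
final class FullWordSketch {

    // Group 1: '='. Group 2: words joined by dots. Otherwise: whitespace.
    private static final Pattern TOKEN =
        Pattern.compile("[ \\t\\r\\n]+|(=)|([A-Za-z_]+(?:\\.[A-Za-z_]+)*)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at index " + pos);
            }
            if (m.group(1) != null) {
                tokens.add("EQ");
            } else if (m.group(2) != null) {
                String text = m.group(2);
                // Only a bare "pid" or "log" is a keyword; "something.pid" is not.
                if (text.equals("pid")) {
                    tokens.add("PID");
                } else if (text.equals("log")) {
                    tokens.add("LOG");
                } else {
                    tokens.add("FULL_WORD");
                }
            }
            // Whitespace matches no group and is skipped.
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [PID, EQ, FULL_WORD]
        System.out.println(tokenize("pid = something.pid"));
    }
}
```

In this sketch a value that is exactly a keyword with no dot, as in "pid = pid", still comes out as a keyword token, so that case would need separate handling.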