Antlr (lexer): matching the right token

Developer https://www.devze.com 2023-01-19 08:58 Source: web

In my Antlr3 grammar, I have several "overlapping" lexer rules, like this:

NAT: ('0' .. '9')+ ;
INT: ('+' | '-')? ('0' .. '9')+ ;
BITVECTOR: ('0' | '1')* ;

Although tokens like 100110 and 123 can be matched by more than one of those rules, it is always determined by context which of them it has to be. Example:

s: a | b | c ;
a: '<' NAT '>' ;
b: '{' INT '}' ;
c: '[' BITVECTOR ']' ;

The input {17} should then match {, INT, and }, but the lexer has already decided that 17 is a NAT-token. How can I prevent this behavior? The backtrack option is already set to true, but it only seems to affect parser rules.

There might be a complex way to make the lexer context-sensitive, but in general context is what you want the parser to take care of; the lexer should just provide a stream of tokens. My recommendation is to refactor your lexer to return DIGITS and SIGN tokens and let your parser work out from the context what kind of number the digits represent.
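
A rough, untested sketch of that refactoring, assuming a combined ANTLR 3 grammar with the Java target. The parser rule names nat, integer, and bitvector are my own and not from the question. The lexer emits only two non-overlapping tokens, and the parser rules take over the distinctions that NAT, INT, and BITVECTOR used to make:

```antlr
// Lexer: context-free tokens only; the NAT/INT/BITVECTOR overlap is gone.
SIGN   : '+' | '-' ;
DIGITS : ('0' .. '9')+ ;

// Parser: the surrounding brackets decide how the digits are read.
s : a | b | c ;
a : '<' nat '>' ;
b : '{' integer '}' ;
c : '[' bitvector ']' ;

nat       : DIGITS ;
integer   : SIGN? DIGITS ;   // called "integer" because "int" is a Java keyword
// Validating predicate (Java target assumed): fail unless every digit is 0 or 1.
bitvector : d=DIGITS { $d.text.matches("[01]+") }? ;
```

With this, {17} should lex as {, DIGITS, } and be parsed through b. Two differences from the original rules are worth keeping in mind. First, the original BITVECTOR also matched the empty string, while this bitvector needs at least one digit; wrapping the rule body in ( ... )? would allow [] again. Second, because SIGN and DIGITS are now separate tokens, a grammar that skips whitespace would also accept something like {- 17}.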