Better way to map tokens to enum values?

I'm trying to have my parser rule select an enum value based on my DIR token. Is there a way I can do this without creating separate, full-fledged tokens for each direction? Or generally a cleaner approach?

DIR : (NORTH|SOUTH) (EAST|WEST)?
 | EAST
 | WEST;

fragment NORTH: N '.'? | N O R T H;
fragment SOUTH: S '.'? | S O U T H;
fragment EAST : E '.'? | E A S T;
fragment WEST : W '.'? | W E S T;

(there are token fragments for each letter to facilitate case-insensitivity)

The enum is public enum Direction { NORTH, SOUTH, EAST, WEST, NORTHEAST, NORTHWEST, SOUTHEAST, SOUTHWEST }

Right now the only solution I see is to convert DIR to a parser rule and make the directions separate tokens:

NORTH: N '.'? | N O R T H;
SOUTH: S '.'? | S O U T H;

dir returns [Direction dir]
 : NORTH { dir = Direction.NORTH; }
 | SOUTH { dir = Direction.SOUTH; }

This isn't terrible for this scenario, but I've got some other enums that will have lots more options so I'm looking for any ways to simplify this.

I'm not very familiar with ANTLR, but from a quick scan of the docs it seems to work pretty much like yacc/racc, and it seems to allow arbitrary methods to be defined in an @members block. So I would expect you can use something like:

dir returns [Direction dir]
: DIR { $dir = directionStringToEnum($DIR.text); }
;

where you have to define a separate

public Direction directionStringToEnum(String dir) {
   // valueOf needs the exact constant name, so normalize the case first
   return Direction.valueOf(dir.toUpperCase());
}

in the @members block. You may be able to generalize that to handle arbitrary enums (but probably in an ugly way, requiring Class.forName()).
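For what it's worth, here is a minimal sketch of such a generalized helper in plain Java, assuming the enum class can be passed in directly so that Class.forName() isn't needed. The names EnumLookup and fromText are made up for illustration and are not part of ANTLR. Note that it only works when the token text spells out a constant name; abbreviations such as "N." would make Enum.valueOf throw an IllegalArgumentException, so they would need to be handled separately.

```java
import java.util.Locale;

enum Direction { NORTH, SOUTH, EAST, WEST, NORTHEAST, NORTHWEST, SOUTHEAST, SOUTHWEST }

// Hypothetical helper (not an ANTLR API): maps token text to a constant of any enum type.
final class EnumLookup {
    private EnumLookup() {}

    // Trims and upper-cases the text, then looks up the constant with that exact name.
    // Throws IllegalArgumentException if no constant matches (e.g. for "N.").
    static <T extends Enum<T>> T fromText(Class<T> type, String text) {
        return Enum.valueOf(type, text.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(fromText(Direction.class, "north"));
        System.out.println(fromText(Direction.class, "SouthWest"));
    }
}
```

In the grammar, the action would then presumably read something like { $dir = EnumLookup.fromText(Direction.class, $DIR.text); }.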

Another option is to rewrite the inner text of the tokens so they match your enum values. In your parser, you could then do Direction.valueOf(String) to parse it into a real enum.

Something like this:

...

parse
  :  (
       DIR {System.out.println("enum=" + Direction.valueOf($DIR.text));}
     )* 
     EOF
  ;

DIR
  :  ( NORTH {setText("NORTH");}      | SOUTH {setText("SOUTH");}      ) 
     ( EAST  {setText($text+"EAST");} | WEST  {setText($text+"WEST");} )?
  |  EAST {setText("EAST");}
  |  WEST {setText("WEST");}     
  ;

...

The following test:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "N EaSt S. w NE N.w. Southe SWeSt";
    CompassLexer lexer = new CompassLexer(new ANTLRStringStream(src));
    CompassParser parser = new CompassParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

produced:

java -cp antlr-3.3.jar org.antlr.Tool Compass.g 
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

enum=NORTH
enum=EAST
enum=SOUTH
enum=WEST
enum=NORTHEAST
enum=NORTHWEST
enum=SOUTHEAST
enum=SOUTHWEST

It's a bit clunky, perhaps. But if you're going to construct tokens from (many) different tokens (like with South-West or North-East), it may shorten your grammar compared to something like:

dir returns [Direction dir]
 : NORTH { dir = Direction.NORTH; }
 | SOUTH { dir = Direction.SOUTH; }
 ...
 ;

Expanding on the idea in Confusion's comment, I did track down a way to get the token names. So if I make a token for each direction I should be able to do something like:

dir returns [Direction dir]
 : (d=NORTH | d=SOUTH | d=EAST | d=WEST | d=NORTHEAST | d=NORTHWEST | d=SOUTHEAST | d=SOUTHWEST )
   { dir = Direction.valueOf(getTokenNames()[$d.getType()]); }
 ;

NORTH: N '.'? | N O R T H;
SOUTH: S '.'? | S O U T H;
EAST:  E '.'? | E A S T;
WEST:  W '.'? | W E S T;
NORTHEAST : N E | N '.' E '.' | N O R T H E A S T;
NORTHWEST : N W | N '.' W '.' | N O R T H W E S T;
SOUTHEAST : S E | S '.' E '.' | S O U T H E A S T;
SOUTHWEST : S W | S '.' W '.' | S O U T H W E S T;

This will mean a lot more tokens, but really cuts down on the typing.
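One caveat with looking names up directly is that Direction.valueOf throws for any token name that has no matching constant. Below is a rough sketch of a more forgiving variant that builds a token-type-to-enum map once and returns null for everything else. It is plain Java with a hand-written tokenNames array standing in for whatever getTokenNames() returns in the generated parser; the array contents and the TokenTypeToEnum name are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

enum Direction { NORTH, SOUTH, EAST, WEST, NORTHEAST, NORTHWEST, SOUTHEAST, SOUTHWEST }

// Hypothetical helper (not an ANTLR API): indexes enum constants by token type.
final class TokenTypeToEnum {
    private TokenTypeToEnum() {}

    // Maps each array index (token type) whose name equals an enum constant name
    // to that constant; names without a matching constant are skipped.
    static <T extends Enum<T>> Map<Integer, T> index(Class<T> type, String[] tokenNames) {
        Map<Integer, T> byType = new HashMap<>();
        for (int i = 0; i < tokenNames.length; i++) {
            for (T constant : type.getEnumConstants()) {
                if (constant.name().equals(tokenNames[i])) {
                    byType.put(i, constant);
                }
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the parser's token name table.
        String[] tokenNames = {"<invalid>", "<EOR>", "<DOWN>", "<UP>", "EAST", "NORTH", "WS"};
        Map<Integer, Direction> byType = index(Direction.class, tokenNames);
        System.out.println(byType.get(5)); // NORTH
        System.out.println(byType.get(6)); // null, since WS is not a Direction
    }
}
```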

I also tried to combine this with Bart's suggestion, but it appears that state.type isn't set during the lexing phase (it results in a NullPointerException). The lexer does assign type IDs to fragments; there just doesn't seem to be any way to access them from a lexer rule.

main_rule[CustomObject object]: d=DIR ...
           { object.setDirection(Direction.valueOf($d.text)); };

DIR
 : (NORTH | SOUTH | EAST| WEST | NORTHEAST | NORTHWEST | SOUTHEAST | SOUTHWEST)
   { setText(getTokenNames()[state.type]); }
 ;

fragment NORTH: N '.'? | N O R T H;
...