Antlr grammar for a game

I'm trying to pre-process some dialog files from an old game (Vampire the Masquerade: Bloodlines, if you're curious) to insert some code at a specific place in some data files.

I want to use Antlr to transform the dialog files, but my grammar is ambiguous even though the format is very simple.

The format allows for dialog of the NPCs and the PC as a series of lines:

 { TEXT } repeated (the count varies, normally 13 but sometimes fewer)

One of these tokens in particular (5th, but 1st in the example) is important, because it defines whether the line belongs to an NPC or the PC: it has the '#' char in it. However, other tokens can contain the same character, and I'm getting warnings on some valid files that I would like to eliminate.

My big problem ATM is a grammar ambiguity. To cope with the variable number of tokens, I decided to use '*' to gobble up the ones I don't care about until the newline.

So I did this:

any* NL*

I expected this to match the rest of the tokens before any set of newlines. However, Antlr says the grammar is ambiguous, while:

any NL* or any* NL are not.

EDIT: deleted the old grammar; see below for the new one, and the new problem.

EDIT: I resolved the ambiguity, and thanks to Mr Kiers I am almost sure that my new grammar will match the input. However, I now have a new problem: "error(208): VampireDialog.g:99:1: The following token definitions can never be matched because prior tokens match the same input: NOT_SHARP". If I remove the NL that it is complaining about, then it complains about the NL lexer rule instead...
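One way to picture what error 208 describes is the rough Java model below. It is a simplified sketch, not ANTLR's actual lexer: it ignores the ANY rule, treats TEXT, NOT_SHARP and NL as single-character tests, and assumes that the rule defined first wins when several rules accept the same input. The class and method names (RuleOrderSketch, firstMatch) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

public class RuleOrderSketch {
    // Single-character rules in definition order, mirroring TEXT, NOT_SHARP, NL.
    // Assumption: an earlier rule wins when two rules accept the same character.
    private static final Map<String, IntPredicate> RULES = new LinkedHashMap<>();

    static {
        RULES.put("TEXT", c -> !isNl(c) && c != '}');
        RULES.put("NOT_SHARP", c -> !isNl(c) && c != '#' && c != '}');
        RULES.put("NL", RuleOrderSketch::isNl);
    }

    // '\f' is the form feed character written as '\u000C' in the grammar.
    static boolean isNl(int c) {
        return c == '\r' || c == '\n' || c == '\f';
    }

    /** Returns the name of the first rule that accepts c, or null if none does. */
    public static String firstMatch(int c) {
        for (Map.Entry<String, IntPredicate> rule : RULES.entrySet()) {
            if (rule.getValue().test(c)) {
                return rule.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Every character NOT_SHARP accepts is also accepted by TEXT,
        // so NOT_SHARP is never the first match.
        for (int c = 0; c < 128; c++) {
            if ("NOT_SHARP".equals(firstMatch(c))) {
                System.out.println("NOT_SHARP matched " + c);
            }
        }
        System.out.println("'a' -> " + firstMatch('a'));
        System.out.println("'#' -> " + firstMatch('#'));
    }
}
```

In this model NOT_SHARP is never chosen, because its character set is a subset of the set TEXT accepts and TEXT is defined earlier.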

As Mr Kiers told me to post an example of the input, here it is too. NPC line, note the #:

{ 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }

PC line, note the absence of #:

{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }
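To make the format concrete, here is a minimal plain-Java sketch (no ANTLR) that splits a line into its { ... } fields and checks only the marker field for '#'. It assumes the marker is the fourth field, as in the two sample lines above, and that fields never contain a '}'. The names DialogLineSketch, fields and isNpcLine are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DialogLineSketch {
    // One field is everything between '{' and the next '}'.
    private static final Pattern FIELD = Pattern.compile("\\{([^}]*)\\}");

    // Assumption: the marker is the fourth field (index 3), as in the samples.
    private static final int MARKER_INDEX = 3;

    /** Splits one dialog line into its trimmed field contents. */
    public static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    /** True if the marker field exists and contains '#'. */
    public static boolean isNpcLine(String line) {
        List<String> f = fields(line);
        return f.size() > MARKER_INDEX && f.get(MARKER_INDEX).contains("#");
    }

    public static void main(String[] args) {
        String npc = "{ 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }";
        String pc = "{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }";
        System.out.println("npc line? " + isNpcLine(npc));
        System.out.println("npc line? " + isNpcLine(pc));
    }
}
```

Because only one field is inspected, a '#' that appears in any other field does not change the classification.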

Here is the grammar:

grammar VampireDialog;

options
{
output=AST;
ASTLabelType=CommonTree;
language=Java;
} 
tokens
{
REWRITE;
}

@parser::header {
import java.util.LinkedList;
import java.io.File;
}

@members {
    public static void main(String[] args) throws Exception {
        File vampireDir = new File(System.getProperty("user.home"), "Desktop/Vampire the Masquerade - Bloodlines/Vampire the Masquerade - Bloodlines/Vampire/dlg");
        List<File> files = new LinkedList<File>();
        getFiles(256, new File[]{vampireDir}, files, new LinkedList<File>());
        for (File f : files) {
            if (f.getName().endsWith(".dlg")) {
                VampireDialogLexer lex = new VampireDialogLexer(new ANTLRFileStream(f.getAbsolutePath(), "Windows-1252"));
                TokenRewriteStream tokens = new TokenRewriteStream(lex);
                VampireDialogParser parser = new VampireDialogParser(tokens);
                Tree t = (Tree) parser.dialog().getTree();
                //  System.out.println(t.toStringTree());
            }
        }
    }

    public static void getFiles(int levels, File[] search, List<File> files, List<File> directories) {
        for (File f : search) {
            if (!f.exists()) {
                throw new AssertionError("Search file array has non-existing files");
            }
        }
        getFilesAux(levels, search, files, directories);
    }

    private static void getFilesAux(int levels, File[] startFiles, List<File> files, List<File> directories) {
        List<File[]> subFilesList = new ArrayList<File[]>(50);
        for (File f : startFiles) {
            File[] subFiles = f.listFiles();
            if (subFiles == null) {
                files.add(f);
            } else {
                directories.add(f);
                subFilesList.add(subFiles);
            }
        }

        if (levels > 0) {
            for (File[] subFiles : subFilesList) {
                getFilesAux(levels - 1, subFiles, files, directories);
            }
        }
    }
}




/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
dialog : (ANY ANY ANY  (npc_line | player_line) ANY* NL*)*;
npc_line :  npc_marker npc_conditional;
player_line : pc_marker conditional;
npc_conditional : '{' condiction '}'
            {   String cond = $condiction.tree.toStringTree(), partial = "npc.Reset()", full = "("+cond+") and npc.Reset()";
                boolean empty = cond.trim().isEmpty(); 
                boolean alreadyProcessed = cond.endsWith("npc.Reset()");}   
                ->   {empty}? '{' REWRITE[partial] '}'
                ->   {alreadyProcessed}? '{' REWRITE[cond] '}'
                ->   '{' REWRITE[full] '}';
conditional : '{' condiction '}'
            {   String cond = $condiction.tree.toStringTree(), full = "("+cond+") and npc.Count()";
                boolean empty = cond.trim().isEmpty(); 
                boolean alreadyProcessed = cond.endsWith("npc.Count()");}   
                ->   {empty}? '{' REWRITE[cond] '}'
                ->   {alreadyProcessed}? '{' REWRITE[cond] '}'
                ->   '{' REWRITE[full] '}';
condiction : TEXT*;
//in the parser ~('#') means: "match any token except the token that matches '#'" 
//and in lexer rules ~('#') means: "match any character except '#'"
pc_marker : '{' NOT_SHARP* '}';
npc_marker : '{' NOT_SHARP* '#' NOT_SHARP* '}';


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
ANY : '{' TEXT* '}';
TEXT : ~(NL|'}');
NOT_SHARP : ~(NL|'#'|'}');
NL : ( '\r' | '\n'| '\u000C');

I propose a slightly different approach. You could use something called a syntactic predicate. This looks like (some_parser_or_lexer_rules_here)=> parser_or_lexer_rules. A small example:

line
  :  (A B)=> A B
  |          A C
  ;

What happens in the rule line is this: first, a look-ahead is performed to see if the next tokens in the stream are A and B. If this is the case, these tokens are matched; if not, A and C are matched.
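The same idea can be sketched in plain Java, as a hand-written illustration rather than the code ANTLR generates: the predicate tries to match A B, always rewinds, and only then is the chosen alternative consumed for real. Tokens are modelled as plain strings, and the names PredicateSketch, predicateAB and parse are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

public class PredicateSketch {
    private final List<String> tokens;
    private int pos = 0; // index of the next unconsumed token

    public PredicateSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Consumes the next token if it equals expected. */
    private boolean match(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    /** Plays the role of (A B)=> : try the match, then always rewind. */
    private boolean predicateAB() {
        int mark = pos;
        boolean ok = match("A") && match("B");
        pos = mark; // look-ahead must not consume anything
        return ok;
    }

    /** Mimics  line : (A B)=> A B | A C ;  and reports the alternative taken. */
    public String line() {
        if (predicateAB()) {
            match("A");
            match("B");
            return "A B";
        }
        if (match("A") && match("C")) {
            return "A C";
        }
        throw new IllegalStateException("no viable alternative at token " + pos);
    }

    public static String parse(String... tokens) {
        return new PredicateSketch(Arrays.asList(tokens)).line();
    }

    public static void main(String[] args) {
        System.out.println(parse("A", "B")); // prints "A B"
        System.out.println(parse("A", "C")); // prints "A C"
    }
}
```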

You can apply the same in your case by first looking ahead in the stream to see if there is a # before the end of the line: if so, match an NPC line, and if not, match a PC line.

A demo grammar:

grammar VampireDialog;

parse
  :  LineBreak* line (LineBreak+ line)* LineBreak* EOF
  ;

line
  :  (any_except_line_breaks_and_hash+ Hash)=> conditionals {System.out.println("> npc :: " + $conditionals.text);}
  |                                            conditionals {System.out.println("> pc  :: " + $conditionals.text);}
  ;

conditionals  
  :  Space* conditional (Space* conditional)* Space*
  ;

conditional
  :  Open conditional_text Close
  ;

conditional_text
  :  (Hash | Space | Other)*
  ;

any_except_line_breaks_and_hash
  :  (Space | Open | Close | Other)
  ;

LineBreak
  :  '\r'? '\n'
  |  '\r'
  ;

Space
  :  ' ' | '\t'
  ;

Hash  : '#';
Open  : '{';
Close : '}';

// Fall through rule: if the lexer does not match anything 
// above this rule, this `Other` rule will match.
Other
  :  .
  ;

And a little class to test the grammar:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = 
                "{ 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }\n" + 
                "\n" +
                "{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }\n";
        ANTLRStringStream in = new ANTLRStringStream(source);
        VampireDialogLexer lexer = new VampireDialogLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        VampireDialogParser parser = new VampireDialogParser(tokens);
        parser.parse();
    }
}

which prints:

> npc :: { 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }
> pc  :: { 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }

As you can see, it skips empty lines as well.

(Note that syntactic or semantic predicates do not work with ANTLRWorks, so you need to test this on the command line!)