Getting Started with ANTLR

Developer https://www.devze.com 2023-02-13 04:14 Source: web
A few days ago I posted this question on the ANTLR mailing list but didn't receive any support, so I'm hoping you guys here can help me out:

I am currently trying to dig into ANTLR, as I find this tool very helpful. The last time I used it, I generated something based on a finished grammar. This time I want to build my own grammar and really start understanding what's happening.

For this I decided to build a parser for some Wiki-Notation-Like text.

Here is an example (without the Start and End rows):

------------ Start ---------------
before
More before

And yet even more ...
[Lineup]
[Floor:Main Floor]
Test1
Test2
[Floor:Classics Floor]
Test3
Test4
Test5
Test6
[/Lineup]
after
more After
..

And even more.
------------ End ---------------

If the text contains a "Lineup" block, that block should be parsed. Its content is at least one "Floor", each followed by a number of names, and then either a new "Floor" or the closing "Lineup". I managed to get my parser to parse the text if I change both my grammar and the text I am trying to parse to "[Floor:]" (one block), but I really need a name in there :(

As soon as I change my grammar to support the floor name, nothing works anymore. Could you please help me with this? I'm not looking for someone who fixes it for me without a comment. I would really like to know why my grammar doesn't work. I'm really stuck and I've been working on this for days now (OK ... I admit, it's just my spare time after work ... but at least all of that).

Here is my grammar. If I try to parse the full text, I always get EarlyExitExceptions while parsing :(

grammar CalendarEventsJava;

/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/

event    : (
                               (LINE_CONTENT | NEWLINE)*
                               (lineup (LINE_CONTENT | NEWLINE)*)?
               );

lineup   : (LINEUP_OPEN NEWLINE floor+ LINEUP_CLOSE);

floor      : (FLOOR_OPEN LINE_CONTENT FLOOR_CLOSE NEWLINE lineupEntry+);

lineupEntry
                : (LINE_CONTENT? NEWLINE);

artist     : LINE_CONTENT;


/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/




LINEUP_OPEN
                :              '[Lineup]';
LINEUP_CLOSE
                :              '[/Lineup]';
FLOOR_OPEN
                :              '[Floor:';
FLOOR_CLOSE
                :              ']';

BLANKS               :              ( ' ' | '\t' )+;
NONBREAKING
                :              ~('\r' | '\n' | ']');
NEWLINE            :              '\r'? '\n';


// the content of a line consists of at least one non-breaking character.
LINE_CONTENT
                :              (NONBREAKING | ']')+ ;

I really hope you can help me, as I'm really anxious to really get started with ANTLR, cause I think it really rocks :)

Chris



The problem

If you examine the token stream after tokenizing your source, you'll see that the following tokens are fed to the parser:

LINEUP_OPEN  :: [Lineup]
NEWLINE      :: \n
LINE_CONTENT :: [Floor:Main Floor]
NEWLINE      :: \n
LINE_CONTENT :: Test1
NEWLINE      :: \n
LINE_CONTENT :: Test2
NEWLINE      :: \n
LINE_CONTENT :: [Floor:Classics Floor]
NEWLINE      :: \n
LINE_CONTENT :: Test3
NEWLINE      :: \n
LINE_CONTENT :: Test4
NEWLINE      :: \n
LINE_CONTENT :: Test5
NEWLINE      :: \n
LINE_CONTENT :: Test6
NEWLINE      :: \n
LINEUP_CLOSE :: [/Lineup]

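As you can see, no FLOOR_OPEN token is ever created. Because LINE_CONTENT can match an entire line, including the leading '[Floor:' and the closing ']', the lexer produces LINE_CONTENT tokens for the floor lines instead.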
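A rough way to see why this happens, sketched in plain Java with no ANTLR dependency. The regular expressions below are hand-written stand-ins for FLOOR_OPEN and LINE_CONTENT (an approximation for illustration, not ANTLR's actual prediction algorithm), and the class and method names are hypothetical. At the start of a floor line, LINE_CONTENT can cover the whole line while FLOOR_OPEN covers only its first few characters, which is consistent with the token dump above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleOverlapSketch {
    // Regex stand-in for: FLOOR_OPEN : '[Floor:';
    static final Pattern FLOOR_OPEN = Pattern.compile("\\[Floor:");

    // Regex stand-in for: LINE_CONTENT : (NONBREAKING | ']')+ ;
    // Taken together, the two alternatives accept any character except '\r' and '\n'.
    static final Pattern LINE_CONTENT = Pattern.compile("[^\\r\\n]+");

    // Returns how many characters the rule can match at the very start of the input (0 if none).
    static int prefixLength(Pattern rule, String input) {
        Matcher m = rule.matcher(input);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        String line = "[Floor:Main Floor]\n";
        System.out.println("FLOOR_OPEN   can match " + prefixLength(FLOOR_OPEN, line) + " chars");
        System.out.println("LINE_CONTENT can match " + prefixLength(LINE_CONTENT, line) + " chars");
    }
}
```

For this input the sketch reports 7 characters for FLOOR_OPEN and 18 for LINE_CONTENT, i.e. everything up to the newline.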

Here's how you can manually debug your token stream:

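import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonToken;
import org.antlr.runtime.CommonTokenStream;

// TokenDump is just a name chosen for this example; it needs the lexer and
// parser classes that ANTLR 3 generates from the CalendarEventsJava grammar.
public class TokenDump {
    public static void main(String[] args) {
        String source =
                "[Lineup]\n" +
                "[Floor:Main Floor]\n" +
                "Test1\n" +
                "Test2\n" +
                "[Floor:Classics Floor]\n" +
                "Test3\n" +
                "Test4\n" +
                "Test5\n" +
                "Test6\n" +
                "[/Lineup]";
        // Run the generated lexer over the input and buffer its tokens.
        ANTLRStringStream in = new ANTLRStringStream(source);
        CalendarEventsJavaLexer lexer = new CalendarEventsJavaLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // Print each token's symbolic name next to its text, with newlines made visible.
        for (Object o : tokens.getTokens()) {
            CommonToken t = (CommonToken) o;
            System.out.println(CalendarEventsJavaParser.tokenNames[t.getType()] + " :: " + t.getText().replace("\n", "\\n"));
        }
    }
}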


The solution

Changing:

FLOOR_OPEN
                :              '[Floor:';

to

FLOOR_OPEN   : '[Floor:' ~']'* ']';

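(FLOOR_CLOSE can then be removed, and the parser rule floor shortened to FLOOR_OPEN NEWLINE lineupEntry+, because the floor name and the closing ']' are now part of the FLOOR_OPEN token)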

and changing:

NONBREAKING
            :              ~('\r' | '\n');

to:

NONBREAKING  : ~('\r' | '\n' | '[' | ']');

will result in the following parse tree:

[Image: parse tree for the example input]
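To check what token stream the two rule changes above should produce without regenerating the lexer, here is a minimal hand-written tokenizer in plain Java. It is only a sketch under stated assumptions: the regular expressions are transcriptions of the revised lexer rules, rules are tried in a fixed order, BLANKS is left out, LINE_CONTENT is taken to be NONBREAKING+, and the class name is hypothetical. It is not the code ANTLR generates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedLexerSketch {
    // Token names and regex transcriptions of the revised lexer rules (BLANKS omitted).
    static final String[] NAMES = {
        "LINEUP_OPEN", "LINEUP_CLOSE", "FLOOR_OPEN", "NEWLINE", "LINE_CONTENT"
    };
    static final Pattern[] RULES = {
        Pattern.compile("\\[Lineup\\]"),         // '[Lineup]'
        Pattern.compile("\\[/Lineup\\]"),        // '[/Lineup]'
        Pattern.compile("\\[Floor:[^\\]]*\\]"),  // '[Floor:' ~']'* ']'
        Pattern.compile("\\r?\\n"),              // '\r'? '\n'
        Pattern.compile("[^\\r\\n\\[\\]]+")      // NONBREAKING+ with '[' and ']' excluded
    };

    // Splits the input into "NAME :: text" entries, trying each rule at the current offset.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (int i = 0; i < RULES.length && !matched; i++) {
                Matcher m = RULES[i].matcher(input).region(pos, input.length());
                if (m.lookingAt()) {
                    out.add(NAMES[i] + " :: " + m.group().replace("\n", "\\n"));
                    pos = m.end();
                    matched = true;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("No rule matches at offset " + pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String source = "[Lineup]\n[Floor:Main Floor]\nTest1\nTest2\n[/Lineup]";
        for (String token : tokenize(source)) {
            System.out.println(token);
        }
    }
}
```

Because LINE_CONTENT can no longer start with '[', the bracketed rules and LINE_CONTENT never compete for the same input, so the floor line comes out as a single FLOOR_OPEN token followed by NEWLINE.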


Comments

Note that the lexer rules NONBREAKING and LINE_CONTENT are very similar, and you probably don't want NONBREAKING to ever appear in the token stream. It would be better to make NONBREAKING a fragment rule. Fragment rules are only used by other lexer rules and will therefore never be used to create a "real" token:

fragment NONBREAKING  : ~('\r' | '\n' | '[' | ']');

LINE_CONTENT : NONBREAKING+;


It looks like

NONBREAKING
                :              ~('\r' | '\n');

is consuming the floor close character (']'). It will consume all characters up to the end of the line. Try excluding the floor close character from it.

Kate.
