开发者

ANTLR: "missing attribute access on rule scope" problem

开发者 https://www.devze.com 2022-12-10 11:27 出处:网络
I am trying to build an ANTLR grammar that parses tagged sentences such as: DT The NP cat VB ate DT a NP rat

I am trying to build an ANTLR grammar that parses tagged sentences such as:

DT The NP cat VB ate DT a NP rat

and have the grammar:

fragment TOKEN  :   (('A'..'Z') | ('a'..'z'))+;
fragment WS :   (' ' | '\t')+;
WSX :   WS;
DTTOK   :   ('DT' WS TOKEN);
NPTOK   :   ('NP' WS TOKEN);
nounPhrase:  (DTTOK WSX NPTOK);
chunker : nounPhrase {System.out.println("chunk found "+"("+$nounPhrase+")");};

The grammar generator generates the "missing attribute access on rule scope: nounPhrase" in the last line.

[I am still new to ANTLR and although some grammars work it's still trial and error. I also frequently get an "OutOfMemory" error when running g开发者_如何学运维rammars as small as this - any help welcome.]

I am using ANTLRWorks 1.3 to generate the code and am running under Java 1.6.


"missing attribute access" means that you've referenced a scope ($nounPhrase) rather than an attribute of the scope (such as $nounPhrase.text).

In general, a good way to troubleshoot problems with attributes is to look at the generated parser method for the rule in question.

For example, my initial attempt at creating a new rule when I was a little rusty:

multiple_names returns [List<Name> names]
@init {
    names = new ArrayList<Name>(4);
}
 : a=fullname ' AND ' b=fullname { names.add($a.value); names.add($b.value); };

resulted in "unknown attribute for rule fullname". So I tried

multiple_names returns [List<Name> names]
@init {
    names = new ArrayList<Name>(4);
}
 : a=fullname ' AND ' b=fullname { names.add($a); names.add($b); };

which results in "missing attribute access". Looking at the generated parser method made it clear what I needed to do though. While there are some cryptic pieces, the parts relevant to scopes (variables) are easily understood:

public final List<Name> multiple_names() throws RecognitionException {
    List<Name> names = null;        // based on "returns" clause of rule definition
    Name a = null;                  // based on scopes declared in rule definition
    Name b = null;                  // based on scopes declared in rule definition
    names = new ArrayList<Name>(4); // snippet inserted from `@init` block

    try {
        pushFollow(FOLLOW_fullname_in_multiple_names42);
        a=fullname();
        state._fsp--;
        match(input,189,FOLLOW_189_in_multiple_names44); 
        pushFollow(FOLLOW_fullname_in_multiple_names48);
        b=fullname();
        state._fsp--;
        names.add($a); names.add($b);// code inserted from {...} block
    }
    catch (RecognitionException re) {
        reportError(re);
        recover(input,re);
    }
    finally {
        // do for sure before leaving
    }
    return names;                    // based on "returns" clause of rule definition
}

After looking at the generated code, it's easy to see that the fullname rule is returning instances of the Name class, so what I needed in this case was simply:

multiple_names returns [List<Name> names]
@init {
    names = new ArrayList<Name>(4);
}
 : a=fullname ' AND ' b=fullname { names.add(a); names.add(b); };

The version you need in your situation may be different, but you'll generally be able to figure it out pretty easily by looking at the generated code.


In the original grammer, why not include the attribute it is asking for, most likely:

chunker : nounPhrase {System.out.println("chunk found "+"("+$nounPhrase.text+")");};

Each of your rules (chunker being the one I can spot quickly) have attributes (extra information) associated with them. You can find a quick list of the different attributes for the different types of rules at http://www.antlr.org/wiki/display/ANTLR3/Attribute+and+Dynamic+Scopes, would be nice if descriptions were put on the web page for each of those attributes (like for the start and stop attribute for the parser rules refer to tokens from your lexer - which would allow you to get back to your line number and position).

I think your chunker rule should just be changed slightly, instead of $nounPhrase you should use $nounPhrase.text. text is an attribute for your nounPhrase rule.

You might want to do a little other formating as well, usually the parser rules (start with lowercase letter) appear before the lexer rules (start with uppercase letter)

PS. When I type in the box the chunker rule is starting on a new line but in my original answer it didn't start on a new line.


If you accidentally do something silly like $thing.$attribute where you mean $thing.attribute, you will also see the missing attribute access on rule scope error message. (I know this question was answered a long time ago, but this bit of trivia might help someone else who sees the error message!)


Answering question after having found a better way...

WS  :    (' '|'\t')+;
TOKEN   :   (('A'..'Z') | ('a'..'z'))+;
dttok   :   'DT' WS TOKEN;
nntok   :   'NN' WS TOKEN; 
nounPhrase :    (dttok WS nntok);
chunker :  nounPhrase ;

The problem was I was getting muddled between the lexer and the parser (this is apparently very common). The uppercase items are lexical, the lowercase in the parser. This now seems to work. (NB I have changed NP to NN).

0

精彩评论

暂无评论...
验证码 换一张
取 消