I'm implementing an IDE for Scheme in Eclipse using DLTK. So far, I am writing the grammar to recognize the lexical structure.
I'm following the official EBNF, which can be viewed here:
http://rose-r5rs.googlecode.com/hg/doc/r5rs-grammar.html
I can't get even a simple form of the numbers grammar to work. For example, for the decimal numbers I have:
grammar r5rsnumbers;
options {
language = Java;
}
program:
NUMBER;
// NUMBERS
NUMBER : /*NUM_2 | NUM_8 |*/ NUM_10; //| NUM_16;
fragment NUM_10 : PREFIX_10 COMPLEX_10;
fragment COMPLEX_10
: REAL_10 (
'@' REAL_10
| '+' (
UREAL_10 'i'
| 'i'
)?
| '-' (
UREAL_10 'i'
| 'i'
)?
)?
| '+' (
UREAL_10 'i'
| 'i'
)?
| '-' (
UREAL_10 'i'
| 'i'
)?;
fragment REAL_10 : SIGN UREAL_10;
fragment UREAL_10
: UINTEGER_10 ('/' UINTEGER_10)?
| DECIMAL_10;
fragment UINTEGER_10 : DIGIT_10+ '#'*;
fragment DECIMAL_10
: UINTEGER_10 SUFFIX
| '.' DIGIT_10+ '#'* SUFFIX
| DIGIT_10+ '.' DIGIT_10* '#'* SUFFIX
| DIGIT_10+ '#'+ '.' '#'* SUFFIX;
fragment PREFIX_10
: RADIX_10 EXACTNESS
| EXACTNESS RADIX_10;
fragment DIGIT : '0'..'9';
fragment EMPTY : '""'; // empty is the empty string
fragment SUFFIX : EMPTY | EXPONENT_MARKER SIGN DIGIT_10+;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l';
fragment SIGN : EMPTY | '+' | '-';
fragment EXACTNESS : EMPTY | '#i' | '#e';
fragment RADIX_10 : EMPTY | '#d';
fragment DIGIT_10 : DIGIT;
The problem is that it is not recognizing anything. I don't understand the warning I get from the PREFIX_10 rule, or how to solve it. If I don't use fragment in the rules, the file doesn't compile, since ANTLR complains that the DIGIT_10 rule matches the same input as almost all prior rules.
It's the same with NUM_2, NUM_8 and NUM_16.
Also, I am not sure about my solution for the empty string.
How do I get around this?
Note that your ANTLR rule:
EMPTY : '""';
does not match an empty string, but two double quotes.
But you don't want a lexer rule to match only an empty string: that will cause the lexer to go into an infinite loop, since there are infinitely many empty strings in any string/source.
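To make the looping concrete, here is a minimal Java sketch of a hand-rolled scanner loop. It is not ANTLR's generated code; the class EmptyTokenDemo, the method scan and the maxTokens guard are hypothetical names introduced only for illustration. The token pattern is allowed to match zero characters, so once it does, the position stops advancing and only the guard ends the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmptyTokenDemo {
    // Hypothetical token: an optional sign, so it can also match zero characters.
    static final Pattern SIGN_OR_EMPTY = Pattern.compile("[+-]?");

    /** Returns the end position of each token found; stops after maxTokens tokens. */
    static List<Integer> scan(String input, int maxTokens) {
        List<Integer> ends = new ArrayList<>();
        Matcher m = SIGN_OR_EMPTY.matcher(input);
        int pos = 0;
        while (pos < input.length() && ends.size() < maxTokens) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                break; // no token starts here
            }
            pos = m.end(); // a zero-width match leaves pos unchanged
            ends.add(pos);
        }
        return ends;
    }

    public static void main(String[] args) {
        // After consuming '+', the scanner is stuck in front of '1':
        // every further "token" is empty and ends at the same position.
        System.out.println(scan("+1", 5));
    }
}
```

Without the maxTokens guard the loop would never terminate on this input, which is the behaviour the answer warns about.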
So the BNF rules:
<real 10>
::= <sign> <ureal 10>
<sign>
::= <empty> | {+} | {-}
should not be translated as the following ANTLR rules:
REAL_10
: SIGN UREAL_10
;
SIGN
: EMPTY
| '+'
| '-'
;
but like this instead:
REAL_10
: SIGN? UREAL_10
;
SIGN
: '+'
| '-'
;
Also note that your rule:
fragment COMPLEX_10
: REAL_10 (
'@' REAL_10
| '+' (
UREAL_10 'i'
| 'i'
)?
| '-' (
UREAL_10 'i'
| 'i'
)?
)?
| '+' (
UREAL_10 'i'
| 'i'
)?
| '-' (
UREAL_10 'i'
| 'i'
)?;
is a bit hard to read. Indenting it differently might make this a bit easier to comprehend:
fragment COMPLEX_10
: REAL_10 ( '@' REAL_10
| '+' (UREAL_10 'i' | 'i')?
| '-' (UREAL_10 'i' | 'i')?
)?
| '+' (UREAL_10 'i' | 'i')?
| '-' (UREAL_10 'i' | 'i')?
;
which could be simplified by writing:
fragment COMPLEX_10
: REAL_10 ('@' REAL_10)?
| REAL_10? ('+' | '-') UREAL_10? 'i'
;
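As a rough sanity check of the simplified rule, it can be mirrored as a Java regular expression and tried on a few inputs. This is only a sketch under stated assumptions: UREAL here is a cut-down stand-in (integers and fractions only, no decimals, '#' digits or exponents), and Complex10Demo and isComplex10 are hypothetical names. Note that the simplified form rejects an input such as 1+ (a real followed by a bare sign), which the question's original rule would accept because of the trailing ? after each sign group.

```java
import java.util.regex.Pattern;

final class Complex10Demo {
    // Cut-down stand-ins (assumptions): integers and fractions only.
    static final String UREAL = "[0-9]+(?:/[0-9]+)?";
    static final String REAL = "[+-]?" + UREAL;

    // Mirrors: REAL_10 ('@' REAL_10)? | REAL_10? ('+' | '-') UREAL_10? 'i'
    static final Pattern COMPLEX = Pattern.compile(
            REAL + "(?:@" + REAL + ")?"
            + "|(?:" + REAL + ")?[+-](?:" + UREAL + ")?i");

    /** True if the whole string is matched by the mirrored COMPLEX_10 rule. */
    static boolean isComplex10(String s) {
        return COMPLEX.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1", "-1/2", "1@-2", "1+2i", "1-i", "+i", "-3i", "1+", "i"};
        for (String s : samples) {
            System.out.println(s + " -> " + isComplex10(s));
        }
    }
}
```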
Also be aware that many BNF notations make no distinction between lower- and uppercase literals. So instead of writing 'i' in your ANTLR grammar, you might want to use ('i' | 'I') instead.
EDIT
Sebastian wrote:
but I'm still having problems with the PREFIX_10 rule:
fragment PREFIX_10 : RADIX_10? EXACTNESS? | EXACTNESS? RADIX_10?;
which tells me that alternative 2 can never be matched, although it should match #i #d and #d #i with the 2 alternatives separately. Or am I doing something wrong here?
There are a couple of things wrong with the (fragment) rule PREFIX_10:
fragment PREFIX_10
: RADIX_10? EXACTNESS? // alternative 1
| EXACTNESS? RADIX_10? // alternative 2
;
For one, both match an empty string. Because alternative 1 will always match an empty string, alternative 2 would never match, which is what ANTLR was telling you.
Now, looking at the BNF rules:
<exactness>
::= <empty> | {#i} | {#e}
<prefix 10>
::= <radix 10> <exactness>
| <exactness> <radix 10>
<radix 10>
::= <empty> {#d}
(Note that <empty> {#d} equals {#d}, so the <empty> is IMO just misplaced. None of the other radixes has an <empty> part.)
I'd translate those into the following (untested!) ANTLR rules:
fragment EXACTNESS
: '#i'
| '#e'
;
fragment PREFIX_10
: RADIX_10 EXACTNESS?
| EXACTNESS RADIX_10 // **
;
fragment RADIX_10
: '#d'
;
** Note that it's not:
fragment PREFIX_10
: RADIX_10 EXACTNESS? // matches '#d'
| EXACTNESS? RADIX_10 // matches '#d'
;
because the lexer does not know through which alternative to match #d.
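The ambiguity can be illustrated by counting how many alternatives accept a given input. The Java sketch below models each alternative as a regular expression, which is only an approximation of how ANTLR predicts alternatives; PrefixAmbiguityDemo and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

final class PrefixAmbiguityDemo {
    static final String RADIX = "#d";
    static final String EXACTNESS = "#[ie]";

    /** Counts how many of the given alternatives match the whole input. */
    static int matchingAlternatives(String input, String... alternatives) {
        int count = 0;
        for (String alt : alternatives) {
            if (Pattern.matches(alt, input)) {
                count++;
            }
        }
        return count;
    }

    // RADIX_10 EXACTNESS? | EXACTNESS? RADIX_10
    static int ambiguous(String input) {
        return matchingAlternatives(input,
                RADIX + "(?:" + EXACTNESS + ")?",
                "(?:" + EXACTNESS + ")?" + RADIX);
    }

    // RADIX_10 EXACTNESS? | EXACTNESS RADIX_10
    static int unambiguous(String input) {
        return matchingAlternatives(input,
                RADIX + "(?:" + EXACTNESS + ")?",
                EXACTNESS + RADIX);
    }

    public static void main(String[] args) {
        System.out.println("ambiguous(\"#d\")   = " + ambiguous("#d"));
        System.out.println("unambiguous(\"#d\") = " + unambiguous("#d"));
    }
}
```

With the second form, each of #d, #d#i and #i#d is accepted by exactly one alternative, so there is nothing left for the lexer to choose between.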
And in case the BNF rule for <radix 10> should be like this (i.e. they forgot to place a |):
<radix 10>
::= <empty>
| {#d}
then the ANTLR PREFIX_10 rule should still look like:
fragment PREFIX_10
: RADIX_10 EXACTNESS?
| EXACTNESS RADIX_10
;
but then all other rules that use PREFIX_10 should make PREFIX_10 optional.
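One thing that may be worth checking under this second reading: if <radix 10> can be empty, then a bare exactness marker such as #e in front of a number would presumably be a valid prefix too, and the rule as written does not cover it. The Java sketch below compares the rule as written with a hypothetical variant that uses EXACTNESS RADIX_10? in the second alternative. Both are modelled as regular expressions, digits stand in for the full number rule, and all names (OptionalPrefixDemo, PREFIX_VARIANT, isNum10) are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class OptionalPrefixDemo {
    // RADIX_10 EXACTNESS? | EXACTNESS RADIX_10   (the rule as written)
    static final String PREFIX_AS_WRITTEN = "#d(?:#[ie])?|#[ie]#d";
    // RADIX_10 EXACTNESS? | EXACTNESS RADIX_10?  (hypothetical variant)
    static final String PREFIX_VARIANT = "#d(?:#[ie])?|#[ie](?:#d)?";

    /** Models "PREFIX_10? digits": an optional prefix followed by plain digits. */
    static boolean isNum10(String prefix, String input) {
        return Pattern.matches("(?:" + prefix + ")?[0-9]+", input);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "#d42", "#e#d42", "#d#i42", "#e42"}) {
            System.out.println(s
                    + " as written: " + isNum10(PREFIX_AS_WRITTEN, s)
                    + ", variant: " + isNum10(PREFIX_VARIANT, s));
        }
    }
}
```

The variant stays unambiguous in this model, because its first alternative can only start with #d and its second only with #i or #e.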
HTH