Consider the following extract of my grammar:
definition
: '(' 'define'
( '(' variable def_formals ')' body ')'
| variable expression ')'
)
;
def_formals
: variable* ('.' variable)?
;
body
: ((definition)=> definition)* expression+
;
Variables are identifiers; expressions are Scheme expressions (such as literals or lambda expressions). The full grammar can be found in some of my other questions.
While testing the whole thing I ran into an issue regarding the NoViableAltException.
So far, everything that should parse does parse. For example,
(define x 5)
is recognized.
Next I tested input that the parser should NOT recognize.
For example,
(define x 5))
is reported as an error because of the extra ")" at the end of the line.
But when I leave things out, for example
(define x)
or
(define)
the parser doesn't complain at all. When I check the same input in the interpreter, the NoViableAltException shows up correctly, but I can't figure out how to get this error to show up in an external program (like a Java test class).
I tried to make the parser bail out on the first syntax error it sees, as described in Terence Parr's book (page 252), but that didn't help either. I also tried something like
private List<String> errors = new LinkedList<String>();
public void displayRecognitionError(String[] tokenNames,
RecognitionException e) {
String hdr = getErrorHeader(e);
String msg = getErrorMessage(e, tokenNames);
errors.add(hdr + " " + msg);
}
public List<String> getErrors() {
return errors;
}
but that method doesn't return anything when called.
So how do I get ANTLR to show me these errors when they are clearly being thrown internally?
Edit: this is the whole grammar:
grammar R5RS;
options {
language = Java;
output=AST;
}
@header{
package r5rsgrammar;
import r5rsgrammar.scope.*;
import java.util.LinkedList;
}
@lexer::header{
package r5rsgrammar;
import r5rsgrammar.scope.*;
import java.util.LinkedList;
}
@members{
// variable which is used to distinguish between top level and inner definitions
private boolean topLevel;
// the toplevel scope of a file, whose parent is null
private IScope scope;
@Override
public void emitErrorMessage(String message) {
throw new RuntimeException(message);
}
}
// PROGRAMS AND DEFINITIONS
parse
@init{
this.topLevel = true;
this.scope = new Scope();
}
: command_or_definition* EOF
;
command_or_definition
: (syntax_definition)=> syntax_definition
| (definition)=> definition
| ('('BEGIN command_or_definition)=>
'('BEGIN
{ this.topLevel = false;
this.scope = this.scope.push();
}
command_or_definition+
{ this.scope = this.scope.pop();
this.topLevel = true;
}')'
| command
;
command
: expression
;
definition
: '(' DEFINE ( '(' var=variable
{ this.topLevel = false;
this.scope.bind($var.text);
this.scope = this.scope.push();
}
def_formals ')' body
{ this.topLevel = true;
this.scope = this.scope.pop();
}')'
| var=variable
{ this.topLevel = false;
this.scope.bind($var.text);
this.scope = this.scope.push();
}
expression
{ this.topLevel = true;
this.scope = this.scope.pop();
}')'
)
| '(' BEGIN
{this.scope = this.scope.push();}
definition*
{this.scope = this.scope.pop();}')'
;
def_formals
: vars+=variable* ('.' vars+=variable)?
{for (int i = 0; i \less $vars.size(); i++){
String name = ((CommonTree)$vars.get(i)).getText();
this.scope.bind(name);
}
}
;
syntax_definition
: '(' DEFINE_SYNTAX var=variable
{ this.scope.bind($var.text);
this.scope = this.scope.push();}
transformer_spec
{this.scope = this.scope.pop();}')'
;
// EXPRESSIONS
expression
: (variable)=> var=variable
{
if(!this.scope.isBound($var.text))
System.err.println($var.text + " not bound");
}
| (literal)=> literal
| (lambda_expression)=> lambda_expression
| (conditional)=> conditional
| (assignment)=> assignment
| (derived_expression)=> derived_expression
| (procedure_call)=> procedure_call
| (macro_use)=> macro_use
| macro_block
;
keyword
: identifier
;
literal
: quotation
| self_evaluating
;
self_evaluating
: bool
| number
| CHARACTER
| STRING
;
quotation
: '\'' datum
| '(' QUOTE datum ')'
;
lambda_expression
: '(' LAMBDA {this.scope = this.scope.push();}
formals body
{this.scope = this.scope.pop();}')'
;
formals
: '(' (vars+=variable+ ('.' vars+=variable )?)? ')'
{for (int i = 0; i \less $vars.size(); i++){
String name = ((CommonTree)$vars.get(i)).getText();
this.scope.bind(name);
}
}
| var=variable
{this.scope.bind($var.text);}
;
body
: ((definition)=> definition)* sequence
;
sequence
: expression+
;
conditional
: '(' IF test consequent alternate? ')'
;
test
: expression
;
consequent
: expression
;
alternate
: expression
;
assignment
: '(' SET_BANG variable expression ')'
;
derived_expression
: quasiquotation
| '(' ( COND ( '(' ELSE sequence ')'
| cond_clause+ ('(' ELSE sequence ')')?
)
| CASE expression ( case_clause+ ('(' ELSE sequence ')')?
| '(' ELSE sequence ')'
)
| AND test*
| OR test*
| LET variable? '(' {this.scope = this.scope.push();}
binding_spec[false] ')' body
{this.scope = this.scope.pop();}
| LET_STAR '(' {this.scope = this.scope.push();}
binding_spec[true] ')' body
{this.scope = this.scope.pop();}
| LETREC '(' {this.scope = this.scope.push();}
binding_spec[true] ')' body
{this.scope = this.scope.pop();}
| BEGIN sequence
| DO '(' iteration_spec* ')' '(' test do_result? ')' command*
| DELAY expression
)
')'
;
cond_clause
: '(' test (sequence | FOLLOWS recipient)? ')'
;
recipient
: expression
;
case_clause
: '(' '(' datum* ')' sequence ')'
;
binding_spec[boolean sequential]
: {sequential}? // let* or letrec: bind the var immediately
('(' var=variable
{this.scope.bind($var.text);}
expression ')')*
| {!sequential}? // normal let: bind all vars at the end
('(' vars+=variable expression ')')*
{for (int i = 0; i \less $vars.size(); i++){
String name = ((CommonTree)$vars.get(i)).getText();
this.scope.bind(name);
}
}
;
iteration_spec
: '(' variable init step ')'
;
init
: expression
;
step
: expression
;
do_result
: sequence
;
procedure_call
: '(' operator operand* ')'
;
operator
: expression
;
operand
: expression
;
macro_use
: '(' keyword datum* ')'
;
macro_block
: '(' (LET_SYNTAX | LETREC_SYNTAX) '(' syntax_spec*')' body ')'
;
syntax_spec
: '(' keyword transformer_spec')'
;
// TRANSFORMERS
transformer_spec
: '(' SYNTAX_RULES '(' identifier* ')' syntax_rule* ')'
;
syntax_rule
: '(' pattern template ')'
;
pattern
: pattern_identifier
| '(' (pattern+ ('.' pattern)?)? ')'
| '#(' (pattern+ ELLIPSIS?)? ')'
| pattern_datum
;
pattern_datum
: bool
| number
| CHARACTER
| STRING
;
template
: pattern_identifier
| '(' (template_element+ ('.' template)?)? ')'
| '#('template_element* ')'
| template_datum
;
template_element
: template ELLIPSIS?
;
template_datum
: pattern_datum
;
pattern_identifier
: syntactic_keyword
| VARIABLE
;
// external representations
// a Datum is what the _read_ procedure successfully parses.
// Note that any string that parses as an expression will also parse as a datum.
datum
: simple_datum
| compound_datum
;
simple_datum
: bool
| number
| CHARACTER
| STRING
| identifier
;
compound_datum
: list
| vector
;
list
: '(' (datum+ ( '.' datum)?)? ')'
| abbreviation
;
abbreviation
: abbrev_prefix datum
;
abbrev_prefix
: ('\'' | '`' | ',' | ',@')
;
vector
: '#(' datum* ')'
;
// QUASIQUOTATIONS
// CONTEXT-SENSITIVE
quasiquotation
: quasiquotation_D[1]
;
quasiquotation_D[int d]
: '`' qq_template[d]
| '(' QUASIQUOTE qq_template[d] ')'
;
qq_template[int d]
: (expression)=> expression
| ('(' UNQUOTE)=> unquotation[d]
| simple_datum
| vectorQQ_template[d]
| listQQ_template[d]
;
vectorQQ_template[int d]
: '#(' qq_template_or_slice[d]* ')'
;
listQQ_template[int d]
: '\'' qq_template[d]
| ('(' QUASIQUOTE)=> quasiquotation_D[d+1]
| '(' (qq_template_or_slice[d]+ ('.' qq_template[d])?)? ')'
;
unquotation[int d]
: ',' qq_template[d-1]
| '(' UNQUOTE qq_template[d-1] ')'
;
qq_template_or_slice[int d]
: ('(' UNQUOTE_SPLICING)=> splicing_unquotation[d]
| qq_template[d]
;
splicing_unquotation[int d]
: ',@' qq_template[d-1]
| '(' UNQUOTE_SPLICING qq_template[d-1] ')'
;
// values
bool: TRUE | FALSE;
number: NUM_2 | NUM_8 | NUM_10 | NUM_16;
identifier: syntactic_keyword | variable;
variable : VARIABLE | ELLIPSIS;
// KEYWORDS
syntactic_keyword
: expression_keyword
| ELSE
| FOLLOWS
| DEFINE
| UNQUOTE
| UNQUOTE_SPLICING;
expression_keyword
: QUOTE
| LAMBDA
| IF
| SET_BANG
| BEGIN
| COND
| AND
| OR
| CASE
| LET
| LET_STAR
| LETREC
| DO
| DELAY
| QUASIQUOTE;
// syntactic keywords
ELSE : 'else';
FOLLOWS : '=>';
DEFINE : 'define';
UNQUOTE : 'unquote';
UNQUOTE_SPLICING : 'unquote-splicing';
// expression keywords
QUOTE : 'quote';
LAMBDA : 'lambda';
IF : 'if';
SET_BANG : 'set!';
BEGIN : 'begin';
COND : 'cond';
AND : 'and';
OR : 'or';
CASE : 'case';
LET : 'let';
LET_STAR : 'let*';
LETREC : 'letrec';
DO : 'do';
DELAY : 'delay';
QUASIQUOTE : 'quasiquote';
// macro keywords
LETREC_SYNTAX : 'letrec-syntax';
LET_SYNTAX : 'let-syntax';
SYNTAX_RULES : 'syntax-rules';
DEFINE_SYNTAX : 'define-syntax';
ELLIPSIS : '...';
//RESERVED_CHAR : '{'| '}' | '[' | ']' | '|';
STRING : '"' STRING_ELEMENT* '"';
TRUE : '#' ('T' | 't');
FALSE : '#' ('f' | 'F');
CHARACTER : '#\\' (~(' ' | '\n') | CHARACTER_NAME);
VARIABLE : INITIAL SUBSEQUENT* | PECULIAR_IDENTIFIER;
// space and comments are ignored
SPACE : (' ' | '\n' | '\t' | '\r') {$channel = HIDDEN;};
COMMENT : ';' ~('\r' | '\n')* {$channel = HIDDEN;};
fragment INITIAL : LETTER | SPECIAL_INITIAL;
fragment LETTER : 'a'..'z' | 'A'..'Z';
fragment SPECIAL_INITIAL : '!' | '$' | '%' | '&' | '*' | '/' | ':' | '\less' | '=' | '>' | '?' | '^' | '_' | '~';
fragment SUBSEQUENT : INITIAL | DIGIT | SPECIAL_SUBSEQUENT;
fragment SPECIAL_SUBSEQUENT : '+' | '-' | '.' | '@';
fragment PECULIAR_IDENTIFIER : '+' | '-';
fragment STRING_ELEMENT : ~('"' | '\\') | '\\' ('"' | '\\');
fragment CHARACTER_NAME : 'space' | 'newline';
// NUMBERS
fragment SUFFIX : EXPONENT_MARKER SIGN? DIGIT+;
fragment EXPONENT_MARKER : 'e' | 'E' | 's' | 'S' | 'f' | 'F' | 'd' | 'D' | 'l' |'L';
fragment SIGN : '+' | '-';
fragment EXACTNESS : '#' ('i' | 'I' | 'e' | 'E');
fragment IMAGINARY : 'i' | 'I';
fragment DIGIT : '0'..'9';
// BINARY NUMBERS
NUM_2 : PREFIX_2 COMPLEX_2;
fragment COMPLEX_2
: REAL_2 ('@' REAL_2)?
| REAL_2? ('+' | '-') UREAL_2? IMAGINARY
;
fragment REAL_2 : SIGN? UREAL_2;
fragment UREAL_2 : UINTEGER_2 ('/' UINTEGER_2)?;
fragment UINTEGER_2 : DIGIT_2+ '#'*;
fragment PREFIX_2
: RADIX_2 EXACTNESS? // #d #i
| EXACTNESS RADIX_2 // #i #d
;
fragment RADIX_2 : '#' ('b' | 'B');
fragment DIGIT_2 : '0' | '1';
// OCTAL NUMBERS
NUM_8 : PREFIX_8 COMPLEX_8;
fragment COMPLEX_8
: REAL_8 ('@' REAL_8)?
| REAL_8? ('+' | '-') UREAL_8? IMAGINARY
;
fragment REAL_8 : SIGN? UREAL_8;
fragment UREAL_8
: UINTEGER_8 ('/' UINTEGER_8)?;
fragment UINTEGER_8 : DIGIT_8+ '#'*;
fragment PREFIX_8
: RADIX_8 EXACTNESS? // #d #i
| EXACTNESS RADIX_8; // #i #d
fragment RADIX_8 : '#' ('o' | 'O');
fragment DIGIT_8 : '0' .. '7';
// DECIMAL NUMBERS
NUM_10 : PREFIX_10? COMPLEX_10;
fragment COMPLEX_10
: REAL_10 ('@' REAL_10)?
| REAL_10? ('+' | '-') UREAL_10? IMAGINARY
;
fragment REAL_10 : SIGN? UREAL_10;
fragment UREAL_10 : UINTEGER_10 ('/' UINTEGER_10)? | DECIMAL_10;
fragment UINTEGER_10 : DIGIT+ '#'*;
fragment DECIMAL_10
: UINTEGER_10 SUFFIX
| '.' DIGIT+ '#'* SUFFIX?
| DIGIT+ '.' DIGIT* '#'* SUFFIX?
| DIGIT+ '#'+ '.' '#'* SUFFIX?;
fragment PREFIX_10
: RADIX_10 EXACTNESS? // #d #i
| EXACTNESS RADIX_10; // #i #d
fragment RADIX_10 : '#' ('d' | 'D');
// HEXADECIMAL NUMBERS
NUM_16 : PREFIX_16 COMPLEX_16;
fragment COMPLEX_16
: REAL_16 ('@' REAL_16)?
| REAL_16? ('+' | '-') UREAL_16? IMAGINARY
;
fragment REAL_16 : SIGN? UREAL_16;
fragment UREAL_16
: UINTEGER_16 ('/' UINTEGER_16)?;
fragment UINTEGER_16 : DIGIT_16+ '#'*;
fragment PREFIX_16
: RADIX_16 EXACTNESS? // #d #i
| EXACTNESS RADIX_16; // #i #d
fragment RADIX_16 : '#' ('x' | 'X');
fragment DIGIT_16 : DIGIT | 'a'.. 'f' | 'A' .. 'F';
(I had to replace "<" with "\less" to make the formatting work.)
EDIT: the solution to this problem was far simpler: (define x) is, surprisingly, valid in R5RS (see the last comment).
There are many ways to improve error reporting. A quick fix would be to override emitErrorMessage(String message)
in the parser class and simply throw an exception with the provided message:
grammar T;
@members {
@Override
public void emitErrorMessage(String message) {
throw new RuntimeException(message);
}
}
definition
: '(' 'define' ( '(' variable def_formals ')' body ')'
| variable expression ')'
)
;
def_formals
: variable* ('.' variable)?
;
body
: ((definition)=> definition)* expression+
;
expression
: INT
;
variable
: ID
;
ID : 'a'..'z'+;
INT : '0'..'9';
SPACE : ' ' {skip();};
which you can test with the class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) {
String[] tests = {
"(define x 5)",
"(define x 5))",
"(define x)",
"(define)"
};
for(String input : tests) {
TLexer lexer = new TLexer(new ANTLRStringStream(input));
TParser parser = new TParser(new CommonTokenStream(lexer));
System.out.println("\nParsing : " + input);
try {
parser.definition();
} catch(Exception e) {
System.out.println(" exception -> " + e.getMessage());
}
}
}
}
After running the class above, you will see the following:
bart@hades:~/Programming/ANTLR/Demos/T$ java -cp antlr-3.3.jar org.antlr.Tool T.g
bart@hades:~/Programming/ANTLR/Demos/T$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/T$ java -cp .:antlr-3.3.jar Main
Parsing : (define x 5)
Parsing : (define x 5))
Parsing : (define x)
exception -> line 1:9 missing INT at ')'
Parsing : (define)
exception -> line 1:7 no viable alternative at input ')'
As you can see, the input (define x 5))
produces no exception! That is because the lexer has no problems with it (they're all valid tokens) and the parser is simply instructed to consume the definition
rule:
definition
: '(' 'define' ( '(' variable def_formals ')' body ')'
| variable expression ')'
)
;
which it does. If you want an error because of the dangling ')', then you have to add the EOF token at the end of the rule:
definition
: '(' 'define' ( '(' variable def_formals ')' body ')'
| variable expression ')'
)
EOF
;
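To illustrate why anchoring the rule with EOF matters, here is a minimal sketch in plain Java that does not use ANTLR at all. The class EofDemo and its methods are hypothetical stand-ins for the generated parser, and they only cover the simple '(' 'define' ID INT ')' form of the rule. The point is just that a recognizer which stops after the closing parenthesis will happily ignore whatever follows, unless it is told to check that the input is exhausted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written stand-in for the generated parser (not ANTLR code).
// It recognises only the form: '(' 'define' ID INT ')'
class EofDemo {

    // Splits the input into "(", ")" and whitespace-separated words.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        String spaced = input.replace("(", " ( ").replace(")", " ) ");
        for (String t : spaced.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns true if the input starts with a definition. When requireEof is
    // set, nothing may follow the definition, which mirrors appending EOF
    // to the end of the grammar rule.
    static boolean parseDefinition(String input, boolean requireEof) {
        List<String> t = tokenize(input);
        boolean ok = t.size() >= 5
                && t.get(0).equals("(")
                && t.get(1).equals("define")
                && t.get(2).matches("[a-z]+")   // ID
                && t.get(3).matches("[0-9]")    // INT (a single digit, as in grammar T)
                && t.get(4).equals(")");
        if (!ok) {
            return false;
        }
        // Without the EOF check, trailing tokens are silently left unread.
        return !requireEof || t.size() == 5;
    }

    public static void main(String[] args) {
        String input = "(define x 5))";
        System.out.println("without EOF check: " + parseDefinition(input, false));
        System.out.println("with EOF check:    " + parseDefinition(input, true));
    }
}
```

With the extra ')' the first call accepts the input and the second rejects it, which is the same difference you get between the two versions of the definition rule.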