Example String
023abc7defghij
Header
Characters 0, 1 = Size of following chunks
Chunks
First character = length of the following String
Following characters = String with the specified length
Example result
So for the example above this would mean:
02 -> 2 following chunks
3 -> 3 character String will follow
abc -> the three character string
7 -> 7 character String will follow
defghij -> the seven character string
Question
Can I write a grammar that describes this form of string? I would need to interpret the 'length' information and then build tokens with the specified length, so that I can fill my objects with the length information and the strings.
I hope I have described this comprehensibly. I could not find any information describing or solving my problem.
I'm assuming your actual problem is a bit more complicated, because if "023abc7defghij" is your actual input, I wouldn't use a parser generator like ANTLR, but would just stick with some simple string operations.
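For comparison, the plain string-operations route could look roughly like the sketch below. It assumes the format exactly as described in the question (a two-digit chunk count, then single-digit lengths); the class name ChunkReader and the method readChunks are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkReader {

    // Splits input such as "023abc7defghij" into its chunks with plain string operations.
    static List<String> readChunks(String input) {
        // Characters 0 and 1 hold the number of chunks that follow.
        int count = Integer.parseInt(input.substring(0, 2));
        List<String> chunks = new ArrayList<>();
        int pos = 2;
        for (int i = 0; i < count; i++) {
            // One digit gives the length of the string that follows it.
            int length = Character.digit(input.charAt(pos), 10);
            if (length < 0) {
                throw new IllegalArgumentException("Expected a digit at index " + pos);
            }
            pos++;
            chunks.add(input.substring(pos, pos + length));
            pos += length;
        }
        if (pos != input.length()) {
            throw new IllegalArgumentException("Unexpected trailing input at index " + pos);
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(readChunks("023abc7defghij")); // prints [abc, defghij]
    }
}
```

Unlike the grammar further down, this version does use the header count, and it rejects input with leftover characters.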
That said, here's a possible solution:
Since your chunks are not known up front, you cannot create any tokens other than a single Digit and an Other token that would be any char other than a digit. Note that you don't really need the header information: you simply parse "3" and then get the next 3 chars, then parse the "7" and get the next 7 chars, ... all the way up to the end of the file.
A grammar for such a language could look like this:
grammar T;

parse
  :  file EOF
  ;

file
  :  header chunk*
  ;

header
  :  Digit Digit
  ;

chunk
  :  Digit any*
  ;

any
  :  Digit
  |  Other
  ;

Digit
  :  '0'..'9'
  ;

Other
  :  .
  ;
But now the chunk rule is ambiguous: it does not know when to stop consuming characters. This can be fixed using a gated semantic predicate that will cause the * from any* to stop consuming when a certain condition has been met (when a counter int n has been counted down to zero, in this case).
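To make the effect of the gate more concrete, here is a rough hand-written Java analogue of such a counted loop. It is only an illustration of the idea, not what ANTLR generates; the class GatedLoopDemo and the method matchChunk are hypothetical names.

```java
class GatedLoopDemo {

    // Mimics a rule of the form: Digit, then "any" repeated while a counter n is above zero.
    // Returns the index just past the chunk that starts at 'start'.
    static int matchChunk(String input, int start) {
        int pos = start;
        // Read the length digit and initialise the counter.
        int n = Character.digit(input.charAt(pos++), 10);
        // The gate (n > 0) is checked before every iteration, so the loop
        // stops as soon as the announced number of characters has been consumed.
        while (n > 0 && pos < input.length()) {
            pos++; // consume one "any" character
            n--;   // count down
        }
        return pos;
    }

    public static void main(String[] args) {
        String input = "3abc7defghij";
        int end = matchChunk(input, 0);
        System.out.println(input.substring(0, end)); // prints 3abc
    }
}
```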
The grammar above, including this predicate and some println statements, would look like this:
grammar T;

parse
  :  file EOF
  ;

file
  :  header {System.out.println("header=" + $header.text);}
     (chunk {System.out.println("chunk=" + $chunk.text);})*
  ;

header
  :  Digit Digit
  ;

chunk
  :  Digit {int n = Integer.valueOf($Digit.text);} ({n > 0}?=> any {n--;})*
  ;

any
  :  Digit
  |  Other
  ;

Digit
  :  '0'..'9'
  ;

Other
  :  .
  ;
which can be tested with the class:
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "023abc7defghij";
        TLexer lexer = new TLexer(new ANTLRStringStream(source));
        TParser parser = new TParser(new CommonTokenStream(lexer));
        parser.parse();
    }
}
If you now generate a lexer and parser, compile all .java files and run the Main class:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
you would see the following being printed to your console:
header=02
chunk=3abc
chunk=7defghij