I have defined a simple grammar for parsing strings and numbers using Treetop, as below:
grammar Simple
  rule value
    number / string
  end
  rule string
    word space string
    /
    word
  end
  rule word
    [0-9a-zA-Z]+
  end
  rule number
    [1-9] [0-9]*
  end
  rule space
    ' '+
  end
end
Ruby:
parser = SimpleParser.new
parser.parse('123abc wer') # => nil
I expect the parser to return a string node, but it looks like the parser cannot understand the input. Any ideas would be appreciated.
In Treetop (and in PEGs in general, actually), the choice operator is ordered, unlike in most other parsing formalisms.
So, in
  rule value
    number / string
  end
you are telling Treetop that you prefer number over string.
Your input starts with 1, which matches both number and string (through word), but you told Treetop to prefer the number interpretation, so it parses 123 as a number. Once the first alternative of an ordered choice succeeds, the choice is committed and string is never tried. When the parser reaches the "a" in the input, it has no more rules to apply, and thus it returns nothing (nil), because in Treetop it is an error not to consume the entire input stream.
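To see this behaviour outside Treetop, here is a minimal sketch of a PEG recognizer in plain Ruby. The helpers term, seq, choice and parse are hypothetical names made up for this illustration and are not part of Treetop's API; parse imitates Treetop's default requirement that the whole input be consumed. Each rule returns the position where it stopped matching, or nil on failure.

```ruby
# Hypothetical plain-Ruby PEG recognizer (NOT Treetop's API), for illustration only.
# Every parser is a lambda (input, pos) -> position after the match, or nil on failure.

# Match a regexp anchored at the current position.
def term(re)
  anchored = /\A(?:#{re.source})/
  ->(input, pos) { (m = anchored.match(input[pos..-1])) && pos + m[0].length }
end

# Sequence: run parsers one after another; fail as soon as one fails.
def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |at, parser| at && parser.(input, at) } }
end

# PEG ordered choice: try alternatives left to right and commit to the first success.
def choice(*parsers)
  ->(input, pos) do
    result = nil
    parsers.find { |parser| result = parser.(input, pos) } # stops at first non-nil
    result
  end
end

# Mimic Treetop's default: a parse only succeeds if the whole input is consumed.
def parse(rule, input)
  finish = rule.(input, 0)
  finish == input.length ? finish : nil
end

word   = term(/[0-9a-zA-Z]+/)
space  = term(/ +/)
number = term(/[1-9][0-9]*/)
# rule string: word space string / word (the inner lambda defers the recursive lookup)
string = choice(seq(word, space, ->(inp, at) { string.(inp, at) }), word)

input = '123abc wer'
number_first = choice(number, string) # rule value: number / string
string_first = choice(string, number) # rule value: string / number

p number_first.(input, 0)     # 3: only "123" matched, string is never tried
p parse(number_first, input)  # nil: "abc wer" is left over
p parse(string_first, input)  # 10: the whole input is consumed
```

Returning positions instead of syntax trees keeps the sketch short; Treetop itself builds SyntaxNode trees, but the success or failure of each rule follows the same ordered-choice logic.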
If you simply reverse the order of the choice (string / number), the entire input will be interpreted as a string instead of a number:
SyntaxNode+String0 offset=0, "123abc wer" (word,space,string):
  SyntaxNode offset=0, "123abc":
    SyntaxNode offset=0, "1"
    SyntaxNode offset=1, "2"
    SyntaxNode offset=2, "3"
    SyntaxNode offset=3, "a"
    SyntaxNode offset=4, "b"
    SyntaxNode offset=5, "c"
  SyntaxNode offset=6, " ":
    SyntaxNode offset=6, " "
  SyntaxNode offset=7, "wer":
    SyntaxNode offset=7, "w"
    SyntaxNode offset=8, "e"
    SyntaxNode offset=9, "r"
Or, you could keep the order as it is but allow the value rule to match multiple times. Either insert a new top-level rule like this (it must come first in the grammar, because Treetop uses the first rule as the root by default):
  rule values
    value+
  end
or modify the value rule like this:
  rule value
    (number / string)+
  end
Either change will give you an AST roughly like this:
SyntaxNode offset=0, "123abc wer":
  SyntaxNode+Number0 offset=0, "123":
    SyntaxNode offset=0, "1"
    SyntaxNode offset=1, "23":
      SyntaxNode offset=1, "2"
      SyntaxNode offset=2, "3"
  SyntaxNode+String0 offset=3, "abc wer" (word,space,string):
    SyntaxNode offset=3, "abc":
      SyntaxNode offset=3, "a"
      SyntaxNode offset=4, "b"
      SyntaxNode offset=5, "c"
    SyntaxNode offset=6, " ":
      SyntaxNode offset=6, " "
    SyntaxNode offset=7, "wer":
      SyntaxNode offset=7, "w"
      SyntaxNode offset=8, "e"
      SyntaxNode offset=9, "r"
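The repetition fix can be sketched the same way in plain Ruby. The helpers below (term, seq, choice and parse_repeated) are hypothetical illustrations, not Treetop's API; parse_repeated is a stand-in for value+ combined with Treetop's consume-all-input check, and it records the text each repetition matched so the split can be compared with the AST above.

```ruby
# Hypothetical plain-Ruby PEG recognizer (NOT Treetop's API), for illustration only.
# Every parser is a lambda (input, pos) -> position after the match, or nil on failure.
def term(re)
  anchored = /\A(?:#{re.source})/
  ->(input, pos) { (m = anchored.match(input[pos..-1])) && pos + m[0].length }
end

def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |at, parser| at && parser.(input, at) } }
end

# Ordered choice: the first alternative that succeeds wins.
def choice(*parsers)
  ->(input, pos) do
    result = nil
    parsers.find { |parser| result = parser.(input, pos) }
    result
  end
end

# Stand-in for `value+` plus the full-input check: apply `rule` repeatedly,
# collecting the substring matched by each repetition.
def parse_repeated(rule, input)
  pieces, pos = [], 0
  while pos < input.length
    nxt = rule.(input, pos)
    break unless nxt && nxt > pos # stop once the rule no longer makes progress
    pieces << input[pos...nxt]
    pos = nxt
  end
  # value+ needs at least one match, and the whole input must be consumed
  pieces.any? && pos == input.length ? pieces : nil
end

word   = term(/[0-9a-zA-Z]+/)
space  = term(/ +/)
number = term(/[1-9][0-9]*/)
string = choice(seq(word, space, ->(inp, at) { string.(inp, at) }), word)
value  = choice(number, string) # order kept as in the question: number / string

p parse_repeated(value, '123abc wer') # ["123", "abc wer"]
```

Because number no longer has to cover the whole input, the first repetition takes "123" and the second lets string pick up "abc wer", mirroring the Number0 and String0 nodes.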