
I believe this should be one rule in Treetop

Developer https://www.devze.com 2023-03-19 18:32 Source: Internet
I have this working pair of rules in Treetop that the perfectionist in me believes should be one and only one rule, or maybe something more beautiful at least:

rule _
  crap
  /
  " "*
end

rule crap
  " "* "\\x0D\\x0A"* " "*
end

I'm parsing some expressions that every now and then end up containing "\x0D\x0A". Yes, not "\r\n" but the literal text "\x0D\x0A". Something was double-escaped at some point. Long story.
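To make the double-escaping concrete, here is a small plain-Ruby sketch (no Treetop required). It assumes the input contains the literal characters shown in the question; the variable names are only for illustration.

```ruby
# Single quotes keep the backslashes: this is the literal eight-character
# text \x0D\x0A, which is what the double-escaped input contains.
escaped = '\x0D\x0A'

# Double quotes interpret the escapes: this is a real CR LF pair.
crlf = "\x0D\x0A"

puts escaped.length   # 8
puts crlf.length      # 2
puts crlf == "\r\n"   # true

# "\\x0D\\x0A" in double quotes spells the same eight characters as the
# single-quoted form, which is why the grammar literal is written that way.
puts "\\x0D\\x0A" == escaped   # true
```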

That rule works, but it's ugly and it bothers me. I tried this:

rule _
  " "* "\\x0D\\x0A"* " "*
  /
  " "*
end

which caused

SyntaxError: (eval):1276:in `load_from_string': compile error
(eval):1161: class/module name must be CONSTANT
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:42:in `load_from_string'
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:35:in `load'
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `open'
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `load'

Ideally I would like to actually write something like:

rule _
  (" " | "\\x0D\\x0A")*
end

but that doesn't work, and while we are at it, I also discovered that you can't have only one * per rule:

rule _
  " "*
  /
  "\n"*
end

that will match " ", but never \n.


I see you're using three different OR chars: /, | and \ (of which only the first means OR).

This works fine:

grammar Language

  rule crap
    (" " / "\\x0D\\x0A")* {
      def value
        text_value    
      end
    }
  end

end

#!/usr/bin/env ruby

require 'rubygems'
require 'treetop'
require 'polyglot'
require 'language'

parser = LanguageParser.new
value = parser.parse(' \\x0D\\x0A   \\x0D\\x0A   ').value
print '>' + value + '<'

prints:

> \x0D\x0A   \x0D\x0A   <


You said "I also discovered that you can't have only one * per rule" (you mean: you CAN have only one) and "that will match " ", but never \n".

Of course: / is an ordered choice, and " "* succeeds even when it matches zero space characters, so the second alternative is never tried. You could just use a + instead:

rule _
  " "+
  /
  "\n"*
end

You could also parenthesise the space characters if you want to match any number of space-or-newline characters:

rule _
  (" " / "\n")*
end
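The behaviour of the three variants can be illustrated without Treetop. The following is a minimal sketch of PEG-style ordered choice in plain Ruby, not Treetop's actual implementation; lit, star, plus, choice and full_match? are hypothetical helpers written just for this example. It assumes, as Treetop does by default, that a parse only counts as successful when the whole input is consumed.

```ruby
# Each parser is a lambda taking (input, pos) and returning the new position
# on success or nil on failure. All helper names here are hypothetical.

# Match a literal string at the current position.
def lit(str)
  ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
end

# Zero or more repetitions: always succeeds, possibly consuming nothing.
def star(parser)
  lambda do |input, pos|
    while (nxt = parser.call(input, pos)) && nxt > pos
      pos = nxt
    end
    pos
  end
end

# One or more repetitions: fails unless the first repetition matches.
def plus(parser)
  lambda do |input, pos|
    first = parser.call(input, pos)
    first && star(parser).call(input, first)
  end
end

# Ordered choice: return the result of the first alternative that succeeds.
def choice(*parsers)
  lambda do |input, pos|
    parsers.each do |parser|
      result = parser.call(input, pos)
      return result if result
    end
    nil
  end
end

# A parse counts only if the whole input was consumed.
def full_match?(parser, input)
  parser.call(input, 0) == input.length
end

space   = lit(" ")
newline = lit("\n")

star_or_star = choice(star(space), star(newline))  # " "* / "\n"*
plus_or_star = choice(plus(space), star(newline))  # " "+ / "\n"*
star_of_both = star(choice(space, newline))        # (" " / "\n")*

puts full_match?(star_or_star, "\n\n")    # false: " "* succeeds with zero matches
puts full_match?(plus_or_star, "\n\n")    # true:  " "+ fails, so "\n"* is tried
puts full_match?(star_of_both, " \n \n")  # true:  any mix of spaces and newlines
```

In the first variant the choice commits to the empty match from " "* and the newlines are left unconsumed, which is why the rule appears never to match \n.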

Your error "class/module name must be CONSTANT" is because the rule name is used as the prefix of a module name to contain any methods attached to your rule. A module name may not begin with an underscore, so you can't use methods in a rule whose name begins with an underscore.
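The underlying Ruby restriction can be reproduced on its own. In this sketch, Crap0 and _0 are only my guess at the kind of module name that might be generated for rules named crap and _; the exact names Treetop emits are an assumption, and valid_module_name? is a hypothetical helper.

```ruby
# Try to define an empty module with the given name and report whether Ruby
# accepts it. eval is used because the failure happens at parse time.
def valid_module_name?(name)
  eval("module #{name}; end", TOPLEVEL_BINDING)
  true
rescue SyntaxError
  false
end

puts valid_module_name?("Crap0")  # true:  starts with an uppercase letter
puts valid_module_name?("_0")     # false: not a constant name
```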
