Parse list with minimal separators_问答_开发者

开发者 https://www.devze.com 2023-01-31 19:21 出处：网络

I have a language with statements of 4 kinds: s00, s01, s10, s11 where a leading 1 means initial keyword, a trailing 1 means terminated, and I have a separator \";\". I can terminate any statement wit

I have a language with statements of 4 kinds: s00, s01, s10, s11 where a leading 1 means initial keyword, a trailing 1 means terminated, and I have a separator ";". I can terminate any statement with ";". I would like to parse a language allowing a list of statements which allows minimal use of ";". The parser is Dypgen which is GLR+.

Example:

{ x=1 fun f(){} x=1; x=1 var x=1 var x=1; x=1 }

Is it possible to do this at all? If so, how? If not, why?

I believe it can't be done, mainly because I can't think of how to do it :) However it does seem context sensitive: the rule is you have to insert a ";" between A and B if A is not terminated and B is not initiated, ditto for B and C which means B is used twice.

However because the parser is GLR+ it is tempting to just use

(s00|s01|s10|s11}*

as the rule, and if it misparses throw in a ";" (which is an s11 no-op) to resolve the ambiguity. It would be nicer if the parser would report a syntax error though. Perhaps this could be done when merging alternate s productions. The real problem is whe开发者_运维技巧n they overlap instead of merging: if this occurs a program parse could explode.

I've recently had a similar problem with toplevel phrases , some of them needing a terminating ;; in the previous phrase, and others (beginning with a phrase-introducing keyword) not. I've solved my problem by splitting the syntactic category of phrases in two, and giving fine rules to phrase sequences expressing this behaviour. But this resulted in duplication in the splitted grammar.

In your case it would be something like :

sequence:
  | (s00 | s10) sequence_closed
  | (s01 | s11) sequence_open
  | ε

sequence_closed:
  | s10 sequence_closed
  | s11 sequence_open
  | ';' sequence_open
  | ε

sequence_open:
  | s00 sequence_closed
  | s01 sequence_open
  | ε

It's a bit more complicated if you want to allow superfluous delimiters (and you most probably want to), but that's the idea.