I am trying to determine whether suggested changes to the EcmaScript grammar introduce ambiguities.
The grammar is odd in a few ways
- There is no regular or context free lexical grammar meaning there is no way to break the input into a series of tokens which can be fed to a tree builder, though at a given parser state there is a context free grammar which can be used to fetch the next token.
- Some tokens are implicit. Specifically semicolons are inserted in some places when not present in the source text. This only requires one non-ignorable token of lookahead but since ignorable tokens can be of arbitrary length prevents non-finite lookahead.
- There is no translation simpler than a full parse that allows removal or collapsing of ignorable tokens.
- Line terminators tokens (and multiline comments that are equivalent to line terminators) are ignorable in most contexts but are significant in some.
I know that proving no ambiguity is not doable in general, but I'd like to be able to achieve a simpler goal:
A test that is true if and only if there is no string such that two different paths through the candidate grammar might produce two different trees where each path involves breaking the string into less than k tokens.
I would be very happy if I could prove such a thing for a candidate grammar to k of 50.
Is there any literature on detecting开发者_Go百科 ambiguity within such limits?
精彩评论