I'm trying to parse a legacy language (which is similar to 'C') using FLEX and BISON. Everything is working nicely except for matching strings.
This rather odd legacy language doesn't support escape sequences in string literals, so the following are all valid string literals:
"hello"
""
"\"
I'm using the following rule to match string literals:
\".*\" { yylval.strval = _strdup( yytext ); return LIT_STRING; }
Unfortunately this is a greedy match, so it matches code like the following:
"hello", "world"
as a single string (hello", "world).
The usual non-greedy quantifier .*? doesn't seem to work in FLEX. Any ideas?
Flex does not support non-greedy quantifiers. Instead, just prohibit quote characters between the opening and closing quotes:
\"[^"]*\"
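To see why the negated character class stops at the first closing quote, here is a minimal C sketch of the same matching logic written by hand. The function name match_plain_literal is hypothetical and is not part of flex; it only illustrates what the pattern accepts.

```c
#include <stddef.h>

/* Hypothetical helper (not part of flex): returns the length of the string
 * literal that starts at s[0], following the pattern \"[^"]*\" ,
 * or 0 if no literal starts there. */
size_t match_plain_literal(const char *s)
{
    size_t i = 1;

    if (s[0] != '"')
        return 0;               /* must begin with an opening quote */
    while (s[i] != '\0' && s[i] != '"')
        i++;                    /* [^"]* : any character except a quote */
    if (s[i] != '"')
        return 0;               /* input ended before a closing quote */
    return i + 1;               /* length including the closing quote */
}
```

Called on the input "hello", "world" this returns 7, the length of "hello" with its quotes, so the second literal is left for the next match. The literal "\" is accepted as a 3-character token, which is what the legacy language requires.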
Backslash-escaped quotes
The following rule also allows backslash-escaped quotes inside the string:
\"(\\.|[^\n"\\])*\" {
fprintf( yyout, "STRING: %s\n", yytext );
}
and disallows newlines inside string constants.
E.g.:
>>> "a\"b""c\d"""
STRING: "a\"b"
STRING: "c\d"
STRING: ""
and fails on:
>>> "\"
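The behaviour of the escape-aware rule can be checked with a hand-written C sketch of the same logic. The name match_escaped_literal is hypothetical and is not part of flex. Note that this variant rejects "\", which the question lists as a valid literal, so it only fits languages that do treat the backslash as an escape character.

```c
#include <stddef.h>

/* Hypothetical helper (not part of flex): returns the length of the string
 * literal that starts at s[0], following \"(\\.|[^\n"\\])*\" ,
 * or 0 if no literal starts there. */
size_t match_escaped_literal(const char *s)
{
    size_t i = 1;

    if (s[0] != '"')
        return 0;                       /* must begin with a quote */
    for (;;) {
        char c = s[i];
        if (c == '"')
            return i + 1;               /* unescaped closing quote */
        if (c == '\0' || c == '\n')
            return 0;                   /* unterminated, or newline inside */
        if (c == '\\') {
            /* \\. : a backslash plus any character except newline */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else {
            i++;                        /* [^\n"\\] : ordinary character */
        }
    }
}
```

Applied repeatedly to "a\"b""c\d""" it yields lengths 6, 5 and 2, matching the three STRING lines in the example output. On "\" it returns 0, because the backslash consumes the closing quote.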
When implementing such C-like features, make sure to look for existing Lex implementations, e.g.: http://www.lysator.liu.se/c/ANSI-C-grammar-l.html