I'm working on a lexer for the Python grammar (written in Flex) for a compiler construction class and I'm having trouble getting a properly working regular expression to catch when there is no white space at the beginning of a line (to account for the end of an indented block).
The rule checking for no indentation appears after those checking for comments, blank lines, and indentation. It is also before rules checking for anything else. Here's what it looks like right now:
<INITIAL>^[^ \t] {
    printf("DEBUG: Expression ^[^ \\t] matches string: %s\n", yytext);
    /* Dedent to 0 if not mid-expression */
    if (!lineJoin && bracketDepth() == 0)
        changeIndent(0);
    /* Treat line as normal */
    REJECT;
}
As I understand it, the rule above should print that debug line for every line in the lexed file that contains actual Python code but doesn't start with indentation. As it stands, however, very few lines in my many test cases trigger it.
For example, the debug output appears nowhere for this test case (it also misses the dedent entirely on line 4):
myList = [1,2,3,4]
for index in range(len(myList)):
    myList[index] += 1
print( myList )
but appears for every line in this one:
a = 1 + 1
b = 2 % 3
c = 1 ^ 1
d = 1 - 1
f = 1 * 1
g = 1 / 1
Given that most of the other rules work properly, I'm led to believe that the regex is the problem in the above rule but I don't see why this one is failing most of the time. Does anyone have any insight?
I don't know flex, but I observe that every line in the sample that worked starts with a single-character token, while every line in the sample that didn't starts with a longer one. Perhaps flex is matching against entire tokens instead of single characters? You might try adding a + after the character class.
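For what it's worth, flex is documented to pick the rule with the longest match and to break ties in favour of the rule listed first, which would fit the observation above. Below is a rough Python sketch of that selection (it is not flex itself). The rule names and the IDENT pattern are hypothetical stand-ins for whatever the real lexer defines, and the `+` variant also excludes `\n`, which is my own assumption, since a negated character class in flex matches a newline unless `\n` is listed.

```python
import re

# Hypothetical stand-ins for the lexer's rules, in file order.
# Only the NO_INDENT pattern comes from the question; IDENT is assumed.
RULES_ORIGINAL = [
    ("NO_INDENT", r"[^ \t]"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
]

# Same rules with the suggested '+' (and '\n' excluded, an assumption).
RULES_WITH_PLUS = [
    ("NO_INDENT", r"[^ \t\n]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
]


def winning_rule(rules, line):
    """Name of the rule that longest-match selection picks at the start of line."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, line)
        # Strictly longer only: on a tie the earlier rule keeps the win.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name


if __name__ == "__main__":
    for line in ("a = 1 + 1", "myList = [1,2,3,4]", "print( myList )"):
        print(line, "|", winning_rule(RULES_ORIGINAL, line),
              "|", winning_rule(RULES_WITH_PLUS, line))
```

With the original pattern, NO_INDENT only wins when the first token is a single character (a tie at length 1, resolved by rule order); for `myList` the six-character IDENT match is longer and wins. With the `+`, NO_INDENT matches at least as much as IDENT in these samples. A line that starts with a token containing spaces, such as a quoted string, could still out-match it, so this is probably not a complete fix.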