I am reading the dragon book. Quoting the text from the book (3.1.4 Lexical errors, Pno 114)
It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, if the string
fi
is encountered for the first time in a C program in the context:fi ( a == f(x) ) ...
a lexical analyzer cannot tell whether
fi
is a misspelling of the keywordif
or an undeclared function identifier. Sincefi
is a valid lexeme for the token id, the lexical analyzer must return the tokenid
to the parser and let some other phase of the compiler - probably the parser in this case - handle an error due to transposition of the letters.
I am bit confused after reading this. My understanding was lexical analyser starts processing the text from l开发者_Python百科eft to right and return tokens whenever the pattern matches. So for a language where if
is the keyword to match, how can fi
match?
Any thoughts?
It doesn't match the if
token, but the id
token, which stands for "identifier". It's the catch-all if no keyword matches. The lexical analyser doesn't know what to "expect" at certain positions. It just returns tokens, and the parser will know what it expects. A C parser has to accept the following statement, for example, which is a function call
fi ( a == f(x) );
You must make a distinction between syntax analysis and lexical analysis.
The task of lexical analysis is to convert a sequence of characters into a string of tokens. There can be various types of tokens, ex IDENTIFIER, ADDITION OPERATOR, END OF STATEMENT OPERATOR, etc. Lexical analysis can only fail with an error if it encounters a string of text which doesn't correspond to any token. In your case
fi ( a == f(x) ) ...
would translate to<IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <EQUALITY> <IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <RIGHT BRACKET> <RIGHT BRACKET>
.....Once a string of tokens have been generated, syntax analysis is performed. This typically involves constructing some sort of syntax tree from the tokens. The parser is aware of all the forms of valid statements that are allowed in the language. If the parser cannot find a syntax rule allowing the above sequence of tokens, it will fail.
How would you tell if if
was the only expected input at a given point?
int a = 42;
if (a == 42)
puts("ok");
vs.
int a = 42;
fi (a == 42)
puts("ok");
fi
could be a function call. For example, the above could be a mis-spelling of:
int a = 42;
fi(a == 42);
puts("ok");
where fi
is a function taking int
and returning void
.
This is a poor choice of example for a lexical analysis error explanation. What this text tries to tell you is, that the compiler cannot recognize you misspelled the "if" keyword (wrote it backwards). It just sees "fi" which is for example a valid variable name and so returns the id (for example) "VARIABLE" to the parser. The parser then later realizes the syntax error.
It has nothing to do with going left-to-right or right-to-left. The compiler of course reads the source code from left-to-right. As I said - a poor choice of keyword for this explanation.
精彩评论