Question on lexical analysis_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2023-01-06 21:08 出处：网络

I am reading the dragon book. Quoting the text from the book (3.1.4 Lexical errors, Pno 114) It is hard for a lexical analyzer to

I am reading the dragon book. Quoting the text from the book (3.1.4 Lexical errors, Pno 114)

It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, if the string fi is encountered for the first time in a C program in the context:
fi ( a == f(x) ) ...
a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler - probably the parser in this case - handle an error due to transposition of the letters.

I am bit confused after reading this. My understanding was lexical analyser starts processing the text from l开发者_Python百科eft to right and return tokens whenever the pattern matches. So for a language where if is the keyword to match, how can fi match?

Any thoughts?

It doesn't match the if token, but the id token, which stands for "identifier". It's the catch-all if no keyword matches. The lexical analyser doesn't know what to "expect" at certain positions. It just returns tokens, and the parser will know what it expects. A C parser has to accept the following statement, for example, which is a function call

fi ( a  == f(x) );

You must make a distinction between syntax analysis and lexical analysis.

The task of lexical analysis is to convert a sequence of characters into a string of tokens. There can be various types of tokens, ex IDENTIFIER, ADDITION OPERATOR, END OF STATEMENT OPERATOR, etc. Lexical analysis can only fail with an error if it encounters a string of text which doesn't correspond to any token. In your case fi ( a == f(x) ) ... would translate to <IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <EQUALITY> <IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <RIGHT BRACKET> <RIGHT BRACKET> .....
Once a string of tokens have been generated, syntax analysis is performed. This typically involves constructing some sort of syntax tree from the tokens. The parser is aware of all the forms of valid statements that are allowed in the language. If the parser cannot find a syntax rule allowing the above sequence of tokens, it will fail.

How would you tell if if was the only expected input at a given point?

int a = 42;
if (a == 42)
    puts("ok");

vs.

int a = 42;
fi (a == 42)
    puts("ok");

fi could be a function call. For example, the above could be a mis-spelling of:

int a = 42;
fi(a == 42);
puts("ok");

where fi is a function taking int and returning void.

This is a poor choice of example for a lexical analysis error explanation. What this text tries to tell you is, that the compiler cannot recognize you misspelled the "if" keyword (wrote it backwards). It just sees "fi" which is for example a valid variable name and so returns the id (for example) "VARIABLE" to the parser. The parser then later realizes the syntax error.

It has nothing to do with going left-to-right or right-to-left. The compiler of course reads the source code from left-to-right. As I said - a poor choice of keyword for this explanation.