This question popped into my head today at work when I was having yet another domestic affair with my compiler. Despite my buff pinky (due to all the semicolon pressing I do at work), I managed to miss one before an if statement. Obviously, this resulted in a compile error:
error C2143: syntax error : missing ';' before 'if'
So I wondered, "Well gee, why can't you tell me the line that's missing the semicolon instead of the line after the problem?" and I proceeded to experiment with other similar syntax errors:
error C2065: 'myUndeclared' : undeclared identifier
error C2143: syntax error : missing ')' before 'if'
etc...
Now, all of those errors would, similarly, take me to the line after the problem and complain about something before the if statement.
Consider the following:
SomeFunction(x) //Notice, there is no ';' here
if(bSomeCondition)
{
...
}
I get two compile errors:
(Line 265) error C2065: 'x' : undeclared identifier
(Line 266) error C2143: syntax error : missing ';' before 'if'
However, the first error correctly tells me the line number, despite the missing semicolon. This suggests to me that the compiler doesn't get tripped up in parsing and is able to make it past the semicolon problem. So, why is it that the compiler insists on grammatical errors being reported in this way? Other (non-grammatical) errors are reported on the lines where they are found. Does this have to do with the compiler making multiple passes? Basically, I hope someone with a working knowledge of the C++ compiler might explain specifically what the compiler is doing that necessitates the reporting of errors in this "before" way.
The short answer to the more general question of "Why do C/C++ error messages suck" is "Sometimes C++ is really hard to parse" (it doesn't actually have a context-free grammar). However, this isn't really a valid reason: one can still make tools that record better diagnostic information than most C++ compilers do.
The more practical answer is "Compiler authors have inherited legacy codebases which didn't value error messages", combined with a mild dose of "compiler authors are lazy", topped with "Diagnostic reporting isn't an exciting problem". Most compiler writers would rather add a new language feature or a 3% codegen performance improvement than do significant refactoring on the codebase to allow decent error reporting. The specific question about "Why aren't errors properly localised to the line that 'caused' them" is an instance of this. There's not really a technical reason compilers can't generally work out that a ; is missing, and then tell you about the source span of the last ;-lacking statement, even in the presence of C++'s general whitespace invariance. It's just that storing that information has (largely) been historically ignored.
That said, new compilers not hampered by decades of old code are doing much better. Have a look at the Clang compiler, which prides itself on sensible error messages. The page on diagnostics shows how much better than GCC they are. An example for this case being:
$ gcc-4.2 t.c
t.c: In function 'foo':
t.c:5: error: expected ';' before '}' token
$ clang t.c
t.c:4:8: error: expected ';' after expression
  bar()
       ^
       ;
Or, more impressively:
$ cat t.cc
template<class T>
class a {}
class temp {};
a<temp> b;
struct b {
}
$ gcc-4.2 t.cc
t.cc:3: error: multiple types in one declaration
t.cc:4: error: non-template type 'a' used as a template
t.cc:4: error: invalid type in declaration before ';' token
t.cc:6: error: expected unqualified-id at end of input
$ clang t.cc
t.cc:2:11: error: expected ';' after class
class a {}
          ^
          ;
t.cc:6:2: error: expected ';' after struct
}
 ^
 ;
Look, it's even telling us what to type where to fix the problem! </clang_salespitch>
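For illustration, here is a minimal sketch (mine, not Clang's actual implementation) of the kind of bookkeeping that makes such a message possible. Each token remembers its line and column, so when a ';' is missing the parser can point just past the previous token instead of at the next one. The names LocatedToken, lex and expectSemicolon are hypothetical, and the lexer only understands identifiers and single-character punctuation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record: text plus the 1-based line/column of its first character.
struct LocatedToken {
    std::string text;
    int line;
    int column;
};

// Tiny lexer: words are runs of letters, digits and '_'; any other
// non-space character is a one-character token. Whitespace is skipped,
// but it still advances the line/column counters.
std::vector<LocatedToken> lex(const std::string& src) {
    std::vector<LocatedToken> out;
    int line = 1;
    int column = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '\n') { ++line; column = 1; ++i; continue; }
        if (std::isspace(c)) { ++column; ++i; continue; }
        std::size_t len = 1;
        if (std::isalnum(c) || c == '_') {
            while (i + len < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i + len])) ||
                    src[i + len] == '_')) {
                ++len;
            }
        }
        out.push_back({src.substr(i, len), line, column});
        i += len;
        column += static_cast<int>(len);
    }
    return out;
}

// The parser wants ';' at index pos. If something else is there, report the
// position just past the PREVIOUS token, which is where the ';' belongs.
// Returns "" when the ';' is present.
std::string expectSemicolon(const std::vector<LocatedToken>& toks, std::size_t pos) {
    assert(pos > 0 && pos <= toks.size());  // there must be a previous token
    if (pos < toks.size() && toks[pos].text == ";") return "";
    const LocatedToken& prev = toks[pos - 1];
    const int col = prev.column + static_cast<int>(prev.text.size());
    return std::to_string(prev.line) + ":" + std::to_string(col) +
           ": error: expected ';' after expression";
}
```

Given a two-line input consisting of "  bar()" followed by "}", calling expectSemicolon(lex(src), 3) produces "1:8: error: expected ';' after expression": the location is computed from the closing parenthesis, not from the brace on the next line.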
Because in C++, white-space doesn't matter, on the whole. So this is valid code:
SomeFunction(x)
;if(bSomeCondition)
{
...
}
So the compiler message is simply reporting that a semi-colon hasn't appeared somewhere before the if.
In this code:
SomeFunction(x)
if (y) {
}
As you said, the error would be reported on line 2 as missing ';' before 'if'.
There is nothing wrong with line 1. It's perfectly valid without a semi-colon, and several continuations are possible besides just a semi-colon (such as a dot, or a math operator, or assignment, or a pointer, etc).
So, reporting the error on the previous line may not always make sense. Take this example:
SomeFunction(x)
+= 10
- 5
// blank line
// blank line
if (y) {
}
Which line has the error? The line with the - 5? Or one of the comment lines? To the compiler, the error is actually with the 'if', since it is the first place that something can be detected as being wrong. To report a different line, the compiler would have to report the last properly parsed token as the error, rather than the first place the error is detected. That sounds a little backwards, and saying that // blank line is missing a semi-colon is even more confusing, since changing it to // blank line; would of course not change or fix the error.
By the way, this is not unique to C or C++. This is a common way to report errors in most parsers.
Quite simply, because of how parsing is done. When the parser expects ; and instead encounters if, the error is in the if. The simplest sane way to report it is to say ; was expected before if.
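As a rough sketch of that behaviour (a toy, not how any real C++ compiler is structured), the following parser accepts only statements of the form name ( arg ) ; and reports the first token that does not fit. The names Token, tokenize, checkCallStatement and demoMissingSemicolon are made up for this example, and the message format just imitates the one quoted in the question.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record for this sketch: the text and the 1-based line it is on.
struct Token {
    std::string text;
    int line;
};

// Words are runs of letters, digits and '_'; every other non-space
// character is a one-character token. Newlines only bump the line counter.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    int line = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '\n') { ++line; ++i; continue; }
        if (std::isspace(c)) { ++i; continue; }
        std::size_t len = 1;
        if (std::isalnum(c) || c == '_') {
            while (i + len < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i + len])) ||
                    src[i + len] == '_')) {
                ++len;
            }
        }
        out.push_back({src.substr(i, len), line});
        i += len;
    }
    return out;
}

// Check that src starts with a statement of the form  name ( arg ) ;
// Returns "" if it does, otherwise a diagnostic naming the offending token.
std::string checkCallStatement(const std::string& src) {
    const std::vector<Token> toks = tokenize(src);
    // Punctuation required at each position; "" stands for "any name".
    const std::vector<std::string> shape = {"", "(", "", ")", ";"};
    for (std::size_t pos = 0; pos < shape.size(); ++pos) {
        if (shape[pos].empty()) continue;  // names are not validated in this sketch
        if (pos >= toks.size()) {
            return "syntax error : missing '" + shape[pos] + "' at end of input";
        }
        if (toks[pos].text != shape[pos]) {
            // The parser only learns something is wrong when it sees this
            // token, so this token's line is the one that gets reported.
            return "(Line " + std::to_string(toks[pos].line) +
                   ") syntax error : missing '" + shape[pos] +
                   "' before '" + toks[pos].text + "'";
        }
    }
    return "";
}

// Usage: the question's snippet fails at the 'if'; moving the ';' onto
// the 'if' line makes the very same check pass.
void demoMissingSemicolon() {
    assert(checkCallStatement("SomeFunction(x)\nif(bSomeCondition)\n{\n}") ==
           "(Line 2) syntax error : missing ';' before 'if'");
    assert(checkCallStatement("SomeFunction(x)\n;if(bSomeCondition)\n{\n}").empty());
}
```

The line number in the message comes from the token that broke the expectation, which is why it names the line of the if rather than the line of the call.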
The compiler is whitespace agnostic. It doesn't know (or care) that there is a carriage return or tabs or spaces in between your statements. All it cares about is what comes before or after semicolons, or before/after braces ('{', '}'), which begin and end classes and functions. That's why :)
Because when it's done parsing that line, it does not yet know that you wanted a semicolon there. Let's look at an example:
int mystuff
Is this line missing a semicolon? That depends on what comes next. For instance the following construct is perfectly ok:
int mystuff
= 1;
I would never write it like that, but for the compiler it is ok.
Because the following code would be correct:
SomeFunction(x)
;if(bSomeCondition)
{
}
That's because unnecessary whitespace is ignored.
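To make that concrete, here is a small sketch of a whitespace-skipping tokenizer. It is an assumption about how a lexer might look, not code from any real compiler, and tokensOf and checkLayoutsAgree are hypothetical names. Once whitespace is dropped, the conventional layout and the one above are the same sequence of tokens, so the parser cannot tell which line the ; was "supposed" to be on.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split source text into tokens, discarding all whitespace (spaces, tabs,
// newlines). Words are runs of letters, digits and '_'; any other
// non-space character is a one-character token.
std::vector<std::string> tokensOf(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t len = 1;
        if (std::isalnum(c) || c == '_') {
            while (i + len < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i + len])) ||
                    src[i + len] == '_')) {
                ++len;
            }
        }
        out.push_back(src.substr(i, len));
        i += len;
    }
    return out;
}

// Self-check: the usual layout and the ";if" layout yield identical tokens.
void checkLayoutsAgree() {
    assert(tokensOf("SomeFunction(x);\nif(bSomeCondition)\n{\n}") ==
           tokensOf("SomeFunction(x)\n;if(bSomeCondition)\n{\n}"));
}
```

A real compiler does keep line information alongside each token for diagnostics, but the grammar itself is driven only by the token sequence shown here.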
The short answer: you can put ; into line 266, and then it's going to be fine. From the compiler's perspective the error is there.
You might want to try clang. I don't know whether it gives a better error message for this particular type of error, but in general it gives much clearer error messages.
It is because the compiler checks one whole statement at a time. Let me give an example:
int a,b,c
c=a+b;
cout<<c;
This code generates a compilation error along the lines of "; is expected before c" on line 2. This happens because the compiler first looks at line 1's int a,b,c and has no clue whether another variable or something else will follow, so it moves on to the second line (because whitespace is allowed). There it sees c=a+b, which starts a new statement, and so the compiler knows that something is wrong, as it was expecting either another variable or a semicolon (;). And so it tells us that it was expecting a ; before that statement.
So, long story short, the compiler doesn't look for a semicolon after a statement (if that were the case, we might not be able to use whitespace freely in our code); it looks for ; right before the next statement, because the compiler has no clue how long the first statement will be.