Why are forward declarations necessary? [duplicate]

This question already has answers here: Closed 12 years ago.

Possible Duplicate:
Should C++ eliminate header files?

In languages like C# and Java there is no need to declare (for example) a class before using it. If I understand it correctly, this is because the compiler does two passes on the code. In the first it just "collects the information available" and in the second one it checks that the code is correct.

In C and C++ the compiler does only one pass so everything needs to be available at that time.

So my question basically is: why isn't it done this way in C and C++? Wouldn't it eliminate the need for header files?

The short answer is that computing power and resources advanced exponentially between the time that C was defined and the time that Java came along more than two decades later.

The longer answer...

The maximum size of a compilation unit -- the block of code that a compiler processes in a single chunk -- is going to be limited by the amount of memory that the compiling computer has. In order to process the symbols that you type into machine code, the compiler needs to hold all the symbols in a lookup table and reference them as it comes across them in your code.

When C was created in 1972, computing resources were much more scarce and at a high premium -- the memory required to store a complex program's entire symbol table at once simply wasn't available in most systems. Fixed storage was also expensive, and extremely slow, so ideas like virtual memory or storing parts of the symbol table on disk simply wouldn't have allowed compilation in a reasonable timeframe.

The best solution to the problem was to chunk the code into smaller pieces by having a human sort out which portions of the symbol table would be needed in which compilation units ahead of time. Imposing a fairly small task on the programmer of declaring what he would use saved the tremendous effort of having the computer search the entire program for anything the programmer could use.

It also saved the compiler from having to make two passes on every source file: the first one to index all the symbols inside, and the second to parse the references and look them up. When you're dealing with magnetic tape where seek times were measured in seconds and read throughput was measured in bytes per second (not kilobytes or megabytes), that was pretty meaningful.

C++, while created roughly a decade later, was designed as (very nearly) a superset of C, and therefore had to use the same mechanism.

By the time Java rolled around in 1995, average computers had enough memory that holding a symbol table, even for a complex project, was no longer a substantial burden. And Java wasn't designed to be backwards-compatible with C, so it had no need to adopt a legacy mechanism. C# was similarly unencumbered.

As a result, their designers chose to shift the burden of compartmentalizing symbolic declaration back off the programmer and put it on the computer again, since its cost in proportion to the total effort of compilation was minimal.

Bottom line: there have been advances in compiler technology that make forward declarations unnecessary. Plus computers are thousands of times faster, and so can make the extra calculations necessary to handle the lack of forward declarations.

C and C++ are older and were standardized at a time when it was necessary to save every CPU cycle.

No, it would not obviate header files. It would eliminate the requirement to use a header to declare classes/functions in the same file. The major reason for headers is not to declare things in the same file though. The primary reason for headers is to declare things that are defined in other files.
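To make that distinction concrete, here is a minimal sketch. Everything sits in one file so it compiles on its own; the names area and describe, and the header name mentioned in the comment, are made up for illustration. In a real project the declaration would typically live in a header and the definition in a different .cpp file.

```cpp
#include <string>

// Declaration only. In a real project this line would typically live in a
// header (say, a hypothetical shapes.h) and be pulled in with #include.
int area(int width, int height);

// The compiler can type-check this call from the declaration alone; it does
// not need to have seen the body of area() yet.
std::string describe(int width, int height) {
    return "area = " + std::to_string(area(width, height));
}

// Definition. This could just as well sit in another .cpp file, in which
// case the linker would connect the call above to it.
int area(int width, int height) {
    return width * height;
}
```

A two-pass compiler could drop the need for the declaration in this single-file layout, but once area() moves to another translation unit, something still has to tell the compiler its signature.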

For better or worse, the rules for the semantics of C (and C++) mandate the "single pass" style behavior. Just for example, consider code like this:

int i;

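int f() {
     i = 1;      // assigns to the global i
     int i = 2;  // the local i exists only from this point on
     return i;   // returns the local i
}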

The i=1 assigns to the global, not the one defined inside of f(). This is because at the point of the assignment, the local definition of i hasn't been seen yet so it isn't taken into account. You could still follow these rules with a two-pass compiler, but doing so could be non-trivial. I haven't checked their specs to know with certainty, but my immediate guess would be that Java and C# differ from C and C++ in this respect.
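As a side note, C++ itself already does something two-pass-like in one place: inside a class definition, the bodies of member functions are compiled as if they appeared after the end of the class, so they can use members declared further down. A small sketch (the Counter type is just an illustration, not from the question):

```cpp
struct Counter {
    // The body refers to count, which is declared on a later line. This is
    // fine because member function bodies are processed once the whole
    // class definition has been seen.
    int next() { return ++count; }

    int count = 0;
};
```

So the declare-before-use rule at namespace and block scope is a language design decision, not something a C++ compiler is fundamentally unable to work around.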

Edit: Since a comment said my guess was incorrect, I did a bit of checking. According to the Java Language Specification, §14.4.2, Java seems to follow pretty close to the same rules as C++ (a little different, but not a whole lot).

At least as I read the C# language specification (warning: Word file), however, it is different. It (§3.7.1) says: "The scope of a local variable declared in a local-variable-declaration (§8.5.1) is the block in which the declaration occurs."

This appears to say that in C#, the local variable should be visible throughout the entire block in which it is declared, so with code similar to the example I gave, the assignment would be to the local variable, not the global.

So, my guess was half right: Java follows (pretty much) the same rule as C++ in this respect, but C# does not.

This is because of smaller compilation modules in C/C++. In C/C++, each .c/.cpp file is compiled separately, creating an .obj module. Thus the compiler needs information about types and variables declared in other compilation modules. This information is supplied in the form of forward declarations, usually in header files.

C#, on the other hand, compiles several .cs files into one big compilation module at once.

In fact, when referencing other compiled modules from a C# program, the compiler needs to know the declarations (type names etc.) the same way a C++ compiler does. This information is obtained from the compiled module directly. In C++ the same information is kept separately and explicitly, in headers (that's why you cannot recover variable names from a C++-compiled DLL, but can from a .NET assembly).

The forward declarations in C++ are a way to provide metadata about the other pieces of code that might be used by the currently compiled source to the compiler, so it can generate the correct code.

That metadata can come from the author of the linked library/component. However, it can also be automatically generated (for example there are tools that generate C++ header files for COM objects). In any case, the C++ way of expressing that metadata is through the header files you need to include in your source code.

C#/.NET also consumes similar metadata at compile time. However, that metadata is automatically generated when the assembly it applies to is built and is usually embedded into it. Thus, when you reference an assembly in your C# project, you are essentially telling the compiler "look for the metadata you need in this assembly as well, please".

In other words, the metadata generation and consumption in C# is more transparent to the developers, allowing them to focus on what really matters - writing their own code.

There are other benefits to having the metadata about the code bundled with the assembly. Reflection, code emission, on-the-fly serialization: they all depend on that metadata to be able to generate the proper code at run-time.

The C++ analogue to this would be RTTI, although it's not widely adopted due to incompatible implementations.
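For a sense of how much less standard RTTI gives you than .NET metadata, here is a small sketch. The Shape, Circle and Square types and the two helper functions are invented for the example. You can ask what the dynamic type of an object is and cast accordingly, but there is no standard way to enumerate a type's members or methods.

```cpp
#include <typeinfo>

// RTTI only works on polymorphic types, hence the virtual destructor.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { double radius = 1.0; };
struct Square : Shape { double side = 1.0; };

// True if the dynamic type of s is exactly Circle.
bool is_circle(const Shape& s) {
    return typeid(s) == typeid(Circle);
}

// Returns the radius if s is a Circle (or derived from it), otherwise -1.
double radius_or_negative(const Shape& s) {
    if (const auto* c = dynamic_cast<const Circle*>(&s)) {
        return c->radius;
    }
    return -1.0;
}
```

Note that typeid(s).name() exists too, but the string it returns is implementation-defined, which is part of why code relying on it tends not to be portable.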

From Eric Lippert, blogger of all things internal to C#: http://blogs.msdn.com/ericlippert/archive/2010/02/04/how-many-passes.aspx:

The C# language does not require that declarations occur before usages, which has two impacts, again, on the user and on the compiler writer. [...]

The impact on the compiler writer is that we have to have a “two pass” compiler. In the first pass, we look for declarations and ignore bodies. Once we have gleaned all the information from the declarations that we would have got from the headers in C++, we take a second pass over the code and generate the IL for the bodies.
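The difference between the two strategies can be sketched with a toy checker. This is only an illustration of the idea, not how any real compiler is structured: the "program" is just a list of items that either declare a name or use one, and Item, check_one_pass and check_two_pass are names made up for the example.

```cpp
#include <set>
#include <string>
#include <vector>

// A toy "program": each item either declares a name or uses one.
struct Item {
    bool is_declaration;
    std::string name;
};

// One pass: a use is valid only if its declaration appeared earlier.
bool check_one_pass(const std::vector<Item>& program) {
    std::set<std::string> declared;
    for (const Item& item : program) {
        if (item.is_declaration) {
            declared.insert(item.name);
        } else if (declared.count(item.name) == 0) {
            return false;  // used before (or without) a declaration
        }
    }
    return true;
}

// Two passes: collect every declaration first, then check the uses.
bool check_two_pass(const std::vector<Item>& program) {
    std::set<std::string> declared;
    for (const Item& item : program) {  // pass 1: declarations only
        if (item.is_declaration) {
            declared.insert(item.name);
        }
    }
    for (const Item& item : program) {  // pass 2: uses only
        if (!item.is_declaration && declared.count(item.name) == 0) {
            return false;  // never declared anywhere
        }
    }
    return true;
}
```

A program that uses f before declaring it is rejected by check_one_pass and accepted by check_two_pass; a program that uses a name that is never declared is rejected by both.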

To sum up, using something does not require declaring it in C#, whereas it does in C++. That means that in C++, you need to explicitly declare things, and it's more convenient and safe to do that with header files so you don't violate the One Definition Rule.
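The classic case where a C++ forward declaration is unavoidable is two types that refer to each other. A minimal sketch (Department, Employee and manager_id are invented names):

```cpp
struct Employee;  // forward declaration: enough to form pointers and references

struct Department {
    Employee* manager = nullptr;  // OK: only a pointer to an incomplete type
};

struct Employee {
    Department* department = nullptr;
    int id = 0;
};

// This function looks inside Employee, so it needs the complete type and
// must come after the full definition above.
int manager_id(const Department& d) {
    return d.manager ? d.manager->id : -1;
}
```

In C# the equivalent two classes could simply be written in either order, with no extra declaration.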