Understanding C++ Compilation

Source: https://www.devze.com, 2023-02-18 14:54 (reposted from the web)
I have recently become aware that I have no idea, generally speaking, how a C/C++ compiler works. I will admit this initially came from trying to understand header guards, but I came to the realization that I am lacking in my understanding of how compilation works.

Take Visual C++ for instance: there's the "Header Files" folder, the "Resource Files" folder, and the "Source Files" folder. Is there any significance to the separation of these folders and what you put in them? To me, they are all source files. Take the code snippets:

Snippet 1

//a1.h
int r=4;

and

//a1.cpp
int b  //<--semicolon left out on purpose

and

//main.cpp
#include <iostream>
#include "a1.h"
void main()
{
   cout << r;
}

The compiler errors out saying "a1.cpp(3) : fatal error C1004: unexpected end-of-file found". I would not expect that, because a1.cpp is not #included in the file where the main function lives. By contrast, the next code snippet

Snippet 2

//a1.h
int r=4 //<--semicolon left out on purpose

and

//a1.cpp
int b = 4;  

and

//main.cpp
#include <iostream>
void main()
{
   cout << b;
}

errors out with "main.cpp(6) : error C2065: 'b' : undeclared identifier". If you include a1.cpp like so

Snippet 3

//a1.h
int r=4 //<--semicolon left out on purpose

and

//a1.cpp
int b = 4;  

and

//main.cpp
#include <iostream>
#include "a1.cpp"
void main()
{
   cout << b;
}

the compiler complains "a1.obj : error LNK2005: "int b" (?b@@3HA) already defined in main.obj". Both snippets 2 and 3 ignore the fact that int r = 4 is missing its semicolon, which I suspect has something to do with it being a .h file. If I remove the a1.cpp file from the project in snippet 1, then it compiles fine. Clearly what I expected is not what I am getting. There are plenty of books and tutorials on how to code in C++, but not much on how C++ handles files and source code in the compilation process. What on earth is going on here?


Your questions aren't really about the compiler, but about how your IDE is handling the entire build system. The build systems for most C/C++ projects compile each .c or .cpp file separately, and then link the resulting object files together into a final executable. In your case, your IDE is compiling any file you have in the project with a filename extension of .cpp and then linking the resulting objects. The behaviour you're seeing can be explained as follows:

  1. a1.cpp is missing a ;, so when the IDE tries to compile that file, you get the error about 'unexpected end of file'.

  2. b isn't declared anywhere in the main.cpp compilation unit, so you get an error about an undeclared identifier.

  3. b exists in both the main.cpp and a1.cpp compilation units (obviously in a1.cpp, and via your #include for main.cpp). Your IDE compiles both of those files - now a1.o and main.o each contain an object called b. When linking, you get a duplicate symbol error.

The important point to take away here, which explains all of the behaviour you see, is that your IDE compiles every .cpp file - not just main.cpp and the files it includes - and then links the resulting objects.

I recommend setting up a command-line test project with a makefile you create yourself - that will teach you all about the inner workings of build systems, and you can then apply that knowledge to the inner workings of your IDE.


  1. Header files are not compiled on their own.
  2. An #include directive literally pastes the contents of the included file in place of the #include line.
  3. All source files (regardless of which one contains main) are compiled into .o or .obj files.
  4. All obj files are linked together along with external .lib files if there are any
  5. You get an executable.

Regarding point 2:

example
//a.h
int

//b.h
x = 

//c.h
5

//main.cpp
#include <iostream>
int main()
{
#include "a.h"
#include "b.h"
#include "c.h"
;

std::cout << x << std::endl; //prints 5 :) 
}
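
Because the preprocessor simply pastes text, a header that gets included twice is pasted twice. That is the problem the header guards mentioned in the question are meant to solve. Below is a minimal sketch, assuming a hypothetical header point.h; the names POINT_H, Point, sum and demo are made up for illustration, and the two pasted copies are written out by hand in one file:

```cpp
#include <cassert>

// ---- first copy of the hypothetical point.h, as pasted by #include ----
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

// ---- second copy, e.g. pulled in again through some other header ----
#ifndef POINT_H   // POINT_H is already defined, so everything up to #endif is skipped
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

// Uses the single Point definition that survived preprocessing.
int sum(const Point& p) {
    return p.x + p.y;
}

// Usage sketch: call this from main().
void demo() {
    Point p{2, 3};
    assert(sum(p) == 5);
}
```

Without the #ifndef/#define/#endif lines, the second struct Point would be a redefinition and the file would not compile.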

This isn't a full answer, but hth, my2c, etc :)


Since it seems that there are two ways of understanding your question, I will answer the "understanding C++ compilation" part.

I suggest that you start by reading the "compiler" definition in Wikipedia. After that, try a Google search for compiler tutorials to build up your comprehension of compilers. More specific to C++, you can read about #include and preprocessor directives (try a Google search for those terms).

If you still want to understand compilers further, I suggest a compiler book. You'll find a good list of books on StackOverflow.


The #include statement inserts that file into the file making the #include. Your snippet 3 main.cpp thus becomes the following before compilation.

    // main.cpp    
    // All sorts of stuff from iostream
    //a1.cpp
    int b = 4;
    void main()
    {
        cout << b;
    }

The reason you are getting a linker error is that you are defining b twice. It is defined in a1.cpp and in main.cpp.

You may wish to read about declaring and defining.
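
As a rough illustration of that distinction, here is a sketch in a single file. The comments say where each line would normally live in the question's project; read_b and demo are hypothetical helpers, not part of the original code.

```cpp
#include <cassert>

// Declaration only: tells the compiler that an int named b exists somewhere.
// In the question's project this line would go in a1.h.
extern int b;

// Declaration of a function (no body); also header material.
int read_b();

// Definition: reserves storage for b and gives it its initial value.
// Exactly one .cpp file (a1.cpp in the question) should contain this line.
int b = 4;

// Definition of the function declared above.
int read_b() {
    return b;
}

// Usage sketch: call this from main().
void demo() {
    assert(read_b() == 4);  // reads the one and only b
}
```

A declaration can be repeated in as many files as you like, which is why it belongs in a header. A definition must appear once across all the object files being linked, which is why #including a1.cpp while also compiling it produces the "already defined" error.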


You tell the build system what files to compile. In the case of Visual C++, it will automatically compile any file named "*.cpp" that you add to the project, though you can go into the project settings and tell it not to.

It will not compile files named *.h (though it can if you explicitly tell it to).

The #include directive is something the compiler processes before it does any compilation (this stage is called the preprocessor). It basically takes the file the directive points to and inserts it into the source file being compiled at the point where the #include directive appears. The compiler then compiles that whole thing as one complete unit.

So, in your example cases:

Snippet 1

Both a1.cpp and main.cpp are compiled separately by the build system. So, when it encounters the error in a1.cpp, it reports it.

Snippet 2

Note that it compiles these files separately, with no knowledge of each other, so when you reference b in main.cpp, it does not know that b is defined in a1.cpp.

Snippet 3

Now you've included a1.cpp in main.cpp, so it compiles main.cpp, sees a definition for b and says, OK I have a b at global scope. Then it compiles a1.cpp, and says OK, I have a b at global scope.

Now the linker steps in and tries to put a1 and main together, and it tells you: hey, I have two b's at global scope. No good.


The compiler picks up source files from where you tell it to. In the case of Visual C++ there's an IDE telling the compiler what to do, and the different folders are there because that's how the IDE organises the files.

Also, the error in snippet 3 is from the linker, not from the compiler. The compiler has compiled main.cpp and a1.cpp into object files main.obj and a1.obj and then the linker is trying to make an executable combining these object files, but the variable b is in both a1.obj (directly) and main.obj (via the include of a1.cpp), so you get the "already defined" error.


The problems you see in case 1 and 3 are VS specific. VS apparently tries to compile both main.cpp and a1.cpp.

Case 1: As VS tries to compile a1.cpp, which has a syntax error (the missing semicolon), the compilation fails.

Case 2: You have not declared the variable b in your main.cpp or in any included files. Thus the compilation fails.

Case 3: This is a linker error. Due to the include, int b has been defined in main.cpp as well as in a1.cpp. Since neither of them is static or extern, two global variables with the same identifier have been defined in the same scope. This is not allowed.
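
For comparison, here is a sketch of what static changes. With internal linkage the name b stays private to each translation unit, so snippet 3 would link, although main.obj and a1.obj would then each hold their own separate b, which is rarely what you want. bump_b and demo are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Internal linkage: this b is visible only inside the translation unit that
// contains this line. If a1.cpp held this line and main.cpp #included a1.cpp,
// each object file would get a private copy and the linker would see no clash.
static int b = 4;

// Hypothetical helper: modifies this translation unit's copy only.
int bump_b() {
    return ++b;
}

// Usage sketch: call this from main().
void demo() {
    int after = bump_b();
    assert(after == b);  // the helper and this function share the same private b
}
```

If the intent is one shared variable, the usual approach is an extern declaration in the header and a single definition in one .cpp file.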
