A colleague recently showed me that a single source file of ours pulls in over 3,400 headers at compile time. A build compiles over 1,000 translation units, so we pay a huge performance penalty for headers that surely aren't all used.
Are there any static analysis tools that could shed light on the trees in such a forest, specifically ones that would help us decide which headers we should work on paring out?
UPDATE
Found some interesting information on the cost of including a header file (and the types of include guards to optimize its inclusion) here, originating from this question.
The output of gcc -w -H <file> might be useful (if you parse it and put some counts in). The -w is there to suppress all warnings, which might be awkward to deal with.
From the gcc docs:
-H
Print the name of each header file used, in addition to other normal activities. Each name is indented to show how deep in the #include stack it is. Precompiled header files are also printed, even if they are found to be invalid; an invalid precompiled header file is printed with '...x' and a valid one with '...!'.
The output looks like this:
. /usr/include/unistd.h
.. /usr/include/features.h
... /usr/include/bits/predefs.h
... /usr/include/sys/cdefs.h
.... /usr/include/bits/wordsize.h
... /usr/include/gnu/stubs.h
.... /usr/include/bits/wordsize.h
.... /usr/include/gnu/stubs-64.h
.. /usr/include/bits/posix_opt.h
.. /usr/include/bits/environments.h
... /usr/include/bits/wordsize.h
.. /usr/include/bits/types.h
... /usr/include/bits/wordsize.h
... /usr/include/bits/typesizes.h
.. /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/include/stddef.h
.. /usr/include/bits/confname.h
.. /usr/include/getopt.h
. /usr/include/stdio.h
.. /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/include/stddef.h
.. /usr/include/libio.h
... /usr/include/_G_config.h
.... /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/include/stddef.h
.... /usr/include/wchar.h
... /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/include/stdarg.h
.. /usr/include/bits/stdio_lim.h
.. /usr/include/bits/sys_errlist.h
Multiple include guards may be useful for:
/usr/include/bits/confname.h
/usr/include/bits/environments.h
/usr/include/bits/predefs.h
/usr/include/bits/stdio_lim.h
/usr/include/bits/sys_errlist.h
/usr/include/bits/typesizes.h
/usr/include/gnu/stubs-64.h
/usr/include/gnu/stubs.h
/usr/include/wchar.h
If you are using gcc/g++, the -M or -MM option will output a line with the information you seek. (The former includes system headers while the latter does not. There are other variants; see the manual.)
$ gcc -M -c foo.c
foo.o: foo.c /usr/include/stdint.h /usr/include/features.h \
/usr/include/sys/cdefs.h /usr/include/bits/wordsize.h \
/usr/include/gnu/stubs.h /usr/include/gnu/stubs-64.h \
/usr/include/bits/wchar.h
You would need to remove the "foo.o: foo.c" at the beginning, but the rest is a list of all headers that the file depends on, so it would not be too hard to write a script to gather these and summarize them.
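A minimal Python sketch of such a script follows. It assumes plain -M or -MM output on Unix (paths without spaces or colons, and no -MP phony targets) and relies on gcc listing the main source file first after the colon, as in the example above. The function name parse_dep_rules is hypothetical.

```python
import sys


def parse_dep_rules(text):
    """Map each make target in `gcc -M`/`-MM` output to the headers it depends on.

    Assumes Unix-style paths with no spaces or colons, and that the main
    source file is the first prerequisite of each rule, so it is dropped.
    """
    # Join backslash-newline continuations so each rule sits on one line.
    joined = text.replace("\\\n", " ")
    deps = {}
    for line in joined.splitlines():
        if ":" not in line:
            continue
        target, _, rest = line.partition(":")
        prereqs = rest.split()
        deps[target.strip()] = prereqs[1:]  # skip "foo.c", keep only headers
    return deps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: gcc -M -c foo.c > foo.d; python deps.py foo.d
    with open(sys.argv[1]) as f:
        for target, headers in parse_dep_rules(f.read()).items():
            print(f"{target}: {len(headers)} headers")
            for header in headers:
                print("   ", header)
```

Because each rule ends up on its own line after the continuations are joined, the same function also handles output from gcc -M a.c b.c, which writes one rule per source file.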
Of course this suggestion is only useful on Unix and only if nobody else has a better idea. :-)
A few things:
- Use "preprocess only" mode to look at your preprocessor output. In gcc this is the -E option; other compilers have the same feature.
- Use precompiled headers.
- gcc's -H option (long form --trace-includes) displays the full include tree, and -v (--verbose) shows the include search path. MSVC has the /showIncludes option, found on the C/C++ > Advanced property page.
- Also see the question "Displaying the #include hierarchy for a C++ file in Visual Studio".
John Lakos's "Large-Scale C++ Software Design" came with tools that extracted the compile-time dependencies among source files.
Unfortunately, their repository on Addison-Wesley's site is gone (along with AW's site itself), but I found a tarball here: http://prdownloads.sourceforge.net/introspector/LSC-rpkg-0.1.tgz?download
I found it useful several jobs ago, and it has the virtue of being free.
BTW, if you haven't read Lakos's book, it sounds like your project would benefit from it. (The current edition is a bit dated, but I hear that Lakos has another book coming out in 2012.)
GCC has a -M flag that will output a list of dependencies for a given source file. You could use that information to figure out which of your files have the most dependencies, which files are most depended on, etc.
Check out the man page for more information. There are several variants of -M.
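To make the "most dependencies / most depended on" idea concrete, here is a small Python sketch. It assumes you have already turned the -M output into a dict mapping each object file (translation unit) to its list of headers; the function name rank_dependencies is made up for illustration.

```python
from collections import Counter


def rank_dependencies(deps):
    """Given {translation_unit: [headers]}, return (fan_out, fan_in).

    fan_out: (translation unit, number of distinct headers), largest first.
    fan_in:  (header, number of translation units including it), largest first.
    Ties are broken alphabetically so the output is deterministic.
    """
    fan_out = sorted(((tu, len(set(headers))) for tu, headers in deps.items()),
                     key=lambda item: (-item[1], item[0]))
    fan_in = Counter()
    for headers in deps.values():
        fan_in.update(set(headers))  # count each header once per translation unit
    fan_in_sorted = sorted(fan_in.items(), key=lambda item: (-item[1], item[0]))
    return fan_out, fan_in_sorted


if __name__ == "__main__":
    example = {"a.o": ["x.h", "y.h", "z.h"], "b.o": ["y.h"], "c.o": ["y.h", "z.h"]}
    fan_out, fan_in = rank_dependencies(example)
    print("most dependencies:", fan_out)
    print("most depended on:", fan_in)
```

Headers near the top of fan_in are where trimming an #include pays off in the largest number of translation units; the top of fan_out shows which source files to look at first.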
Personally, I don't know of a tool that will say "remove this file". It's really a complex matter that depends on a lot of things. Looking at a tree of include statements is surely going to drive you nuts; it would drive me crazy, as well as ruin my eyes. There are better ways to reduce your compile times:
- De-inline your class methods (move their definitions out of the headers and into source files).
- After de-inlining them, re-examine your include statements and try to remove them. It is usually helpful to delete them all and start over.
- Use forward declarations as much as possible. Once you de-inline the methods in your header files, you can do this a lot.
- Break up large header files into smaller ones. If one class in a file is used more often than the rest, put it in a header file by itself.
- 1,000 translation units is actually not very many. We have between 10 and 20 thousand. :)
- Get IncrediBuild if your compile times are still too long.
I've heard there are some tools that do this, but I don't use them.
I created a tool, https://sourceforge.net/p/headerfinder, which may be useful. Unfortunately, it is a "home-made" tool with the following issues:
- Developed in VB.NET
- The source code needs to be compiled
- Very slow and consumes a lot of memory
- No help available
GCC has a flag (-save-temps) that keeps the intermediate files. These include .ii files (.i for C), which hold the preprocessor's output, i.e. what the compiler proper sees. You can write a script to parse this and determine the weight/cost/size of what is included, as well as the dependency tree.
I wrote a Python script to do just this (publicly available here: https://gitlab.com/p_b_omta/gcc-include-analyzer).
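For a rough idea of how such parsing can work (this is a separate minimal sketch, not the script linked above), the Python code below reads the line markers gcc writes into preprocessed output, lines of the form # 42 "file.h" 1, and charges every non-blank line to the file named by the most recent marker. It measures only each file's own lines, not the lines of the headers it includes in turn, and leaves escaped characters in file names as-is; the function name lines_per_file is hypothetical.

```python
import re
import sys
from collections import Counter

# GCC line markers look like: # 42 "path/to/file.h" 1 3
LINEMARKER = re.compile(r'^# (\d+) "((?:[^"\\]|\\.)*)"')


def lines_per_file(lines):
    """Count non-blank preprocessed lines attributed to each file.

    Every non-marker line is charged to the file named by the most recent
    line marker. Pseudo-files such as "<built-in>" are kept as-is.
    """
    counts = Counter()
    current = None
    for line in lines:
        match = LINEMARKER.match(line)
        if match:
            current = match.group(2)
            continue
        if line.strip() and current is not None:
            counts[current] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: g++ -save-temps -c foo.cpp; python ii_weight.py foo.ii
    with open(sys.argv[1]) as f:
        counts = lines_per_file(f)
    print(f"{sum(counts.values())} non-blank lines in total")
    for path, n in counts.most_common(20):
        print(f"{n:7d}  {path}")
```

Getting inclusive costs (a header plus everything it pulls in) would also require tracking the include stack using the marker flags 1 (entering a file) and 2 (returning to a file); this sketch does not do that.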