Does using large libraries inherently make slower code?

Developer (开发者), https://www.devze.com, 2022-12-20 02:24, source: web
I have a psychological tic which makes me reluctant to use large libraries (like GLib or Boost) in lower-level languages like C and C++. In my mind, I think:

Well, this library has thousands of man hours put into it, and it's been created by people who know a lot more about the language than I ever will. Their authors and fans say that the libraries are fast and reliable, and the functionality looks really useful, and it will certainly stop me from (badly) reinventing wheels.

But damn it, I'm never going to use every function in that library. It's too big and it's probably become bloated over the years; it's another ball and chain my program needs to drag around.

The Torvalds rant (controversial though it is) doesn't exactly put my heart at ease either.

Is there any basis to my thinking, or am I merely unreasonable and/or ignorant? Even if I only use one or two features of a large library, by linking to that library am I going to incur runtime performance overheads?

I'm sure it depends too on what the specific library is, but I'm generally interested in knowing whether large libraries will, at a technical level, inherently introduce inefficiencies.

I'm tired of obsessing and muttering and worrying about this, when I don't have the technical knowledge to know if I'm right or not.

Please put me out of my misery!


Even if I only use one or two features of a large library, by linking to that library am I going to incur runtime performance overheads?

In general, no.

If the library in question doesn't have a lot of position-independent code, then there will be a start-up cost while the dynamic linker performs relocations on the library when it's requested. Usually, that's part of the program's start-up. There is no run-time performance effect beyond that.

Linkers are also good at removing "dead code" from statically-linked libraries at build time, so any static libraries you use will have minimal size overhead. Performance doesn't even enter into it.

Frankly, you're worrying about the wrong things.


I can't comment on GLib, but keep in mind that a lot of the code in Boost is header-only and given the C++ principle of the user only paying for what they're using, the libraries are pretty efficient. There are several libraries that require you to link against them (regex, filesystem come to mind) but they're separate libraries. With Boost you do not link against a large monolithic library but only against the smaller components that you do use.

Of course, the other question is - what is the alternative? Do you want to implement the functionality that is in Boost yourself when you need it? Given that a lot of very competent people have worked on this code and ensured that it works across a multitude of compilers and still is efficient, this might not exactly be a simple undertaking. Plus you're reinventing the wheel, at least to a certain extent. IMHO you can spend this time more productively.


Boost isn't a big library.

It is a collection of many small libraries. Most of them are so small they're contained in a header or two. Using boost::noncopyable doesn't drag boost::regex or boost::thread into your code. They're different libraries. They're just distributed as part of the same library collection. But you only pay for the ones you use.

But speaking generally, because big libraries do exist, even if Boost isn't one of them:

Is there any basis to my thinking, or am I merely unreasonable and/or ignorant? Even if I only use one or two features of a large library, by linking to that library am I going to incur runtime performance overheads?

No basis, more or less. You can test it yourself.

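Write a small C++ program and compile it. Now add a new function to it, one which is defined but never called. Compile the program again. Assuming optimizations are enabled, the unused function gets stripped out, so the cost of including additional unused code is zero. (Strictly, the compiler itself drops unused functions with internal linkage; for functions with external linkage, GCC and Clang need -ffunction-sections plus the linker's --gc-sections before the linker removes them.)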
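The experiment can be sketched in C as follows. The file name, the function names and the GCC/Clang commands in the comment are illustrative assumptions; the exact sizes depend on your compiler, flags and platform, so compare them yourself rather than trusting any figure here.

```c
/* dead_code_demo.c: a sketch of the "add an unused function" experiment.
 * Assumed build on an ELF system with GCC or Clang:
 *   gcc -O2 -c dead_code_demo.c && size dead_code_demo.o
 * Delete never_called() and rebuild: the text size should stay the same,
 * because the compiler discards an unreferenced function with internal
 * linkage. A non-static function is only dropped by GNU ld when each
 * function gets its own section and section garbage collection is on:
 *   gcc -O2 -ffunction-sections -Wl,--gc-sections ...
 */
#include <assert.h>

/* Defined but never called; being static lets the compiler prove that. */
static long never_called(long x)
{
    return x * 42 + 7;
}

/* The only function the rest of the program uses: 0 + 1 + ... + n. */
long sum_to(long n)
{
    assert(n >= 0);
    return n * (n + 1) / 2;
}
```

With -Wall, GCC also warns that never_called is defined but not used, which is a hint that the compiler can see it is dead.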

Of course there are exceptions. If the code instantiates any global objects, those might not be removed (that's why including the iostream header increases the executable size), but in general, you can include as many headers and link to as many libraries as you like, and it won't affect the size, performance or memory usage of your program, as long as you don't use any of the added code.
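C has no global objects with constructors, but GCC and Clang offer a non-standard constructor attribute that behaves similarly, which makes it a handy way to see why such code survives. This is a minimal sketch; library_init and library_init_count are hypothetical names.

```c
/* Non-standard: the constructor attribute is a GCC/Clang extension. */
static int init_count = 0;

/* Runs automatically before main(). Nothing needs to call it, so once this
 * file is part of the program it is always reachable and is never treated
 * as dead code, even if no other function here is ever used. */
__attribute__((constructor))
static void library_init(void)
{
    init_count++;
}

/* Reports how many times the initializer has run (1 after startup). */
int library_init_count(void)
{
    return init_count;
}
```

Code that runs on its own at startup is reachable by definition, so neither the compiler nor the linker can discard it. If the file instead sits in a static archive and nothing references any of its symbols, the linker typically never pulls that archive member in, and the initializer never runs.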

Another exception is that if you dynamically link to a .dll or .so, the entire library must be distributed, so it can't be stripped of unused code. But libraries that are statically compiled into your executable (either as static libraries (.lib or .a) or simply as included header files) can usually be trimmed down by the linker, which removes unused symbols.


A large library will, from the code-performance perspective:

  • occupy more memory, if it has a runtime binary (most parts of Boost don't require runtime binaries; they're "header-only"). While the OS will load into RAM only the parts of the library that are actually used, it can still load more than you need, because the granularity of what's loaded is the page size (only 4 KB on my system, though).
  • take more time to be loaded by the dynamic linker, if, again, it needs runtime binaries. Each time your program is loaded, the dynamic linker has to resolve each function you need from the external library to its actual address in memory. This takes some time, but only a little (it does matter at the scale of loading many programs, such as during the startup of a desktop environment, but you don't have a choice there).

    And yes, it will take one extra jump and a couple of pointer adjustments at runtime each time you call an external function of a shared (dynamically linked) library.
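The page-granularity point can be made concrete with a little arithmetic. The sketch below assumes a POSIX system (for sysconf) and uses hypothetical helper names; the same function works for 64-byte cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Number of fixed-size units (pages, cache lines, ...) that the byte range
 * [offset, offset + length) touches. */
size_t units_spanned(size_t offset, size_t length, size_t unit)
{
    assert(unit > 0);
    if (length == 0)
        return 0;
    size_t first = offset / unit;
    size_t last = (offset + length - 1) / unit;
    return last - first + 1;
}

/* Page size reported by the OS via POSIX sysconf(); the 4096 fallback is
 * an assumption used only if the query fails. */
size_t page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096;
}
```

For example, a 100-byte function that starts at offset 4050 within a 4096-byte page straddles a page boundary, so units_spanned(4050, 100, 4096) is 2: calling it brings in two whole pages.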

and, from a developer's performance perspective:

  • add an external dependency. You will be depending on someone else. Even if that library is free software, modifying it will cost you extra effort. Some developers of veeery low-level programs (I'm talking about OS kernels) hate to rely on anyone; that's their professional perk. Thus the rants.

    However, that can be considered a benefit. If other people are used to Boost, they will find familiar concepts and terms in your program and will be more effective at understanding and modifying it.

  • usually introduce library-specific concepts that take time to understand. Consider Qt: it contains signals and slots and moc-related infrastructure. Compared to the size of the whole of Qt, learning them takes a small fraction of the time. But if you use only a small part of such a big library, that learning cost can be an issue.


Excess code doesn't magically make the processor run slower. All it does is sit there occupying a little bit of memory.

If you're statically linking and your linker is at all reasonable, then it will only include the functions that you actually use anyway.


The term I like for frameworks, library sets, and some types of development tools, is platform technologies. Platform technologies have costs beyond impact on code size and performance.

  1. If your project is itself intended to be used as a library or framework, you may end up pushing your platform technology choices on developers that use your library.

  2. If you distribute your project in source form, you may end up pushing platform technology choices on your end users.

  3. If you do not statically link all your chosen frameworks and libraries, you may end up burdening your end users with library versioning issues.

  4. Compile time affects developer productivity. Incremental linking, precompiled headers, proper header dependency management, etc., can help manage compile times, but do not eliminate the compiler performance problems associated with the massive amounts of inline code some platform technologies introduce.

  5. For projects that are distributed as source, compile time affects the end users of the project.

  6. Many platform technologies have their own development environment requirements. These requirements can accumulate, making it difficult and time-consuming for new developers on a project to replicate the environment needed for compiling and debugging.

  7. Using some platform technologies in effect creates a new programming language for the project. This makes it harder for new developers to contribute.

All projects have platform technology dependencies, but for many projects there are real benefits to keeping these dependencies to a minimum.


There may be a small overhead when loading these libraries if they're dynamically linked. This will typically be a tiny, tiny fraction of the time your program spends running.

However there will be no overhead once everything is loaded.

If you don't want to use all of Boost, then don't. It's modular, so you can use the parts you want and ignore the rest.


Bigger doesn't inherently imply slower. Contrary to some of the other answers, there's no inherent difference between libraries stored entirely in headers and libraries stored in object files either.

Header-only libraries can have an indirect advantage. Most template-based libraries have to be header-only (or a lot of the code ends up in headers anyway), and templates do give a lot of opportunities for optimization. Taking code in a typical object-file library and moving it all into headers will not, however, usually have many good effects (and could lead to code bloat).
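In C, the closest equivalent of a header-only library is a header of static inline functions. The sketch below (a hypothetical clamp.h) shows why this gives the optimizer more to work with: every file that includes it sees the full function body.

```c
/* clamp.h: a hypothetical header-only "library" in C. Because the whole
 * definition is visible wherever the header is included, the compiler can
 * inline calls and fold constant arguments, much as it can with C++
 * templates defined in headers. */
#ifndef CLAMP_H
#define CLAMP_H

/* Returns value limited to the closed range [lo, hi]; assumes lo <= hi. */
static inline int clamp_int(int value, int lo, int hi)
{
    return value < lo ? lo : (value > hi ? hi : value);
}

#endif /* CLAMP_H */
```

If clamp_int were instead compiled once into a separate object file, a call from another file could only be inlined with link-time optimization (for example GCC's -flto).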

The real answer for a particular library will usually depend on its overall structure. It's easy to think of "Boost" as something huge. In fact, it's a huge collection of libraries, most of which are individually quite small. You can't say very much (meaningfully) about Boost as a whole, because the individual libraries are written by different people, with different techniques, goals, etc. A few of them (e.g. Format, Assign) really are slower than almost anything you'd be very likely to do on your own. Others (e.g. Pool) provide things you could do yourself, but probably won't, to get at least minor speed improvements. A few (e.g. uBLAS) use heavy-duty template magic to run faster than any but a tiny percentage of us can hope to achieve on our own.
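As an illustration of the Pool point, here is the kind of fixed-size pool you could write yourself. It is a from-scratch sketch under stated assumptions (64 blocks of 32 bytes, single-threaded use), not Boost.Pool's API.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCKS 64 /* assumed capacity */

/* Each block is either a free-list link or a 32-byte payload (assumed size). */
typedef union block {
    union block *next;      /* used while the block is free */
    unsigned char data[32]; /* used while the block is handed out */
} block;

typedef struct {
    block storage[POOL_BLOCKS];
    block *free_list;
} pool;

/* Threads every block onto the free list. */
void pool_init(pool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
}

/* O(1): pop the head of the free list; NULL when the pool is exhausted. */
void *pool_alloc(pool *p)
{
    block *b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* O(1): push the block back; it must have come from this pool. */
void pool_free(pool *p, void *ptr)
{
    block *b = ptr;
    assert(b >= p->storage && b < p->storage + POOL_BLOCKS);
    b->next = p->free_list;
    p->free_list = b;
}
```

The speed comes from giving up generality: every block has the same size, so there is no searching and no per-block header, a trade-off a general-purpose malloc cannot make.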

There are, of course, quite a few libraries that really are individually large libraries. In quite a few cases, these really are slower than what you'd write yourself. In particular, many (most?) of them attempt to be much more general than almost anything you'd be at all likely to write on your own. While that doesn't necessarily lead to slower code, there's definitely a strong tendency in that direction. As with a lot of other code, when you're developing libraries commercially, customers tend to be a lot more interested in features than in things like size or speed.

Some libraries also devote a lot of space, code (and often at least bits of time) to solving problems you may very well not care about at all. Just for example, years ago I used an image processing library. Its support for 200+ image formats sounded really impressive (and in a way it really was), but I'm pretty sure I never used it to deal with more than about a dozen formats (and I could probably have gotten by supporting only half that many). OTOH, even with all that it was still pretty fast. Supporting fewer formats might have restricted their market to the point that the code would actually have been slower (just for example, it handled JPEGs faster than IJG).


As others have said, there is some overhead when adding a dynamic library. When the library is first loaded, relocations must be performed, although this should be a minor cost if the library is compiled correctly. The cost of looking up individual symbols also increases, since more libraries need to be searched.

The cost in memory of adding another dynamic library depends largely on how much of it you actually use. A page of code will not be loaded from disk until something on it is executed. However, other data such as headers, symbol tables, and hash tables built into the library file will be loaded, and these are generally proportional to the size of the library.
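The symbol-search cost mentioned above can be modelled with a toy lookup. Everything here except the hash function is a simplification with hypothetical names: real dynamic linkers such as glibc's ld.so consult a per-library hash table (with a Bloom filter in the GNU format) instead of scanning names, but they still walk the loaded objects in search order, so a symbol defined late in that order costs more probes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* String hash used by GNU-style .gnu.hash sections: h = h * 33 + c, from 5381. */
uint32_t gnu_hash(const char *name)
{
    uint32_t h = 5381;
    for (const unsigned char *s = (const unsigned char *)name; *s != '\0'; s++)
        h = h * 33 + *s;
    return h;
}

/* A toy stand-in for a loaded shared library: only its exported names. */
typedef struct {
    const char *soname;
    const char *const *symbols;
    size_t count;
} toy_library;

/* Searches the libraries in load order and returns the index of the first one
 * that defines `name`, or -1. *probed receives how many libraries were
 * examined. A real linker precomputes symbol hashes in each library's hash
 * table; this toy recomputes them for brevity. */
int toy_lookup(const toy_library *libs, size_t nlibs, const char *name,
               size_t *probed)
{
    uint32_t h = gnu_hash(name);
    *probed = 0;
    for (size_t i = 0; i < nlibs; i++) {
        *probed = i + 1;
        for (size_t j = 0; j < libs[i].count; j++) {
            /* Compare the cheap hash first, the full string only on a match. */
            if (gnu_hash(libs[i].symbols[j]) == h &&
                strcmp(libs[i].symbols[j], name) == 0)
                return (int)i;
        }
    }
    return -1;
}
```

With a hypothetical libbig.so loaded ahead of libc, resolving puts probes both libraries, whereas a symbol that libbig.so itself defines is found on the first probe.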

There is a great document by Ulrich Drepper, the lead contributor to glibc, that describes the process and the overhead of dynamic libraries.


Depends on how the linker works. Some linkers are lazy and will include all the code in the library. More efficient linkers will only extract the needed code from a library. I have had experience with both types.

Smaller libraries cause fewer worries with either type of linker. The worst case with a small library is a small amount of unused code. Many small libraries may increase the build time. The trade-off is build time vs. code space.

An interesting test of the linker is the classic Hello World program:

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* EXIT_SUCCESS */

int main(void)
{
  printf("Hello World\n");
  return EXIT_SUCCESS;
}

The printf function has a lot of dependencies due to all the formatting it may need to do. A lazy but fast linker may include an entire "standard library" to resolve all the symbols. A more efficient linker will only include printf and its dependencies, which makes the linker slower.

The above program can be compared to this one using puts:

#include <stdio.h>   /* puts */
#include <stdlib.h>  /* EXIT_SUCCESS */

int main(void)
{
  puts("Hello World");  /* puts appends the newline itself */
  return EXIT_SUCCESS;
}

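Generally, the puts version should be smaller than the printf version, because puts has no formatting needs and thus fewer dependencies. Lazy linkers will generate the same code size as for the printf program. (Note that with optimization enabled, GCC commonly rewrites printf("Hello World\n") into puts("Hello World"), so build with -fno-builtin-printf if you want a fair comparison.)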

In summary, decisions about library size depend largely on the linker, specifically on its efficiency. When in doubt, many small libraries will rely less on the efficiency of the linker, but make the build process more complicated and slower.


  1. The thing to do with performance concerns, in general, is not to entertain them. Unless you know they are a problem, you are guessing, and guessing is the central concept behind "premature optimization". The thing to do with performance problems is to diagnose them when you have them, and not before. The problems are almost never something you would have guessed. Here's an extended example.

  2. If you do that a fair amount, you will come to recognize the design approaches that tend to cause performance problems, whether in your code or in a library. (Libraries can certainly have performance problems.) When you learn that and apply it to projects, then in a sense you are prematurely optimizing, but it has the desired effect anyway: avoiding problems. If I can summarize what you will probably learn, it is that too many layers of abstraction and overblown class hierarchies (especially those full of notification-style updating) are very often the reasons for performance problems.

At the same time, I share your circumspection about 3rd-party libraries and such. Too many times I have worked on projects where some 3rd-party package was "leveraged" for "synergy", and then the vendor either went up in smoke or abandoned the product or had it go obsolete because Microsoft changed things in the OS. Then our product that leaned heavily on the 3rd-party package starts not working, requiring a big expenditure on our part while the original programmers are long gone.


"another ball and chain". Really?

Or is it a stable, reliable platform that enables your application in the first place?

Consider that some folks may like a "too big and ... bloated" library because they use it for other projects and really trust it.

Indeed, they may decline to mess with your software specifically because you avoided using the obvious "too big and ... bloated" library.


Technically, the answer is that yes, they do. However, these inefficiencies are very seldom practically important. I'm going to assume a statically compiled language like C, C++, or D here.

When an executable is loaded into memory on a modern OS, address space is simply mapped to it. This means that, no matter how big the executable is, entire page-size blocks of code that aren't used will never touch physical memory. You will waste address space, though, and occasionally this can matter a little on 32-bit systems.

When you link to a library, a good linker will usually throw out excess stuff that you don't use, though especially in the case of template instantiations this doesn't always happen. Thus your binaries might be a little bit bigger than strictly necessary.

If you have code that you don't use heavily interleaved with code that you do use, you can end up wasting space in your CPU cache. However, as cache lines are small (usually 64 bytes), this will seldom happen to a practically important extent.
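One way toolchains keep rarely executed code out of the hot path's cache lines is to mark it cold. This sketch relies on GCC/Clang extensions (__attribute__((cold)) and __builtin_expect); the function names are hypothetical.

```c
#include <stdio.h>

/* GCC/Clang extension: 'cold' says this function is rarely called, so it is
 * optimized for size and, on many targets, grouped with other cold code in a
 * separate text subsection, away from the hot loop below. */
__attribute__((cold, noinline))
static void report_bad_value(int value)
{
    fprintf(stderr, "bad value: %d\n", value);
}

/* Hot path: sums the non-negative entries and reports negative ones.
 * __builtin_expect tells the compiler the error branch is unlikely. */
int sum_nonnegative(const int *values, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (__builtin_expect(values[i] < 0, 0)) {
            report_bad_value(values[i]);
            continue;
        }
        total += values[i];
    }
    return total;
}
```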


Ask yourself what your target is. If it's a mid-range workstation of today, no problem. If it's older hardware or even a limited embedded system, then it might be.

As previous posters have said, just having the code there does not cost you much in performance (it might reduce the locality for the caches and increase loading times).


FWIW, I work on Microsoft Windows, and when we build Windows, builds compiled for SIZE are faster than builds compiled for SPEED because you take fewer page-fault hits.


FFTW and ATLAS are two quite large libraries. Oddly enough, they play large roles in the fastest software in the world, applications optimized to run on supercomputers. No, using large libraries doesn't make your code slow, especially when the alternative is implementing FFT or BLAS routines for yourself.
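To make the comparison concrete, this is the kind of matrix multiply you would likely write yourself instead of calling BLAS. It is a deliberately naive sketch for square, row-major matrices; tuned BLAS implementations such as ATLAS add cache blocking, vectorization and machine-specific tuning, and are usually much faster on large matrices.

```c
#include <stddef.h>

/* Naive row-major C = A * B for n x n matrices: the textbook triple loop. */
void naive_dgemm(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            /* Walks a row of A and a column of B; the column stride is n,
             * which is what cache-blocked BLAS kernels avoid. */
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

With a CBLAS interface installed (ATLAS provides one), the same product is a single call: cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, a, n, b, n, 0.0, c, n).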


You are very right to be worried, especially when it comes to Boost. It's not that anyone writing these libraries is incompetent; it's due to two issues.

  1. Templates are just inherently bloated code. This didn't matter as much 10 years ago, but nowadays the CPU is much faster than memory access and this trend continues. I'd almost say templates are an obsolescent feature.

It's not so bad for user code, which is usually somewhat practical, but in many libraries everything is defined in terms of other templates or is templated on multiple items (meaning exponential template code explosions).

Simply adding in iostream adds about 3 MB (!!!) to your code. Now add in some Boost nonsense and you have 30 MB of code if you simply declare a couple of particularly weird data structures.

Worse, you can't even easily profile this. I can tell you the difference between code written by me and code from template libraries is DRAMATIC, but with a more naive approach you may think you are doing worse from a simple test, while the cost in code bloat will take its toll in a large real-world app.

  2. Complexity. When you look at the things in Boost, they are all things that complicate your code to a huge degree. Things like smart pointers, functors, all sorts of complicated stuff. Now, I won't say it's never a good idea to use this stuff, but pretty much all of it has a big cost of some kind. Especially if you don't understand exactly, I mean exactly, what it's doing.

But people rave about it and pretend it has something to do with 'design' so people get the impression it is the way you should do everything, not just some extremely specialized tools that should be used seldom. If ever.
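The mechanism behind the template-bloat claim in item 1 can be shown in C with a macro that plays the role of a function template. Each use stamps out another full copy of the function, just as each distinct C++ instantiation gets its own machine code; the macro and function names here are hypothetical.

```c
#include <stddef.h>

/* Plays the role of a function template: each use of the macro emits a
 * complete, separate function for type T. Precondition: n >= 1. */
#define DEFINE_MAX_ELEMENT(T, NAME)          \
    T NAME(const T *values, size_t n)        \
    {                                        \
        T best = values[0];                  \
        for (size_t i = 1; i < n; i++)       \
            if (values[i] > best)            \
                best = values[i];            \
        return best;                         \
    }

/* Three "instantiations": three copies of the same loop in the object file. */
DEFINE_MAX_ELEMENT(int, max_int)
DEFINE_MAX_ELEMENT(long, max_long)
DEFINE_MAX_ELEMENT(double, max_double)
```

Running nm on the resulting object file lists max_int, max_long and max_double as three separate functions.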
