Virtual Function Implementation_问答_开发者_运维开发者技术经验分享

I have kept hearing this statement. Switch..Case is Evil for code maintenance, but it provides better performance(since compiler can inline stuffs etc..). Virtual functions are very good for code maintenance, but they incur a performance penalty of two pointer indirections.

Say i have a base class with 2 subclasses(X and Y) and one virtual function, so there will be two virtual tables. The object has a pointer, based on which it will choose a virtual table. So for the compiler, it is more like

switch( object's function ptr )
{

   case 0x....:

       X->call();

       break;

   case 0x....:

       Y->call();
};

So why should virtual function cost more, if it can get implemented this way, as the compiler can do the same 开发者_Go百科in-lining and other stuff here. Or explain me, why is it decided not to implement the virtual function execution in this way?

Thanks, Gokul.

The compiler can't do that because of the separate compilation model.

At the time the virtual function call is being compiled, there is no way for the compiler to know for sure how many different subclasses there are.

Consider this code:

// base.h
class base
{
public:
    virtual void doit();
};

and this:

// usebase.cpp
#include "base.h"

void foo(base &b)
{
    b.doit();
}

When the compiler is generating the virtual call in foo, it has no knowledge of which subclasses of base will exist at runtime.

Your question rests on misunderstandings about the way switches and virtual functions work. Rather than fill up this box with a long treatise on code generation, I'll give a few bullet points:

Switch statements aren't necessarily faster than virtual function calls, or inlined. You can learn more about the way that switch statements are turned into assembly here and here.
The thing that is slow about virtual function calls isn't the pointer lookups, it's the indirect branch. For complicated reasons having to do with the internal electronics of the CPU, for most modern processors it is faster to perform a "direct branch", where the destination address is encoded in the instruction, than an "indirect branch", where the address is computed at runtime. Virtual function calls and large switch statements are usually implemented as indirect branches.
In your example above, the switch is completely redundant. Once an object's member function pointer has been computed, the CPU can branch straight to it. Even if the linker was aware of every possible member object that existed in the executable, it would still be unnecessary to add that table lookup.

Here's some results from concrete tests. These particular results are from VC++ 9.0/x64:

Test Description: Time to test a global using a 10-way if/else if statement
CPU Time:        7.70  nanoseconds           plus or minus      0.385

Test Description: Time to test a global using a 10-way switch statement
CPU Time:        2.00  nanoseconds           plus or minus     0.0999

Test Description: Time to test a global using a 10-way sparse switch statement
CPU Time:        3.41  nanoseconds           plus or minus      0.171

Test Description: Time to test a global using a 10-way virtual function class
CPU Time:        2.20  nanoseconds           plus or minus      0.110

With sparse cases, the switch statement is substantially slower. With dense cases, the switch statement might be faster, but the switch and the virtual function dispatch overlap a bit, so while the switch is probably faster, the margin is so small we can't even be sure it is faster, not to mention being enough faster to care much about. If the cases in the switch statement are sparse at all, there's no real question that the virtual function call will be faster.

There is no branching in virtual dispatch. The vptr in your class points to a vtable, with a second pointer for the specific function at a constant offset.

Actually, if you'll have many virtual functions, switch-like branching will be slower than two pointer indirection. Performance of the current implementation doesn't depend on how many virtual functions you have.

Your statement about the branching when calling virtual function is wrong. There is not such thing in generated code. Take a look at the assembly code will give you a better idea.

In a nut shell, one general simplified implementation of C++ virtual function is: each class will have a virtual table (vbtl), and each instance of the class will have a virtual table pointer (vptr). The virtual table is basically a list of function pointers.

When you are calling a virtual function, say it is like:

class Base {};
class Derived {};
Base* pB = new Derived();
pB->someVirtualFunction();

The 'someVirtualFunction()' will have a corresponding index in the vtbl. And the call

pB->someVirtualFunction();

will be converted to something like:

pB->vptr[k](); //k is the index of the 'someVirtualFunction'.

In this way the function is actually called indirectly and it has the polymorphism.

I suggest you to read 'The C++ Object Model' by Stanley Lippman.

Also, the statement that virtual function call is slower than the switch-case is not accruate. It depends. As you can see above, a virtual function call is just 1 extra time of dereference compared to a regular function call. And with switch-case branching you'd have extra comparision logic (which introduce the chance of missing cache for CPU) which also consumes CPU cycles. I would say in most case, if not all, virutal function call should be faster than switch-case.

Saying definitively that switch/case is more or less performant than virtual calls is an over-generalization. The truth is that this will depend on many things, and will vary based on:

what compiler you are using
what optimizations are enabled
the overall characteristics of your program and how they impact those optimizations

If you are optimizing the code in your head as you write it, there is a good chance that you are making the wrong choice. Write the code in a human-readable and/or user-friendly way first, then run the entire executable through profiling tools. If this area of the code shows up as a hotspot, then try it both ways and see which is quantifiably better for your particular case.

These kind of optimizations are possible only by a repatching linker which should run as a part of C++ runtime.

The C++ runtime is more complex that even a new DLL load (with COM) will add new function pointers to the vtable. (think about pure virtual fns?)

Then compiler or linker both cannot do this optimization. switch/case is obviously faster than indirect call since prefetch in CPU is deterministic and pipelining is possible. But it will not work out in C++ because of this runtime extension of object's vtable.