Is it better/faster to have class variables or local function variables?_问答_开发者

Ok I know the title doesn't fully explain this question. So I'm writing a program that performs a large number of calculations and I'm trying to optimize it so that it won't run quite so slow. I have a function that is a member of a class that gets called around 5 million times. This is the function:

void Poi开发者_JS百科ntCamera::GetRay(float x, float y, Ray& out)
{
    //Find difference between location on view plane and origin and normalize
    float vpPointx = pixelSizex * (x - 0.5f * (float)width);
    float vpPointy = pixelSizey * (((float)height - y) - 0.5f * height);

    //Transform ray to camera's direction
    out.d = u * vpPointx + v * vpPointy - w * lens_distance;
    out.d.Normalize();

    //Set origin to camera location
    out.o = loc;
}

I'm wondering if it is better/faster to declare the variables vpPointx and vpPointy in the class than to declare them each time I call the function. Would this be a good optimization or would it have little effect?

And in general, if there is anything here that could be optimized please let me know.

By limiting the scope of your variables, you are giving more opportunity to the compiler optimiser to rearrange your code and make it run faster. For example, it might keep the values of those variables entirely within CPU registers, which may be an order of magnitude faster than memory access. Also, if those variables were class instance variables, then the compiler would have to generate code to dereference this every time you accessed them, which would very likely be slower than local variable access.

As always, you should measure the performance yourself and try the code both ways (or better, as many ways as you can think of). All optimisation advice is subject to whatever your compiler actually does, which requires experimentation.

Always prefer locals

Anything that is a temporary value should be a local. It's possible that such a value can exist entirely within a register without kicking something else out of cache or requiring a pointless memory store that will use a resource in far shorter supply than CPU cycles.

A dual 3 GHz CPU can execute 6 billion CPU cycles per second. In order to approach that 6 billion figure, typically most ops should involve no memory or cache operations and the results of most cycles must not be needed by the next instruction unless the CPU can find a later instruction that is immediately dispatchable. This all gets quite complicated but 6 billions somethings, including some wait states, will certainly happen each second.

However, that same CPU system is capable of only 10-40 million memory operations per second. The disparity is partly compensated for by the cache systems, although they are still slower than the CPU is, they are limited in size, and they do not cope with writes as well as they do with reads.

The good news is that good software abstractions and software speed optimization both agree in this case. Do not store transient state in an object unless you have a reason to reference it later.

How about precomputing some of those multiplications that never change. For example, w*lens_distance and 0.5*height. Compute them once whenever the variables change, then just use the stored value in this function call.

There is a performance penalty for declaring them in the class. They are accessed, in effect, by using this->field. There will be, at minimum, one memory write to store the result. The function local variables could live in registers for their entire lifetime.

I'm not sure, although my guess is it's better inside the function (since it's just a push on the stack to "declare" the variable, whereas making it part of the class means accessing it from memory using indirection every time you access it). Of course, in reality the compiler probably optimizes all of this into registers anyway.

Which brings me to my point:

You're going about this the wrong way

I don't think that anyone can really tell you what will be faster. It shouldn't matter even if someone does. The only real way to optimize is by measuring.

This usually means one of two things:

One option is to try each way, measure the time it takes, and compare. Note that this isn't always trivial to do (since each run will sometimes depend on external factors, difficult memory issues, etc). But running the code a few million times will probably iron that out for you.
Ideally, you should be using a profiler. That's a piece of software designed to measure the code for you, and tell you what parts take the longest amount of time. As most people who have dealt with optimization will tell you, you'll usually be surprised at what takes up a lot of time.

That's why you should always go with the "scientific" method of measuring, instead of relying on anyone's guesswork.

Appears to be a raytracer. The little things do add up, but also consider the big hits: You'll get a huge speedup with decent spatial partitioning. Get yourself an octtree or KD-Tree for a few orders magnitude speedup on complex scenes.

As for your direct question: profile it.

The others have already covered the benefits of using locals over class variables, so I won't go into that.. but since you asked for optimization tips in general:

Your int-to-float cast jumps out at me. There's a cost to it, especially if you are using the x387 FPU. Using SSE registers will make it better, but it looks thoroughly unnecessary for your function: you could simply store a copy of them as floats in your class.
You mentioned in a comment that you were still working on your kd-tree. It's probably a better idea to finish that first before doing the low-level optimization; what appears important now may not take up a fraction of the time later.
Use an instruction-level profiler, like VTune. gprof doesn't give you anywhere near enough information.
Have you heard of ompf.org? It's a wonderful raytracing forum, and you can learn a lot about the relevant optimizations there.
See my answer to this post for more tips.
Read Agner Fog.

As an aside: I've heard that the Bounding Interval Hierarchy is much easier to implement. I've not implemented a kd-tree, but I have implemented a BIH, and I'd say it's reasonably straightforward.