Why is this faster on 64 bit than 32 bit?_问答_开发者

I've been doing some performance testing, mainly so I can understand the difference between iterators and simple for loops. As part of this I created a simple set of tests and was then totally surprised by the results. For some methods, 64 bit was nearly 10 times faster than 32 bit.

What I'm looking for is some explanation for why this is happening.

[The answer below states this is due to 64 bit arithmetic in a 32 bit app. Changing the longs to ints results in good performance on 32 and 64 bit systems.]

Here are the 3 methods in question.

private static long ForSumArray(long[] array)
{
    var result = 0L;
    for (var i = 0L; i < array.LongLength; i++)
    {
        result += array[i];
    }
    return result;
}

private static long ForSumArray2(long[] array)
{
    var length = array.LongLength;
    var result = 0L;
    for (var i = 0L; i < length; i++)
    {
        result += array[i];
    }
    return result;
}

private static long IterSumArray(long[] array)
{
    var result = 0L;
    foreach (var entry in array)
    {
        result += entry;
    }
    return result;
}

I have a simple test harness that tests this

var repeat = 10000;

var arrayLength = 100000;
var array = new long[arrayLength];
for (var i = 0; i < arrayLength; i++)
{
    array[i] = i;
}

Console.WriteLine("For: {0}", AverageRunTime(repeat, () => ForSumArray(array)));

repeat = 100000;
Console.WriteLine("For2: {0}", AverageRunTime(repeat, () => ForSumArray2(array)));
Console.WriteLine("Iter: {0}", AverageRunTime(repeat, () => IterSumArray(array)));

private static TimeSpan AverageRunTime(int count, Action method)
{
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    for (var i = 0; i < count; i++)
    开发者_StackOverflow{
        method();
    }
    stopwatch.Stop();
    var average = stopwatch.Elapsed.Ticks / count;
    return new TimeSpan(average);
}

When I run these, I get the following results:

32 bit:

For: 00:00:00.0006080
For2: 00:00:00.0005694
Iter: 00:00:00.0001717

64 bit

For: 00:00:00.0007421
For2: 00:00:00.0000814
Iter: 00:00:00.0000818

The things I read from this are that using LongLength is slow. If I use array.Length, performance for the first for loop is pretty good in 64 bit, but not 32 bit.

The other thing I read from this is that iterating over an array is as efficient as a for loop, and the code is much cleaner and easier to read!

x64 processors contain 64 bit general purpose registers with which they can calculate operations on 64 bit integers in a single instruction. 32 bit processors does not have that. This is especially relevant to your program as it's heavily using long (64-bit integer) variables.

For instance, in x64 assembly, to add a couple 64 bit integers stored in registers, you can simply do:

; adds rbx to rax
add rax, rbx

To do the same operation on a 32 bit x86 processor, you'll have to use two registers and manually use the carry of the first operation in the second operation:

; adds ecx:ebx to edx:eax
add eax, ebx
adc edx, ecx

More instructions and less registers mean more clock cycles, memory fetches, ... which will ultimately result in reduced performance. The difference is very notable in number crunching applications.

For .NET applications, it seems that the 64-bit JIT compiler performs more aggressive optimizations improving overall performance.

Regarding your point about array iteration, the C# compiler is clever enough to recognize foreach over arrays and treat them specially. The generated code is identical to using a for loop and it's in recommended that you use foreach if you don't need to change the array element in the loop. Besides that, the runtime recognizes the pattern for (int i = 0; i < a.Length; ++i) and omits the bound checks for array accesses inside the loop. This will not happen in the LongLength case and will result in decreased performance (both for 32 bit and 64 bit case); and since you'll be using long variables with LongLength, the 32 bit performance will get degraded even more.

The long datatype is 64-bits and in a 64-bit process, it is processed as a single native-length unit. In a 32-bit process, it is treated as 2 32-bit units. Math, especially on these "split" types will be processor-intensive.

Not sure of "why" but I would make sure to call your "method" at least once outside your timer loop so you're not counting 1st-time jitting. (Since this looks like C# to me).

Oh, that's easy. I assume that you are using x86 technology. What do you need for doing the loops in assembler ?

One index variable i
One result variable result
An long array of results.

So you need three variables. Variable access is fastest if you can store them in registers; if you need to move them in and out to memory, you are losing speed. For 64bit longs you need two registers on 32bit and we have only four registers, so chances are high that all variables cannot be stored in registers, but must be stored in intermediate storage like the stack. This alone will slow down access considerably.

Addition of numbers: Addition must be two times; the first time without carry bit and the second time with carry bit. 64bit can it do in one cycle.

Moving/Loading: For every 1-cycle 64bit var you need two cycles for 32bit to load/unload a long integer into memory.

Every component datatype (datatypes which consists of more bits than register/address bits) will lose considerable speed. The speed gains of an order of magnitude is the reason GPUs still prefer floats (32bit) instead of doubles (64bit).

As others said, doing 64-bit arithmetic on a 32-bit machine is going to take some extra manipulation, more-so if doing multiplication or division.

Back to your concern about iterators vs. simple for loops, iterators can have fairly complex definitions, and they will only be fast if inlining and compiler-optimization is capable of replacing them with the equivalent simple form. It really depends on the type of iterator and the underlying container implementation. The simplest way to tell if it has been optimized reasonably well is to examine the generated assembly code. Another way is to put it in a long-running loop, pause it, and look at the stack to see what it's doing.