开发者

Why is struct slower than float?

开发者 https://www.devze.com 2023-01-19 07:03 出处:网络
If I have array of structs MyStruct[]: 开发者_如何学Go struct MyStruct { float x; float y; } And it\'s slower than if I do float[] -> x = > i; y => i + 1 (so this array is 2x bigger than with struc

If I have array of structs MyStruct[]:

开发者_如何学Go
struct MyStruct 
{
    float x;
    float y;
}

And it's slower than if I do float[] -> x = > i; y => i + 1 (so this array is 2x bigger than with structs).

Time difference for 10,000 items compare each other (two fors inside) : struct 500ms, array with only floats - 78ms

I thought, that struct appears like eg. float, int etc (on heap).


Firstly structs don't necessarily appear on the heap - they can and often do appear on the stack.

Regarding your performance measurements, I think you must have tested it incorrectly. Using this benchmarking code I get almost the same performance results for both types:

TwoFloats[] a = new TwoFloats[10000];
float[] b = new float[20000];

void test1()
{
    int count = 0;
    for (int i = 0; i < 10000; i += 1)
    {
        if (a[i].x < 10) count++;
    }
}

void test2()
{
    int count = 0;
    for (int i = 0; i < 20000; i += 2)
    {
        if (b[i] < 10) count++;
    }
}

Results:

Method   Iterations per second
test1                 55200000
test2                 54800000


You are doing something seriously wrong if you get times like that. Float comparisons are dramatically fast, I clock them at 2 nanoseconds with the loop overhead. Crafting a test like this is tricky because the JIT compiler will optimize stuff away if you don't use the result of the comparison.

The structure is slightly faster, 1.96 nanoseconds vs 2.20 nanoseconds for the float[] on my laptop. That's the way it should be, accessing the Y member of the struct doesn't cost an extra array index.

Test code:

using System;
using System.Diagnostics;

class Program {
    static void Main(string[] args) {
        var test1 = new float[100000000];  // 100 million
        for (int ix = 0; ix < test1.Length; ++ix) test1[ix] = ix;
        var test2 = new Test[test1.Length / 2];
        for (int ix = 0; ix < test2.Length; ++ix) test2[ix].x = test2[ix].y = ix;
        for (int cnt = 0; cnt < 20; ++cnt) {
            var sw1 = Stopwatch.StartNew();
            bool dummy = false;
            for (int ix = 0; ix < test1.Length; ix += 2) {
                dummy ^= test1[ix] >= test1[ix + 1];
            }
            sw1.Stop();
            var sw2 = Stopwatch.StartNew();
            for (int ix = 0; ix < test2.Length; ++ix) {
                dummy ^= test2[ix].x >= test2[ix].y;
            }
            sw2.Stop();
            Console.Write("", dummy);
            Console.WriteLine("{0} {1}", sw1.ElapsedMilliseconds, sw2.ElapsedMilliseconds);
        }
        Console.ReadLine();
    }
    struct Test {
        public float x;
        public float y;
    }
}


I get results that seem to agree with you (and disagree with Mark). I'm curious if I've made a mistake constructing this (albeit crude) benchmark or if there is another factor at play.

Compiled on MS C# targeting .NET 3.5 framework with VS2008. Release mode, no debugger attached.

Here's my test code:

class Program {
    static void Main(string[] args) {
        for (int i = 0; i < 10; i++) {
            RunBench();
        }

        Console.ReadKey();
    }

    static void RunBench() {
        Stopwatch sw = new Stopwatch();

        const int numPoints = 10000;
        const int numFloats = numPoints * 2;
        int numEqs = 0;
        float[] rawFloats = new float[numFloats];
        Vec2[] vecs = new Vec2[numPoints];

        Random rnd = new Random();
        for (int i = 0; i < numPoints; i++) {
            rawFloats[i * 2] = (float) rnd.NextDouble();
            rawFloats[i * 2 + 1] = (float)rnd.NextDouble();
            vecs[i] = new Vec2() { X = rawFloats[i * 2], Y = rawFloats[i * 2 + 1] };
        }

        sw.Start();
        for (int i = 0; i < numFloats; i += 2) {
            for (int j = 0; j < numFloats; j += 2) {
                if (i != j &&
                    Math.Abs(rawFloats[i] - rawFloats[j]) < 0.0001 &&
                    Math.Abs(rawFloats[i + 1] - rawFloats[j + 1]) < 0.0001) {
                    numEqs++;
                }
            }
        }
        sw.Stop();

        Console.WriteLine(sw.ElapsedMilliseconds.ToString() + " : numEqs = " + numEqs);

        numEqs = 0;
        sw.Reset();
        sw.Start();
        for (int i = 0; i < numPoints; i++) {
            for (int j = 0; j < numPoints; j++) {
                if (i != j &&
                    Math.Abs(vecs[i].X - vecs[j].X) < 0.0001 &&
                    Math.Abs(vecs[i].Y - vecs[j].Y) < 0.0001) {
                    numEqs++;
                }
            }
        }
        sw.Stop();

        Console.WriteLine(sw.ElapsedMilliseconds.ToString() + " : numEqs = " + numEqs);
    }
}

struct Vec2 {
    public float X;
    public float Y;
}

Edit: Ah! I wasn't iterating the proper amounts. With the updated code my timings look like I expected:

269 : numEqs = 8
269 : numEqs = 8
270 : numEqs = 2
269 : numEqs = 2
268 : numEqs = 4
270 : numEqs = 4
269 : numEqs = 2
268 : numEqs = 2
270 : numEqs = 6
270 : numEqs = 6
269 : numEqs = 8
268 : numEqs = 8
268 : numEqs = 4
270 : numEqs = 4
269 : numEqs = 6
269 : numEqs = 6
268 : numEqs = 2
270 : numEqs = 2
268 : numEqs = 4
270 : numEqs = 4


The most likely reason is that the C# runtime optimizer perform a better job when you work with floats that with full structs, probably because optimizer is mapping x and y to registers or likewise changes not done with full struct.

In your particular example there seems not to be any fundamental reason why it couldn't perform as good a job when you use structs (it's hard to be sure without seeing you actual benchmarking code), but it just doesn't. However it would be interesting to compare the performance of the resulting code when compiled with another C# implementations (I'm thinking of mono on Linux).

I tested Ron Warholic benchmark with mono, and results are consistant with Mark's, difference between the two types of access seems to be minimal (version with floats is 1% faster). However I still should do more testing as it is not unexpected that library calls like Math.Abs take a large amount of time and it could hide a real difference.

After removing calls to Math.Abs and just doing tests like rawFloats[i] < rawFloats[j] the structure version becomes marginally faster (about 5%) than the two arrays of floats.


The code below is based on different ways of iteration. On my machine, Test1b takes almost twice as long as Test1a. I wonder if this relates to your issue.

class Program
{
    struct TwoFloats
    {
        public float x;
        public float y;
    }

    static TwoFloats[] a = new TwoFloats[10000];

    static int Test1a()
    {
        int count = 0;
        for (int i = 0; i < 10000; i += 1)
        {
            if (a[i].x < a[i].y) count++;
        }
        return count;
    }

    static int Test1b()
    {
        int count = 0;
        foreach (TwoFloats t in a)
        {
            if (t.x < t.y) count++;
        }
        return count;
    }

    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int j = 0; j < 5000; ++j)
        {
            Test1a();
        }
        sw.Stop();
        Trace.WriteLine(sw.ElapsedMilliseconds);
        sw.Reset();
        sw.Start();
        for (int j = 0; j < 5000; ++j)
        {
            Test1b();
        }
        sw.Stop();
        Trace.WriteLine(sw.ElapsedMilliseconds);
    }

}
0

精彩评论

暂无评论...
验证码 换一张
取 消