
Java: Multi-dimensional array vs. one-dimensional

开发者 https://www.devze.com 2022-12-24 05:49 Source: web

For example:

  • a) int [x][y][z]

    vs

  • b) int[x*y*z]

Initially thought I'd go with a) for simplicity.

I know that Java doesn't store arrays linearly in memory like C does, but what implications does this have for my program?


Usually the best way to answer such questions is to look at how each choice compiles to JVM bytecode:

int[][] multi = new int[50][50];
int[] single = new int[2500];

This is translated into:

BIPUSH 50
BIPUSH 50
MULTIANEWARRAY int[][] 2
ASTORE 1
SIPUSH 2500
NEWARRAY T_INT
ASTORE 2

So, as you can see, the JVM already knows that we are dealing with a multi-dimensional array: MULTIANEWARRAY is a dedicated instruction for allocating one.

Taking it further:

for (int i = 0; i < 50; ++i)
    for (int j = 0; j < 50; ++j)
    {
        multi[i][j] = 20;
        single[i*50+j] = 20;
    }

This translates (omitting the loop control instructions) into:

ALOAD 1: multi
ILOAD 3: i
AALOAD
ILOAD 4: j
BIPUSH 20
IASTORE

ALOAD 2: single
ILOAD 3: i
BIPUSH 50
IMUL
ILOAD 4: j
IADD
BIPUSH 20
IASTORE

So, as you can see, multi-dimensional access is handled natively by the VM with no extra instructions, while the single-dimensional version needs more instructions because the offset is computed by hand.

I don't think that performance will be such an issue.
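For reference, a minimal, self-contained harness for timing the two write loops could look like the sketch below. All names are illustrative (this is not the benchmark code used for the numbers that follow), and absolute timings will vary by JVM and hardware:

```java
public class ArrayWriteBench {
    static final int N = 1000;

    // Fill an NxN multi-dimensional array; returns elapsed nanoseconds.
    static long fillMulti(int[][] multi) {
        long start = System.nanoTime();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                multi[i][j] = 20;
        return System.nanoTime() - start;
    }

    // Fill the equivalent flat array using manual offset arithmetic.
    static long fillSingle(int[] single) {
        long start = System.nanoTime();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                single[i * N + j] = 20;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[][] multi = new int[N][N];
        int[] single = new int[N * N];
        // Warm up so the JIT compiles both loops before the timed runs.
        for (int run = 0; run < 20; run++) {
            fillMulti(multi);
            fillSingle(single);
        }
        System.out.println("multi:  " + fillMulti(multi) + " ns");
        System.out.println("single: " + fillSingle(single) + " ns");
    }
}
```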

EDIT:

I did some simple benchmarks to see what's going on here. I chose to try three different cases: linear read, linear write, and random access. Times are expressed in milliseconds (and calculated using System.nanoTime()). Here are the results:

Linear write

  • Size: 100x100 (10000)
    • Multi: 5.786591
    • Single: 6.131748
  • Size: 200x200 (40000)
    • Multi: 1.216366
    • Single: 0.782041
  • Size: 500x500 (250000)
    • Multi: 7.177029
    • Single: 3.667017
  • Size: 1000x1000 (1000000)
    • Multi: 30.508131
    • Single: 18.064592
  • Size: 2000x2000 (4000000)
    • Multi: 185.3548
    • Single: 155.590313
  • Size: 5000x5000 (25000000)
    • Multi: 955.5299
    • Single: 923.264417
  • Size: 10000x10000 (100000000)
    • Multi: 4084.798753
    • Single: 4015.448829

Linear read

  • Size: 100x100 (10000)
    • Multi: 5.241338
    • Single: 5.135957
  • Size: 200x200 (40000)
    • Multi: 0.080209
    • Single: 0.044371
  • Size: 500x500 (250000)
    • Multi: 0.088742
    • Single: 0.084476
  • Size: 1000x1000 (1000000)
    • Multi: 0.232095
    • Single: 0.167671
  • Size: 2000x2000 (4000000)
    • Multi: 0.481683
    • Single: 0.33321
  • Size: 5000x5000 (25000000)
    • Multi: 1.222339
    • Single: 0.828118
  • Size: 10000x10000 (100000000)
    • Multi: 2.496302
    • Single: 1.650691

Random read

  • Size: 100x100 (10000)
    • Multi: 22.317393
    • Single: 8.546134
  • Size: 200x200 (40000)
    • Multi: 32.287669
    • Single: 11.022383
  • Size: 500x500 (250000)
    • Multi: 189.542751
    • Single: 68.181343
  • Size: 1000x1000 (1000000)
    • Multi: 1124.78609
    • Single: 272.235584
  • Size: 2000x2000 (4000000)
    • Multi: 6814.477101
    • Single: 1091.998395
  • Size: 5000x5000 (25000000)
    • Multi: 50051.306239
    • Single: 7028.422262

The random one is a little misleading, since it generates two random numbers per access for the multi-dimensional array but only one for the single-dimensional one (and PRNGs may consume some CPU).

Note that I tried to let the JIT warm up by benchmarking only after the 20th run of the same loop. For completeness, my Java VM is the following:

java version "1.6.0_17" Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)


On current CPUs, non-cached memory access is hundreds of times slower than arithmetic (see this presentation and read What every programmer should know about memory). The a) option will result in about 3 memory lookups whereas the b) option will result in about 1 memory lookup. Also, the CPU's prefetching algorithms might not work as well. So the b) option can be faster in some situations (when it's a hot spot and the array does not fit into the CPU's cache). How much faster? That will depend on the application.

Personally I would first use the a) option, because it will result in simpler code. If a profiler shows that array access to be a bottleneck, then I would convert it to the b) option, so that there is a pair of helper methods for reading and writing array values (that way the messy code will be restricted to those two methods).
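A sketch of such a pair of helper methods, wrapped in a small class (the name Grid2D and its fields are made up for illustration):

```java
// Hypothetical wrapper that hides the index arithmetic of option b).
public class Grid2D {
    private final int width;   // extent of the x dimension
    private final int[] cells; // flat row-major storage, width * height elements

    public Grid2D(int width, int height) {
        this.width = width;
        this.cells = new int[width * height];
    }

    public int get(int x, int y) {
        return cells[y * width + x];
    }

    public void set(int x, int y, int value) {
        cells[y * width + x] = value;
    }
}
```

Callers keep writing grid.get(x, y); only these two methods know the layout, so switching back to int[][] later would touch a single class.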

I made a benchmark for comparing 3-dimensional int arrays ("Multi" column) to the equivalent 1-dimensional int arrays ("Single" column). The code is here and tests here. I ran it on 64-bit jdk1.6.0_18, Windows 7 x64, Core 2 Quad Q6600 @ 3.0 GHz, 4 GB DDR2, using the JVM options -server -Xmx3G -verbose:gc -XX:+PrintCompilation (I have removed the debug output from the following results). The results were:

Out of 20 repeats, the minimum time in milliseconds is reported.

Array dimensions: 100x100x100 (1000000)
            Multi   Single
Seq Write   1       1
Seq Read    1       1
Random Read 99      90    (of which generating random numbers 59 ms)

Array dimensions: 200x200x200 (8000000)
            Multi   Single
Seq Write   14      13
Seq Read    11      8
Random Read 1482    1239    (of which generating random numbers 474 ms)

Array dimensions: 300x300x300 (27000000)
            Multi   Single
Seq Write   53      46
Seq Read    34      24
Random Read 5915    4418    (of which generating random numbers 1557 ms)

Array dimensions: 400x400x400 (64000000)
            Multi   Single
Seq Write   123     111
Seq Read    71      55
Random Read 16326   11144    (of which generating random numbers 3693 ms)

This shows the 1-dimensional array to be faster, though the differences are so small that for 99% of applications they won't be noticeable.

I did also some measurements to estimate the overhead of generating the random numbers in the Random Read benchmark by replacing preventOptimizingAway += array.get(x, y, z); with preventOptimizingAway += x * y * z; and added the measurements to the above results table by hand. Generating the random numbers takes 1/3 or less of the total time of the Random Read benchmark, so the memory access dominates the benchmark as expected. It would be interesting to repeat this benchmark with arrays of 4 and more dimensions. Probably it would make the speed difference bigger, because the multidimensional array's topmost levels will fit into the CPU's cache, and only the other levels will require a memory lookup.
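For the 3-dimensional case benchmarked above, the row-major offset formula is (x * Y + y) * Z + z, and it generalizes to any number of dimensions. A quick sanity check (sizes chosen arbitrarily for illustration) that the flat layout mirrors the nested one:

```java
public class FlatIndex3D {
    static final int X = 4, Y = 5, Z = 6; // arbitrary example extents

    // Row-major offset of element (x, y, z) in a flat int[X * Y * Z].
    static int index(int x, int y, int z) {
        return (x * Y + y) * Z + z;
    }

    public static void main(String[] args) {
        int[][][] multi = new int[X][Y][Z];
        int[] single = new int[X * Y * Z];
        int v = 0;
        for (int x = 0; x < X; x++)
            for (int y = 0; y < Y; y++)
                for (int z = 0; z < Z; z++) {
                    multi[x][y][z] = v;
                    single[index(x, y, z)] = v;
                    v++;
                }
        // Both layouts hold the same value at corresponding coordinates.
        System.out.println(multi[3][4][5] == single[index(3, 4, 5)]); // prints "true"
    }
}
```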


Use the first variant (3-dimensional) because it's easier to understand and there is less chance of making a logical error (especially if you're using it to model 3-dimensional space).


If you choose the latter route, you'll have to perform arithmetic for every single array access. That will be a pain and error-prone (unless you wrap it in a class providing this functionality).

I don't believe there's any (significant) optimisation in choosing the flat array (especially given the arithmetic needed to index into it). As always with optimisations, you would need to take some measurements to determine whether it's really worth it.

