开发者

Fastest way to zero out a 2d array in C?

开发者 https://www.devze.com 2022-12-24 07:21 出处:网络
I want to repeatedly zero a large 2d array in C. This is what I do at the moment: // Array of size n * m, where n may not equal m

I want to repeatedly zero a large 2d array in C. This is what I do at the moment:

// Array of size n * m, where n may not equal m
for(j = 0; j < n; j++)
{
    for(i = 0; i < m; i++)
    {  
        array[i][j] = 0;
    }
}

I've tried using memset:

开发者_如何学Go
memset(array, 0, sizeof(array))

But this only works for 1D arrays. When I printf the contents of the 2D array, the first row is zeroes, but then I got a load of random large numbers and it crashes.


memset(array, 0, sizeof(array[0][0]) * m * n);

Where m and n are the width and height of the two-dimensional array (in your example, you have a square two-dimensional array, so m == n).


If array is truly an array, then you can "zero it out" with:

memset(array, 0, sizeof array);

But there are two points you should know:

  • this works only if array is really a "two-d array", i.e., was declared T array[M][N]; for some type T.
  • it works only in the scope where array was declared. If you pass it to a function, then the name array decays to a pointer, and sizeof will not give you the size of the array.

Let's do an experiment:

#include <stdio.h>

void f(int (*arr)[5])
{
    printf("f:    sizeof arr:       %zu\n", sizeof arr);
    printf("f:    sizeof arr[0]:    %zu\n", sizeof arr[0]);
    printf("f:    sizeof arr[0][0]: %zu\n", sizeof arr[0][0]);
}

int main(void)
{
    int arr[10][5];
    printf("main: sizeof arr:       %zu\n", sizeof arr);
    printf("main: sizeof arr[0]:    %zu\n", sizeof arr[0]);
    printf("main: sizeof arr[0][0]: %zu\n\n", sizeof arr[0][0]);
    f(arr);
    return 0;
}

On my machine, the above prints:

main: sizeof arr:       200
main: sizeof arr[0]:    20
main: sizeof arr[0][0]: 4

f:    sizeof arr:       8
f:    sizeof arr[0]:    20
f:    sizeof arr[0][0]: 4

Even though arr is an array, it decays to a pointer to its first element when passed to f(), and therefore the sizes printed in f() are "wrong". Also, in f() the size of arr[0] is the size of the array arr[0], which is an "array [5] of int". It is not the size of an int *, because the "decaying" only happens at the first level, and that is why we need to declare f() as taking a pointer to an array of the correct size.

So, as I said, what you were doing originally will work only if the two conditions above are satisfied. If not, you will need to do what others have said:

memset(array, 0, m*n*sizeof array[0][0]);

Finally, memset() and the for loop you posted are not equivalent in the strict sense. There could be (and have been) compilers where "all bits zero" does not equal zero for certain types, such as pointers and floating-point values. I doubt that you need to worry about that though.


Well, the fastest way to do it is to not do it at all.

Sounds odd I know, here's some pseudocode:

int array [][];
bool array_is_empty;


void ClearArray ()
{
   array_is_empty = true;
}

int ReadValue (int x, int y)
{
   return array_is_empty ? 0 : array [x][y];
}

void SetValue (int x, int y, int value)
{
   if (array_is_empty)
   {
      memset (array, 0, number of byte the array uses);
      array_is_empty = false;
   }
   array [x][y] = value;
}

Actually, it's still clearing the array, but only when something is being written to the array. This isn't a big advantage here. However, if the 2D array was implemented using, say, a quad tree (not a dynamic one mind), or a collection of rows of data, then you can localise the effect of the boolean flag, but you'd need more flags. In the quad tree just set the empty flag for the root node, in the array of rows just set the flag for each row.

Which leads to the question "why do you want to repeatedly zero a large 2d array"? What is the array used for? Is there a way to change the code so that the array doesn't need zeroing?

For example, if you had:

clear array
for each set of data
  for each element in data set
    array += element 

that is, use it for an accumulation buffer, then changing it like this would improve the performance no end:

 for set 0 and set 1
   for each element in each set
     array = element1 + element2

 for remaining data sets
   for each element in data set
     array += element 

This doesn't require the array to be cleared but still works. And that will be far faster than clearing the array. Like I said, the fastest way is to not do it in the first place.


If you are really, really obsessed with speed (and not so much with portability) I think the absolute fastest way to do this would be to use SIMD vector intrinsics. e.g. on Intel CPUs, you could use these SSE2 instructions:

__m128i _mm_setzero_si128 ();                   // Create a quadword with a value of 0.
void _mm_storeu_si128 (__m128i *p, __m128i a);  // Write a quadword to the specified address.

Each store instruction will set four 32-bit ints to zero in one hit.

p must be 16-byte aligned, but this restriction is also good for speed because it will help the cache. The other restriction is that p must point to an allocation size that is a multiple of 16-bytes, but this is cool too because it allows us to unroll the loop easily.

Have this in a loop, and unroll the loop a few times, and you will have a crazy fast initialiser:

// Assumes int is 32-bits.
const int mr = roundUpToNearestMultiple(m, 4);      // This isn't the optimal modification of m and n, but done this way here for clarity.    
const int nr = roundUpToNearestMultiple(n, 4);    

int i = 0;
int array[mr][nr] __attribute__ ((aligned (16)));   // GCC directive.
__m128i* px = (__m128i*)array;
const int incr = s >> 2;                            // Unroll it 4 times.
const __m128i zero128 = _mm_setzero_si128();

for(i = 0; i < s; i += incr)
{
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
}

There is also a variant of _mm_storeu that bypasses the cache (i.e. zeroing the array won't pollute the cache) which could give you some secondary performance benefits in some circumstances.

See here for SSE2 reference: http://msdn.microsoft.com/en-us/library/kcwz153a(v=vs.80).aspx


If you initialize the array with malloc, use calloc instead; it will zero your array for free. (Same perf obviously as memset, just less code for you.)


int array[N][M] = {0};

...at least in GCC 4.8.


How was your 2D array declared?

If it something like:

int arr[20][30];

You can zero it by doing:

memset(arr, sizeof(int)*20*30);


Use calloc instead of malloc . calloc will initiate all fields to 0.

int *a = (int *)calloc(n,size of(int)) ;

//all cells of a have been initialized to 0


I think that the fastest way to do it by hand is following code. You can compare it's speed to memset function, but it shouldn't be slower.

(change type of ptr and ptr1 pointers if your array type is different then int)


#define SIZE_X 100
#define SIZE_Y 100

int *ptr, *ptr1;
ptr = &array[0][0];
ptr1 = ptr + SIZE_X*SIZE_Y*sizeof(array[0][0]);

while(ptr < ptr1)
{
    *ptr++ = 0;
}


memset(array, 0, sizeof(int [n][n]));


You can try this

int array[20,30] = {{0}};


I don't have the reputation to comment, but...

I'd somewhat agree with Skizz's answer that "the fastest way to do it is [often] not to do it" but with a very important caveat. If you're going to filter access like this you really need to consider every cycle wasted.

For example, where Skizz has:

void ClearArray ()
{
    array_is_empty = true;
}

int ReadValue (int x, int y)
{
   return array_is_empty ? 0 : array [x][y];
}

I'd point out that the conditional ?: is an unnecessary branch. To get rid of this you could use an integer array_is_empty with a value of 0 or 1 and then ...

int ReadValue (int x, int y)
{
   return array_is_empty * array [x][y];
}

And this removes the conditional but adds an unnecessary multiplication which the compiler won't optimise away since it won't know if array_is_empty will ever have a value other than 0 or 1

So, instead I suggest using an AND mask of all 1's. This could be a '-1' representation in a signed type, or just a ~0 (Zero, bit-flipped) in any simple type.

int ReadValue (int x, int y)
{
   return array_is_empty & array [x][y];
}

This results in a MUCH cleaner faster solution with no unnecessary branch prediction and no slow multiplication operations. An AND is super fast and this will be important if you're performing this filtering over a large number of lookups into an array.

The final caveat I'd add is considerably more serious:

When most people zero an array, they expect each element to be zero. With a 'single-flag' approach, setting a single element will remove the filter and expose all the uninitialised data... which is almost never what the user expects. This can cause a lot of headaches later.

The alternative (as Skizz shows) is to zero the array on first write. This undoes every potential benefit of the approach and leaves ONLY downsides and performance loss.

So, I'd use this only on arrays that are either initialised in full, or not at all. And, to be honest, that severely limits its usefulness.

Where it can come in useful is where the array is used in blocks, such as lines ... particularly when sparsely occupied. If each line or block has such a flag then it becomes a very efficient approach. A single general 'memset' on first write (see Skizz's example) undoes any potential benefit of the approach and is a big alarm bell that you should just Zero the array on construction ; )

Also, a minor point, but, personally I'd avoid method names like "ClearArray"... I humbly submit that "MarkUninitialised" -or- "MarkUnused" may seem ugly, but avoids confusion.

Still, all credit to Skizz for his answer. When it makes sense to do, it is well worth doing. Unnecessary work avoided is as good as an optimising refactor later ; )

My apologies for making this an answer. I just thought it worth pointing out, given how rapidly additional work during array-access gets out of hand... and that issue with memset-on-first-write


This happens because sizeof(array) gives you the allocation size of the object pointed to by array. (array is just a pointer to the first row of your multidimensional array). However, you allocated j arrays of size i. Consequently, you need to multiply the size of one row, which is returned by sizeof(array) with the number of rows you allocated, e.g.:

bzero(array, sizeof(array) * j);

Also note that sizeof(array) will only work for statically allocated arrays. For a dynamically allocated array you would write

size_t arrayByteSize = sizeof(int) * i * j; 
int *array = malloc(array2dByteSite);
bzero(array, arrayByteSize);
0

精彩评论

暂无评论...
验证码 换一张
取 消