
How to memset() memory to a certain pattern instead of a single byte?

Developer https://www.devze.com 2023-01-08 21:09 Source: web
I need to write a repeating pattern to memory (e.g. 0x11223344), so that the whole memory looks like (in hex):

1122334411223344112233441122334411223344112233441122334411223344...

I can't figure out how to do it with memset() because it takes only a single byte, not 4 bytes.

Any ideas?


On OS X, one uses memset_pattern4() for this; I would expect other platforms to have similar APIs.

I don't know of a simple portable solution, other than just filling in the buffer with a loop (which is pretty darn simple).
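A minimal sketch combining both suggestions: on Apple platforms it calls memset_pattern4() (declared in <string.h> there), and elsewhere it falls back to the plain loop. The name fill_pattern4 is hypothetical, and the pattern is assumed to be given as 4 bytes in the order they should appear in memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill len bytes at dest with the 4-byte pattern pat. */
void fill_pattern4(void *dest, const unsigned char pat[4], size_t len)
{
#ifdef __APPLE__
    /* macOS/iOS libc extension */
    memset_pattern4(dest, pat, len);
#else
    /* Portable fallback: byte i gets pattern byte i % 4 */
    unsigned char *p = dest;
    for (size_t i = 0; i < len; i++)
        p[i] = pat[i % 4];
#endif
}
```

Called as fill_pattern4(buf, (const unsigned char[]){0x11, 0x22, 0x33, 0x44}, sizeof buf), it produces the 11223344... layout from the question regardless of the CPU's byte order.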


Copy the memory by doubling: use the area you have already filled as the template for the next copy, so each iteration doubles the filled region and only O(log(N)) copy calls are needed:

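#include <string.h>

/* Fill fillLen bytes at dest with the blockSize-byte pattern at srcPattern
   (blockSize must be > 0). */
void fill_pattern(void *dest, const void *srcPattern, size_t blockSize, size_t fillLen)
{
    char *start = dest;
    char *end = start + fillLen;
    char *current;

    if (fillLen < blockSize) {              /* buffer shorter than one pattern */
        memmove(start, srcPattern, fillLen);
        return;
    }
    memmove(start, srcPattern, blockSize);
    current = start + blockSize;
    /* each pass copies everything filled so far, doubling the filled area */
    while (blockSize <= (size_t)(end - current)) {
        memmove(current, start, blockSize);
        current += blockSize;
        blockSize *= 2;
    }
    // fill the rest (shorter than what is already filled)
    memmove(current, start, (size_t)(end - current));
}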

By O(log(N)) I mean the number of memmove() calls, not the number of bytes written (that is still N). In practice this will usually be much faster than filling the memory manually, since memmove() typically uses special, hand-optimized assembler loops that are blazing fast.
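To make the O(log(N)) claim concrete, here is a sketch of the same doubling fill instrumented to count memmove() calls. The name fill_pattern_counted is hypothetical, and patLen is assumed to be greater than 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: doubling fill that returns how many memmove() calls it made.
   Assumes patLen > 0. */
size_t fill_pattern_counted(void *dest, const void *pat, size_t patLen, size_t fillLen)
{
    char *start = dest;
    size_t filled, calls = 0;

    if (fillLen < patLen) {              /* shorter than one pattern */
        memmove(start, pat, fillLen);
        return 1;
    }
    memmove(start, pat, patLen);         /* seed the first copy */
    calls++;
    filled = patLen;
    while (filled <= fillLen - filled) { /* room to double the filled area */
        memmove(start + filled, start, filled);
        filled *= 2;
        calls++;
    }
    if (filled < fillLen) {              /* tail, shorter than what is filled */
        memmove(start + filled, start, fillLen - filled);
        calls++;
    }
    return calls;
}
```

For fillLen = 1000 and a 4-byte pattern this makes 9 memmove() calls (one seed, seven doublings from 4 up to 512 bytes, one 488-byte tail) instead of 250 separate 4-byte stores, while the total number of bytes copied is still 1000.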


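An efficient way would be to cast the pointer to a pointer of the needed size (e.g. uint32_t * for 4 bytes) and fill with integers. It's a little ugly, though: the buffer must be suitably aligned for uint32_t, and on a little-endian CPU such as x86 the value 0x11223344 ends up in memory as the bytes 44 33 22 11.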

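#include <stddef.h>
#include <stdint.h>

/* _Alignas (C11) makes buf suitably aligned for uint32_t stores */
_Alignas(uint32_t) char buf[256] = { 0 };
uint32_t *p = (uint32_t *) buf;
size_t i;

for (i = 0; i < sizeof(buf) / sizeof(*p); i++) {
    p[i] = 0x11223344;   /* stored in the CPU's native byte order */
}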

Not tested!
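A minimal sketch of the word-at-a-time idea that writes the bytes in the order given on both big- and little-endian machines and avoids misaligned uint32_t stores. It assumes the pattern is supplied as 4 bytes; the name fill_words is hypothetical. Going through memcpy() is a deliberate choice: compilers typically turn a fixed 4-byte memcpy() into a single store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill len bytes at dest with pat[0..3], one 32-bit word at a time. */
void fill_words(void *dest, const unsigned char pat[4], size_t len)
{
    unsigned char *p = dest;
    uint32_t word;
    size_t i = 0;

    /* Copying the bytes into word keeps their in-memory order,
       so the result does not depend on endianness. */
    memcpy(&word, pat, sizeof word);

    /* memcpy() avoids misaligned uint32_t stores and aliasing problems. */
    for (; i + sizeof word <= len; i += sizeof word)
        memcpy(p + i, &word, sizeof word);

    /* Trailing 1..3 bytes if len is not a multiple of 4 */
    for (; i < len; i++)
        p[i] = pat[i % 4];
}
```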


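If your pattern fits in a wchar_t, you can use wmemset() as you would have used memset(). Keep in mind that wchar_t is 4 bytes on Linux/glibc but only 2 bytes on Windows, and that the value is stored in native byte order.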
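A minimal sketch of the wmemset() approach, assuming a 4-byte wchar_t (checked at compile time with _Static_assert). The wide character is built from the pattern bytes with memcpy() so they land in memory in the order given, and the destination is typed as wchar_t * so it is correctly aligned. The name fill_wmemset is hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

_Static_assert(sizeof(wchar_t) == 4, "this sketch assumes a 4-byte wchar_t");

/* Hypothetical helper: write the 4-byte pattern pat nwords times into dest. */
void fill_wmemset(wchar_t *dest, const unsigned char pat[4], size_t nwords)
{
    wchar_t wc;
    memcpy(&wc, pat, sizeof wc);   /* keep the byte order of pat */
    wmemset(dest, wc, nwords);
}
```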


Well, the normal method of doing that is to manually set up the first four bytes and then call memcpy(ptr+4, ptr, len-4).

This copies the first four bytes into the second four bytes, then copies the second four bytes into the third, and so on.

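Note that this "usually" works but is not guaranteed to: calling memcpy() with overlapping source and destination is undefined behavior in C, so the result depends on your CPU architecture and your C run-time library (many optimized memcpy() implementations copy in chunks larger than 4 bytes, which breaks the pattern).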
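If you want the behavior this trick relies on, an explicit front-to-back byte loop gives it without undefined behavior: each byte is copied from the byte four positions earlier, which has already been set. memmove() is not a substitute here, because it copies as if through a temporary buffer and therefore shifts the data instead of repeating the pattern. The name fill_by_forward_copy is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: set up the first 4 bytes by hand, then copy each
   later byte from 4 bytes earlier, front to back. */
void fill_by_forward_copy(unsigned char *ptr, const unsigned char pat[4], size_t len)
{
    size_t i;

    memcpy(ptr, pat, len < 4 ? len : 4);   /* set up the first (up to) 4 bytes */
    for (i = 4; i < len; i++)
        ptr[i] = ptr[i - 4];                /* source byte is already filled */
}
```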


You could set up the sequence somewhere and then copy it to where you need it using memcpy().
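A minimal sketch of this idea, assuming the pattern is known at compile time: a static block holding 16 repetitions of the pattern is copied into the destination in block-sized pieces, followed by one shorter copy. The 64-byte block size and the names P4, P16, pattern_block and fill_from_block are arbitrary choices for illustration.

```c
#include <stddef.h>
#include <string.h>

#define P4  0x11, 0x22, 0x33, 0x44
#define P16 P4, P4, P4, P4

/* 16 repetitions of the pattern, set up once at compile time */
static const unsigned char pattern_block[64] = { P16, P16, P16, P16 };

/* Hypothetical helper: fill len bytes at dest by copying pattern_block. */
void fill_from_block(void *dest, size_t len)
{
    unsigned char *p = dest;

    /* The block size is a multiple of 4, so every piece starts on a
       pattern boundary. */
    while (len >= sizeof pattern_block) {
        memcpy(p, pattern_block, sizeof pattern_block);
        p += sizeof pattern_block;
        len -= sizeof pattern_block;
    }
    memcpy(p, pattern_block, len);   /* remaining 0..63 bytes */
}
```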


The standard C library has no such function. However, memset() is often implemented as an unrolled loop to minimize branching and condition checking, and you can write the same thing for 32-bit values:

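#include <stddef.h>
#include <stdint.h>

/* len is the number of uint32_t elements to fill, not the number of bytes */
static inline void memset4(uint32_t *restrict p, uint32_t val, size_t len) {
  uint32_t *end = p + (len & ~(size_t)0x1f); //round down to nearest multiple of 32
  while (p != end) { //store 32 elements per iteration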
    p[ 0] = val;
    p[ 1] = val;
    p[ 2] = val;
    p[ 3] = val;
    p[ 4] = val;
    p[ 5] = val;
    p[ 6] = val;
    p[ 7] = val;
    p[ 8] = val;
    p[ 9] = val;
    p[10] = val;
    p[11] = val;
    p[12] = val;
    p[13] = val;
    p[14] = val;
    p[15] = val;
    p[16] = val;
    p[17] = val;
    p[18] = val;
    p[19] = val;
    p[20] = val;
    p[21] = val;
    p[22] = val;
    p[23] = val;
    p[24] = val;
    p[25] = val;
    p[26] = val;
    p[27] = val;
    p[28] = val;
    p[29] = val;
    p[30] = val;
    p[31] = val;
    p += 32;
  }
  end += len & 0x1f; //the remaining len % 32 elements
  while (p != end) *p++ = val; //store the remaining elements
}

A good compiler will likely use CPU-specific instructions to optimize it further (e.g. SSE 128-bit stores), but even without that, it should be about as fast as a library memset(), because such simple loops are bound by memory access speed.
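For comparison, a minimal sketch of the plain, non-unrolled loop. With optimization enabled (for example -O3), GCC and Clang commonly unroll and vectorize a loop like this on their own, though whether they do depends on the compiler, flags and target. The name fill_u32 is hypothetical; as with memset4, len counts uint32_t elements and val is stored in native byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store val into len consecutive uint32_t elements. */
void fill_u32(uint32_t *p, uint32_t val, size_t len)
{
    for (size_t i = 0; i < len; i++)
        p[i] = val;
}
```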


I was thinking about this today when I had to duplicate a complex scalar across a memory-aligned array in order to use Volk to perform SIMD multiplication. I see the solutions above, but I don't know enough about compilers to say what will and won't be optimized. I plan to benchmark a few of these suggestions, but the solution that occurred to me is:

#include <string.h>
#include <volk/volk_complex.h> /* defines lv_32fc_t */

static inline void duplicate_32fc(lv_32fc_t *out, lv_32fc_t in, int size) {

    int n = 1; /* number of elements filled so far */

    if (size < 1)
        return;

    //Copy the first one
    out[0] = in;

    //Double the size of the copy for each copy
    while (n * 2 <= size) {
        memcpy(&out[n], out, n * sizeof(lv_32fc_t));
        n = n * 2;
    }

    //Copy the tail: size - n < n, so source and destination don't overlap
    if (n < size) {
        memcpy(&out[n], out, (size - n) * sizeof(lv_32fc_t));
    }
}

Each iteration copies all of the previous copies to the new space so I think it is O(log(n)), no?
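Counting the work in the doubling loop of duplicate_32fc for size = s ≥ 1 elements: the loop runs k times, copying 1, 2, 4, ... elements, and one more memcpy() is needed for the tail only when s is not a power of two. So the number of memcpy() calls is O(log s), while the number of elements copied is s - 1, which is still linear in s:

```latex
k = \lfloor \log_2 s \rfloor, \qquad
\#\text{memcpy calls} = k + [\, s \neq 2^k \,], \qquad
\text{elements copied} = \sum_{j=0}^{k-1} 2^j + \left(s - 2^k\right) = s - 1
```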
