
zeroing out memory

Developer https://www.devze.com 2022-12-30 23:49 Source: Internet

gcc 4.4.4 C89

I am just wondering what most C programmers do when they want to zero out memory.

For example, I have a buffer of 1024 bytes. Sometimes I do this:

char buffer[1024] = {0};

Which will zero all bytes.

However, should I declare it like this and use memset?

char buffer[1024];
/* ... */
memset(buffer, 0, sizeof(buffer));
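
A minimal C89 sketch comparing the two forms from the question. The names all_zero() and both_forms_zeroed() are hypothetical helpers for illustration; the point is only that both forms leave every byte of the array zero.

```c
#include <string.h>   /* memset, size_t */

#define BUF_SIZE 1024

/* Hypothetical helper: returns 1 if all n bytes at p are zero. */
static int all_zero(const char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}

/* Returns 1 if both initialization styles produce an all-zero buffer. */
int both_forms_zeroed(void)
{
    char a[BUF_SIZE] = {0};   /* initializer: elements without a value become 0 */
    char b[BUF_SIZE];         /* indeterminate contents until written */

    memset(b, 0, sizeof(b));  /* fill value first, then the byte count */

    return all_zero(a, sizeof(a)) && all_zero(b, sizeof(b));
}
```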

Is there any real reason you have to zero the memory? What is the worst that can happen by not doing it?


The worst that can happen? You end up (unwittingly) with a string that is not null-terminated, or an integer that picks up whatever bytes happened to lie to the right of the part of the buffer you printed into. Yet unterminated strings can happen in other ways, too, even if you initialized the buffer.
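
Because zeroing the buffer up front does not protect you when the data fills it right to the end, here is a sketch of a hypothetical copy_chars() helper that writes a string one character at a time and always stores its own terminator, whatever the buffer held before.

```c
#include <string.h>   /* size_t; the test below also uses strcmp */

/* Hypothetical helper: copies at most n - 1 characters of src into dst,
 * one at a time, and always writes the terminating '\0'. Returns the
 * number of characters copied. Without the final store, an uninitialized
 * dst would have no guaranteed terminator and strlen(dst) could run past
 * the characters actually written. */
size_t copy_chars(char *dst, size_t n, const char *src)
{
    size_t i = 0;

    if (n == 0)
        return 0;              /* no room even for the terminator */
    while (i < n - 1 && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';             /* do not rely on the previous contents */
    return i;
}
```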

Edit (from comments): the end of the world is also a remote possibility, depending on what you are doing.

Either is undesirable. However, unless you are completely eschewing dynamically allocated memory, most statically allocated buffers are rather small, which makes memset() relatively cheap; in fact, much cheaper than most calls to calloc() for dynamic blocks, which tend to be larger than ~2 KB.

C99 contains language regarding default initialization values; however, I can't seem to make gcc -std=c99 agree with that, using any kind of storage.

Still, with a lot of older compilers (and compilers that aren't quite C99) in use, I prefer to just use memset().


I vastly prefer

char buffer[1024] = { 0 };

It's shorter, easier to read, and less error-prone. Only use memset on dynamically allocated buffers, and even then prefer calloc.


When you define char buffer[1024] without initializing it, you're going to get indeterminate data in it. For instance, Visual C++ debug builds fill uninitialized memory with marker bytes (0xCC for stack variables, 0xCD for freshly allocated heap blocks). In Release mode, it will simply allocate the memory and not care what happens to be in that block from previous use.

Also, your examples demonstrate compile-time vs. runtime initialization. If your char buffer[1024] = { 0 } is a global or static declaration, it is initialized before the program starts rather than at runtime. With an all-zero initializer, compilers such as GCC normally place it in the .bss section, which takes no space in the executable file; a nonzero initializer would put it in the data segment, increasing your binary size by about 1024 bytes (in this case). If the definition is in a function, it's stored on the stack, allocated at runtime, and not stored in the binary. If you provide an initializer in this case, the compiler generates code that initializes buffer every time the function runs: typically the equivalent of a memset() for an all-zero initializer, or of a memcpy() from data stored in the binary for other initializers.
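
A small sketch of the storage-duration difference described above; the names static_buf, static_buf_is_zero() and auto_buf_len() are hypothetical. An object with static storage duration is zero-initialized even without an initializer, while an automatic array has to be initialized on every call.

```c
#include <string.h>   /* strlen, size_t */

/* Static storage duration: zero-initialized before main() runs,
 * even though there is no initializer. */
static char static_buf[1024];

/* Returns 1 if every byte of static_buf is zero. */
int static_buf_is_zero(void)
{
    size_t i;
    for (i = 0; i < sizeof(static_buf); i++) {
        if (static_buf[i] != 0)
            return 0;
    }
    return 1;
}

/* Automatic storage: without an initializer the contents would be
 * indeterminate, so = { 0 } zeroes the array each time this runs. */
size_t auto_buf_len(void)
{
    char auto_buf[1024] = { 0 };
    return strlen(auto_buf);   /* 0, because auto_buf[0] is '\0' */
}
```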

Hopefully, this helps you decide which method works best for you.


In this particular case, there's not much difference. I prefer = { 0 } over memset because memset is more error-prone:

  • It provides an opportunity to get the bounds wrong.
  • It provides an opportunity to mix up the arguments to memset (e.g. memset(buf, sizeof buf, 0) instead of memset(buf, 0, sizeof buf)).
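
A sketch of the swapped-argument mistake: memset(buf, sizeof buf, 0) asks memset to write zero bytes, so the buffer is left unchanged. ZERO_OBJECT is a hypothetical convenience macro (not a standard one) that fixes the argument order in one place; GCC may also warn about the transposed call.

```c
#include <string.h>   /* memset */

/* Hypothetical macro: clears an array or struct that is named directly.
 * Passing a pointer variable would clear the pointer itself, not the
 * memory it points to. */
#define ZERO_OBJECT(obj) memset(&(obj), 0, sizeof(obj))

/* Returns 1 if the swapped call left the data in place and the
 * macro cleared it. */
int swapped_args_demo(void)
{
    char buf[16];
    int bug_left_data, fixed_cleared;

    buf[0] = 'x';
    memset(buf, sizeof buf, 0);     /* wrong order: byte count is 0 */
    bug_left_data = (buf[0] == 'x');

    ZERO_OBJECT(buf);               /* correct order, written once */
    fixed_cleared = (buf[0] == 0);

    return bug_left_data && fixed_cleared;
}
```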

In general, = { 0 } is better for initializing structs too. It effectively initializes all members as if you had written = 0 to initialize each. This means that pointer members are guaranteed to be initialized to the null pointer (which might not be all-bits-zero, and all-bits-zero is what you'd get if you had used memset).
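
A minimal sketch of that guarantee, using a hypothetical struct node: = { 0 } sets every member as if it had been assigned 0, so the pointer members compare equal to NULL regardless of how the platform represents a null pointer.

```c
#include <stddef.h>   /* NULL */

/* Hypothetical struct for illustration. */
struct node {
    int value;
    struct node *next;
    const char *name;
};

/* Returns 1 if = { 0 } left the int at 0 and both pointers null. */
int struct_zero_init_demo(void)
{
    struct node n = { 0 };
    return n.value == 0 && n.next == NULL && n.name == NULL;
}
```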

On the other hand, = { 0 } can leave padding bits in a struct as garbage, so it might not be appropriate if you plan to use memcmp to compare them later.
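
If you do want to compare structs with memcmp, one common approach (not from the original answer) is to memset the whole object, padding included, before assigning members. The struct rec and rec_init() below are hypothetical. Strictly speaking, the C standard lets padding bytes take unspecified values whenever a member is stored, so this works in practice on mainstream compilers rather than being guaranteed.

```c
#include <string.h>   /* memset, memcmp */

/* Hypothetical struct: on common ABIs there is padding between c and d. */
struct rec {
    char c;
    double d;
};

/* Clear every byte, including padding, then set the members. */
static void rec_init(struct rec *r, char c, double d)
{
    memset(r, 0, sizeof *r);
    r->c = c;
    r->d = d;
}

/* Returns 1 if two records built the same way compare equal bytewise. */
int rec_memcmp_demo(void)
{
    struct rec a, b;

    rec_init(&a, 'k', 1.5);
    rec_init(&b, 'k', 1.5);
    return memcmp(&a, &b, sizeof a) == 0;
}
```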


The worst that can happen by not doing it is that you write some data in character by character and later interpret it as a string (and you didn't write a null terminator). Or you end up failing to realise a section of it was uninitialised and read it as though it were valid data. Basically: all sorts of nastiness.

Memset should be fine (provided you correct the sizeof typo :-)). I prefer that to your first example because I think it's clearer.

For dynamically allocated memory, I use calloc rather than malloc and memset.
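
A sketch of the two options for a zero-filled dynamic buffer; both hypothetical helpers return NULL on failure, and the caller must free() the result.

```c
#include <stdlib.h>   /* calloc, malloc, free */
#include <string.h>   /* memset */

/* calloc zero-fills the block itself. */
char *alloc_zeroed_calloc(size_t n)
{
    return calloc(n, 1);
}

/* malloc + memset: only touch the memory if the allocation succeeded. */
char *alloc_zeroed_malloc(size_t n)
{
    char *p = malloc(n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}
```

calloc does the same job in one call and cannot be paired with a forgotten or mis-sized memset, which is the usual reason to prefer it.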


One of the things that can happen if you don't initialize is that you run the risk of leaking sensitive information.

Uninitialized memory may have something sensitive in it from a previous use of that memory. Maybe a password or crypto key or part of a private email. Your code may later transmit that buffer or struct somewhere, or write it to disk, and if you only partially filled it the rest of it still contains those previous contents. Certain secure systems require zeroizing buffers when an address space can contain sensitive information.
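
A sketch of the partial-fill leak and its fix, using a hypothetical fixed-size struct wire_record that is written out as raw bytes. Zeroing the whole record first means the bytes after the name cannot carry leftovers from whatever used that memory before.

```c
#include <string.h>   /* memset, memcpy, strlen */

#define NAME_FIELD 32

/* Hypothetical record that gets written to disk or sent as raw bytes. */
struct wire_record {
    char name[NAME_FIELD];
};

void wire_record_fill(struct wire_record *r, const char *name)
{
    size_t len = strlen(name);

    if (len > sizeof r->name - 1)
        len = sizeof r->name - 1;   /* truncate, keeping room for '\0' */
    memset(r, 0, sizeof *r);        /* clear everything, including the tail */
    memcpy(r->name, name, len);     /* only len bytes are written here */
}
```

Clearing a secret after use is a separate problem: a memset just before the memory goes out of scope or is freed may be removed by the optimizer, which is why some platforms provide explicit_bzero (glibc, OpenBSD and others) or the optional C11 Annex K memset_s.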


I prefer using memset to clear a chunk of memory, especially when working with strings. I want to know without a doubt that there will be a null delimiter after my string. Yes, I know you can append a \0 on the end of each string and some functions do this for you, but I want no doubt that this has taken place.

A function could fail when using your buffer, and the buffer remains unchanged. Would you rather have a buffer of unknown garbage, or nothing?
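
A sketch of that scenario with a hypothetical lookup_user() that writes nothing when it fails. Because the buffer starts zeroed, a failed call leaves a valid empty string instead of indeterminate bytes.

```c
#include <string.h>   /* memcpy, strlen */

/* Hypothetical lookup: on failure it returns -1 and leaves out untouched. */
static int lookup_user(int id, char *out, size_t n)
{
    if (id != 1 || n < 6)
        return -1;
    memcpy(out, "alice", 6);   /* 5 characters plus the terminating '\0' */
    return 0;
}

/* Returns the length of whatever ends up in the buffer. */
size_t lookup_len(int id)
{
    char buf[64] = { 0 };

    /* Real code should still check the result; the zeroed buffer is
     * only a second line of defense. */
    (void)lookup_user(id, buf, sizeof buf);
    return strlen(buf);
}
```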


This post has been heavily edited to make it correct. Many thanks to Tyler McHenery for pointing out what I missed.

char buffer[1024] = {0};

Will set the first char in the buffer to zero, and the compiler will also set all the remaining chars, which have no explicit initializer, to 0. In that case, the difference between the two techniques boils down to whether the compiler generates better code for the array initialization than a call to memset would be.
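
A quick sketch of that rule with a nonzero first element (partial_init_demo() is a hypothetical name): only buf[0] gets an explicit value, and every remaining element is initialized to 0, exactly as with = {0}.

```c
/* Returns 1 if buf[0] is 'x' and every other element is 0. */
int partial_init_demo(void)
{
    char buf[8] = { 'x' };
    int i;

    if (buf[0] != 'x')
        return 0;
    for (i = 1; i < 8; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```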

Previously I stated:

char buffer[1024] = {0};

Will set the first char in the buffer to null. That technique is commonly used for null terminated strings, as all data past the first null is ignored by subsequent (non-buggy) functions that handle null terminated strings.

Which is not quite true. Sorry for the miscommunication, and thanks again for the corrections.


Depends how you're filling it: if you're planning on writing to it before even potentially reading anything, then why bother? It also depends what you're going to use the buffer for: if it's going to be treated as a string, then you just need to set the first byte to \0:

char buffer[1024];
buffer[0] = '\0';

However, if you're using it as a byte stream, then the contents of the entire array are probably going to be relevant, so memset-ing the entire thing or initializing it with { 0 } as in your example is a smart move.
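
A sketch of the string case (build_greeting() is a hypothetical helper): with only buffer[0] set to '\0', strncat can append safely, since it finds the terminator and writes a new one after the appended text.

```c
#include <string.h>   /* strncat, strlen */

/* Builds "hello, world" in a buffer whose only initialized byte is the
 * leading '\0'. Returns the final string length. */
size_t build_greeting(void)
{
    char buffer[1024];

    buffer[0] = '\0';   /* empty string; the rest stays indeterminate */
    strncat(buffer, "hello", sizeof buffer - strlen(buffer) - 1);
    strncat(buffer, ", world", sizeof buffer - strlen(buffer) - 1);
    return strlen(buffer);
}
```

This only holds while the buffer is treated as a string; anything that reads bytes past the terminator (for example, sending all 1024 bytes) still sees indeterminate data.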


I also use memset(buffer, 0, sizeof(buffer));

The risk of not using it is that there is no guarantee that the buffer you are using is completely empty; there might be garbage in it, which may lead to unpredictable behavior.

Always memset-ing to 0 after malloc is a very good practice.


Yup, calloc(), declared in stdlib.h, allocates memory initialized to zeros.


I'm not familiar with the:

char buffer[1024] = {0};

technique. But assuming it does what I think it does, there's a (potential) difference between the two techniques.

The first one is done at COMPILE time, and the buffer will be part of the static image of the executable, and thus be 0's when you load.

The latter will be done at RUN TIME.

The first may incur some load time behaviour. If you just have:

char buffer[1024];

the modern loaders may well "virtually" load that. That is, it won't take any real space in the file; it'll simply be an instruction to the loader to carve out a block when the program is loaded. I'm not comfortable enough with modern loaders to say whether that's true or not.

But if you pre-initialize it, then that will certainly need to be loaded from the executable.

Mind, neither of these have "real" performance impacts in the small. They may not have any in the "large". Just saying there's potential here, and the two techniques are in fact doing something quite different.
