Can __attribute__((packed)) affect the performance of a program?

Developer https://www.devze.com 2023-01-11 14:34 Source: Internet
I have a structure called log that has 13 chars in it. After doing a sizeof(log) I see that the size is not 13 but 16. I can use __attribute__((packed)) to get it down to the actual size of 13, but I wonder if this will affect the performance of the program. It is a structure that is used quite frequently.

I would like to be able to read the size of the structure (13, not 16). I could use a macro, but if this structure is ever changed, i.e. fields are added or removed, I would like the new size to be updated without changing a macro, because I think that is error-prone. Do you have any suggestions?


Yes, it will affect the performance of the program. With the padding, each multi-byte field sits at its natural alignment, so the compiler can use a single integer load instruction to read it from memory. Without the padding, a field may be misaligned, and the compiler must then load the bytes separately and do bit shifting to assemble the entire value. (Even if it's x86 and this is done by the hardware, it still has to be done.)
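To make the point about loads and shifts concrete, here is a minimal sketch. It assumes GCC or Clang, and the field layout is hypothetical (the question does not show the real fields of log); it is simply one layout that totals 13 bytes and is padded to 16. The helper load_u32_bytewise is likewise an illustrative name; it spells out by hand roughly what a compiler has to emit for a misaligned 32-bit field on a target without unaligned loads, assuming little-endian byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 1 + 4 + 4 + 4 = 13 bytes of data. */
struct log_padded {
    uint8_t  level;      /* offset 0, followed by 3 bytes of padding */
    uint32_t timestamp;  /* offset 4, naturally aligned */
    uint32_t code;       /* offset 8 */
    uint32_t value;      /* offset 12 */
};

struct log_packed {
    uint8_t  level;      /* offset 0, no padding after it */
    uint32_t timestamp;  /* offset 1, misaligned */
    uint32_t code;       /* offset 5 */
    uint32_t value;      /* offset 9 */
} __attribute__((packed));

/* Assemble a little-endian 32-bit value one byte at a time, the way a
   misaligned field must be read when a single aligned load is not possible. */
uint32_t load_u32_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

On common targets sizeof(struct log_padded) is 16 and sizeof(struct log_packed) is 13; offsetof shows that timestamp moves from offset 4 to offset 1 once the struct is packed.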

Consider this: Why would compilers insert random, unused space if it was not for performance reasons?


Don't use __attribute__((packed)). If your data structure is in-memory, allow it to occupy its natural size as determined by the compiler. If it's for reading/writing to/from disk, write serialization and deserialization functions; do not simply store cpu-native binary structures on disk. "Packed" structures really have no legitimate uses (or very few; see the comments on this answer for possible disagreeing viewpoints).
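As a rough illustration of the serialization approach, the sketch below writes each field explicitly into a 13-byte buffer instead of dumping the in-memory struct. Everything here is an assumption made for the example: the field layout, the names log_entry, log_serialize, log_deserialize and LOG_WIRE_SIZE, and the choice of little-endian byte order for the stored format.

```c
#include <stdint.h>

/* Size of one record in the stored format (hypothetical layout). */
enum { LOG_WIRE_SIZE = 13 };

/* In-memory form: left unpacked, so the compiler pads it as it sees fit. */
struct log_entry {
    uint8_t  level;
    uint32_t timestamp;
    uint32_t code;
    uint32_t value;
};

/* Store a 32-bit value as 4 little-endian bytes. */
static void put_u32_le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Read 4 little-endian bytes back into a 32-bit value. */
static uint32_t get_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write one entry into out, field by field, with no padding bytes. */
void log_serialize(const struct log_entry *e, unsigned char out[LOG_WIRE_SIZE])
{
    out[0] = e->level;
    put_u32_le(out + 1, e->timestamp);
    put_u32_le(out + 5, e->code);
    put_u32_le(out + 9, e->value);
}

/* Rebuild an entry from the 13 bytes produced by log_serialize. */
void log_deserialize(const unsigned char in[LOG_WIRE_SIZE], struct log_entry *e)
{
    e->level     = in[0];
    e->timestamp = get_u32_le(in + 1);
    e->code      = get_u32_le(in + 5);
    e->value     = get_u32_le(in + 9);
}
```

The stored record is 13 bytes regardless of how the compiler pads struct log_entry, and the byte order is fixed by the code rather than by the CPU.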


Yes, it can affect the performance. In this case, if you allocate an array of such structures with the packed attribute, most of them must end up unaligned (whereas with the default padding each structure occupies 16 bytes, so they can all be aligned on 16-byte boundaries). Copying such structures around can be faster if they are aligned.


Yes, it can affect performance. How depends on what it is and how you use it.

An unaligned variable can possibly straddle two cache lines. For example, if you have 64-byte cache lines, and you read a 4-byte variable from an array of 13-byte structures, there is a 3 in 64 (4.7%) chance that it will be spread across two lines. The penalty of an extra cache access is pretty small. If everything your program did was pound on that one variable, 4.7% would be the upper bound of the performance hit. If logging represents 20% of the program's workload, and reading/writing to that structure is 50% of logging, then you're already at a small fraction of a percent.
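The 3-in-64 figure can be checked by counting. The sketch below is only an illustration: count_straddles is a hypothetical helper, and it assumes the array starts on a cache-line boundary. It walks n array elements and counts how many times a field's first and last bytes fall in different cache lines.

```c
#include <stddef.h>

/* Count how many of the first n array elements have a field that crosses a
   cache-line boundary. The array is assumed to start at a line boundary.
   stride is the element size, field_offset the field's offset inside the
   element, field_size its width, and line_size the cache-line size. */
size_t count_straddles(size_t stride, size_t field_offset,
                       size_t field_size, size_t line_size, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t first = i * stride + field_offset;  /* first byte of the field */
        size_t last  = first + field_size - 1;     /* last byte of the field */
        if (first / line_size != last / line_size)
            count++;
    }
    return count;
}
```

Because 13 and 64 share no common factor, 64 consecutive 13-byte elements place the field at every possible position within a line exactly once, so count_straddles(13, 1, 4, 64, 64) returns 3. With 16-byte elements and a 4-byte-aligned field, count_straddles(16, 4, 4, 64, 64) returns 0.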

On the other hand, presuming that the log needs to be saved, shrinking each record by 3 bytes is saving you 19%, which translates to a lot of memory or disk space. Main memory and especially the disk are slow, so you will probably be better off packing the log to reduce its size.


As for reading the size of the structure without worrying about the structure changing, use sizeof. Whichever way you prefer to define numeric constants, be it const int, enum, or #define, just define the constant in terms of sizeof.
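A minimal sketch of that suggestion, assuming GCC or Clang and a hypothetical field layout (log_record, LOG_SIZE and log_buffer are names made up for the example). The constant is computed from the type, so it follows the struct whenever fields are added or removed.

```c
#include <stdint.h>

/* Hypothetical packed record; the real fields are not shown in the question. */
struct log_record {
    uint8_t  level;
    uint32_t timestamp;
    uint32_t code;
    uint32_t value;
} __attribute__((packed));

/* Derived from the type, so it is recomputed on every build. */
enum { LOG_SIZE = sizeof(struct log_record) };

/* Example use: a buffer type sized to hold exactly one record. */
typedef unsigned char log_buffer[LOG_SIZE];
```

With this layout LOG_SIZE evaluates to 13. Without the packed attribute the same enum would simply report the padded size, typically 16.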


As with all other performance optimizations, you'll need to profile your code to find the right answer. The right answer will vary by architecture and by how you use your structure.

If you're creating gigantic arrays, the space savings from packing might mean the difference between fitting and not fitting in cache. Or your data might already fit into your cache, in which case it will make no difference. If you're allocating large numbers of the structures in an STL associative container that allocates the storage for your struct with operator new, it might not matter at all: operator new might round your storage up to something that's aligned anyway.

If most of your structures live on the stack, the extra storage might already be optimized away anyway.

For a change this simple to test, I suggest building a timing rig and then trying things both ways. For further optimizations I suggest using a profiler to identify your bottlenecks and go from there.
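A timing rig along those lines might look like the sketch below. It is only a starting point under several assumptions: GCC or Clang, a hypothetical 13-byte field layout, made-up names (rec_padded, rec_packed, time_padded, time_packed), and a deliberately simple workload that just sums one field. A real measurement should use your actual access pattern.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical record layouts; only the packing differs. */
struct rec_padded {
    uint8_t  level;
    uint32_t timestamp;
    uint32_t code;
    uint32_t value;
};

struct rec_packed {
    uint8_t  level;
    uint32_t timestamp;
    uint32_t code;
    uint32_t value;
} __attribute__((packed));

/* Generate one timing function per struct type so both variants run
   exactly the same code. Each function fills n records, reads the value
   field back reps times, and returns the elapsed CPU time in seconds
   (or -1.0 if allocation fails). The sum is handed back through
   *checksum so the compiler cannot discard the loops entirely. */
#define DEFINE_TIMER(name, type)                                   \
    double name(size_t n, int reps, uint64_t *checksum)            \
    {                                                              \
        type *a = malloc(n * sizeof *a);                           \
        if (a == NULL)                                             \
            return -1.0;                                           \
        for (size_t i = 0; i < n; i++) {                           \
            a[i].level = 0;                                        \
            a[i].timestamp = (uint32_t)i;                          \
            a[i].code = 0;                                         \
            a[i].value = (uint32_t)i;                              \
        }                                                          \
        uint64_t sum = 0;                                          \
        clock_t start = clock();                                   \
        for (int r = 0; r < reps; r++)                             \
            for (size_t i = 0; i < n; i++)                         \
                sum += a[i].value;                                 \
        clock_t end = clock();                                     \
        free(a);                                                   \
        *checksum = sum;                                           \
        return (double)(end - start) / CLOCKS_PER_SEC;             \
    }

DEFINE_TIMER(time_padded, struct rec_padded)
DEFINE_TIMER(time_packed, struct rec_packed)
```

Call time_padded and time_packed with the same n and reps, make sure the two checksums match, and compare the returned times over several runs. Try an n small enough to fit in cache and one large enough not to, since the answers above suggest the result can differ between the two cases.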
