Dumping struct in C

Developer https://www.devze.com 2023-03-05 19:20 Source: web
Is it a good idea to simply dump a struct to a binary file using fwrite? For example:

  struct Foo {
     char name[100];
     double f;
     int bar;
  } data;

  fwrite(&data, sizeof(data), 1, fout);

How portable is it? I think it's really a bad idea to just write out whatever the compiler produces (padding, integer sizes, etc.), even if platform portability is not important.

A friend of mine argues that doing so is very common in practice. Is that true?

Edit: What is the recommended way to write a portable binary file? Using some sort of library? I'm also interested in how this is achieved (by specifying byte order, sizes, ...?).


That's certainly a very bad idea, for two reasons:

  • the same struct may have different sizes on different platforms, because of alignment requirements and whatever padding the compiler decides to insert
  • the struct's elements may have different representations on different machines (think big-endian vs. little-endian, IEEE 754 vs. some other floating-point format, or a different sizeof(int))
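To check the first point on your own toolchain, a small sketch like the one below reports how much of the struct is padding. Only `struct Foo` comes from the question; the helper names (`foo_payload_size`, `foo_padding_bytes`, `foo_print_layout`) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct Foo {
    char name[100];
    double f;
    int bar;
};

/* The only layout fact the C standard promises here: no padding before the first member. */
static_assert(offsetof(struct Foo, name) == 0, "first member starts at offset 0");

/* Sum of the member sizes, i.e. the size the struct would have with no padding at all. */
size_t foo_payload_size(void)
{
    return sizeof(((struct Foo *)0)->name) + sizeof(double) + sizeof(int);
}

/* Number of padding bytes this particular compiler and ABI inserted. */
size_t foo_padding_bytes(void)
{
    return sizeof(struct Foo) - foo_payload_size();
}

/* Print the layout that fwrite(&data, sizeof(data), 1, fout) would put on disk. */
void foo_print_layout(FILE *out)
{
    fprintf(out, "sizeof(struct Foo) = %zu\n", sizeof(struct Foo));
    fprintf(out, "offsets: name=%zu f=%zu bar=%zu\n",
            offsetof(struct Foo, name),
            offsetof(struct Foo, f),
            offsetof(struct Foo, bar));
    fprintf(out, "padding bytes = %zu\n", foo_padding_bytes());
}
```

Calling `foo_print_layout(stdout)` on a typical x86-64 build tends to show 120 bytes with 8 of them padding, while a 32-bit x86 System V build typically shows 112 bytes with none. Treat those figures as examples rather than guarantees; the fact that they differ is the whole problem.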


It matters a great deal whether you want the file to be portable, or just the code.

If you're only ever going to read the data back on the same C implementation (and that means with the same values for any compiler options that affect struct layout in any way), using the same definition of the struct, then the code is portable. It might be a bad idea for other reasons: difficulty of changing the struct, and in theory there could be security risks around dumping padding bytes to disk, or bytes after any NUL terminator in that char array. They could contain information that you never intended to persist. That said, the OS does it all the time in the swap file, so whatEVER, but try using that excuse when users notice that your document format doesn't always delete data they think they've deleted, and they just emailed it to a reporter.
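If you do dump the struct as-is, one way to reduce the risk of leaking stale memory is to zero the whole object before filling it in. This is a minimal sketch; `foo_init` is a hypothetical helper, not something from the question. Note the caveat in the comment: the C standard lets padding bytes take unspecified values when a member is stored, so this is a mitigation rather than a guarantee.

```c
#include <assert.h>
#include <string.h>

struct Foo {
    char name[100];
    double f;
    int bar;
};

/* Fill *out so that no leftover stack or heap contents end up in the bytes
 * that a later fwrite(out, sizeof *out, 1, fp) would put on disk. */
void foo_init(struct Foo *out, const char *name, double f, int bar)
{
    assert(out != NULL && name != NULL);

    /* Clears the padding and the entire name buffer, including everything
     * after the terminator. Caveat: the standard allows later member stores
     * to leave padding bytes with unspecified values, so compilers are not
     * strictly required to preserve these zeros. */
    memset(out, 0, sizeof *out);

    /* Copy at most 99 characters; the last byte stays NUL from the memset. */
    strncpy(out->name, name, sizeof out->name - 1);
    out->f = f;
    out->bar = bar;
}
```

Writing the fields out one by one avoids the question entirely, because padding never reaches the file.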

If the file needs to be passed between different platforms then it's a pretty bad idea, because you end up accidentally defining your file format to be something like, "whatever MSVC on Win32 ends up writing". This could end up being pretty inconvenient to read and write on some other platform, and certainly the code you wrote in the first place won't do it when running on another platform with an incompatible storage representation of the struct.

The recommended way to write portable binary files, in order of preference, is probably:

  1. Don't. Use a text format. Be prepared to lose some precision in floating-point values.
  2. Use a library, although there's a bit of a curse of choice here. You might think ASN.1 looks all right, and it is as long as you never have to manipulate the stuff yourself. I would guess that Google Protocol Buffers is fairly good, but I've never used it myself.
  3. Define some fairly simple binary format in terms of what each unsigned char in turn means. This is fine for characters[*] and other integers, but gets a bit tricky for floating-point types. "This is a little-endian representation of an IEEE-754 float" will do you OK provided that all your target platforms use IEEE floats. Which I expect they do, but you have to bet on that. Then, assemble that sequence of characters to write and interpret it to read: if you're "lucky" then on a given platform you can write a struct definition that matches it exactly, and use this trick. Otherwise do whatever byte manipulation you need to. If you want to be really portable, be careful not to use an int throughout your code to represent the value taken from bar, because if you do then on some platform where int is 16 bits, it won't fit. Instead use long or int_least32_t or something, and bounds-check the value on writing. Or use uint32_t and let it wrap.

[*] Until you hit an EBCDIC machine, that is. Not that anybody will seriously expect your files to be portable to a machine that plain text files aren't portable to either.
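Option 3 can be sketched roughly as follows. The format here is invented for illustration: 100 bytes of name, then f as a little-endian IEEE-754 binary64, then bar as a little-endian 32-bit two's-complement integer, 112 bytes in total. All function names and constants are hypothetical, and the code bets that `double` is binary64 with the same byte order as `uint64_t` on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is IEEE-754 binary64. The size check is only a cheap proxy for that. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

enum { FOO_NAME_LEN = 100, FOO_WIRE_SIZE = FOO_NAME_LEN + 8 + 4 };

struct Foo {
    char name[FOO_NAME_LEN];
    double f;
    int_least32_t bar;   /* at least 32 bits even where int is 16 bits */
};

/* Store the low nbytes of v, least significant byte first. */
void put_le(unsigned char *p, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        p[i] = (unsigned char)((v >> (8 * i)) & 0xFF);
}

/* Inverse of put_le. */
uint64_t get_le(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns 0 on success, -1 if bar does not fit in 32 bits (bounds check on writing). */
int foo_encode(const struct Foo *in, unsigned char out[FOO_WIRE_SIZE])
{
    uint64_t bits;
    if (in->bar < -2147483647 - 1 || in->bar > 2147483647)
        return -1;
    memcpy(out, in->name, FOO_NAME_LEN);
    memcpy(&bits, &in->f, sizeof bits);           /* reinterpret the double as raw bits */
    put_le(out + FOO_NAME_LEN, bits, 8);
    /* Conversion to unsigned is modular, so the low 32 bits are two's complement. */
    put_le(out + FOO_NAME_LEN + 8, (uint64_t)in->bar, 4);
    return 0;
}

void foo_decode(const unsigned char in[FOO_WIRE_SIZE], struct Foo *out)
{
    uint64_t bits = get_le(in + FOO_NAME_LEN, 8);
    uint64_t u = get_le(in + FOO_NAME_LEN + 8, 4);
    memcpy(out->name, in, FOO_NAME_LEN);
    memcpy(&out->f, &bits, sizeof bits);
    /* Undo the wrap with plain arithmetic instead of an implementation-defined cast. */
    out->bar = (u <= 0x7FFFFFFF)
        ? (int_least32_t)u
        : (int_least32_t)(-(int_least32_t)(0xFFFFFFFF - u) - 1);
}
```

The resulting buffer can be handed to `fwrite(buf, 1, FOO_WIRE_SIZE, fout)` on any platform, since it is just a sequence of `unsigned char` with a defined meaning for each position.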


How fond are you of getting a call in the middle of the night? Either use a #pragma to pack the struct or write its fields out one by one.
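For the packing route, a sketch might look like this. `#pragma pack(push, 1)` is a compiler extension (MSVC, GCC and Clang accept this spelling), not standard C, and `struct FooPacked` and `foo_packed_size` are names invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Non-standard: ask the compiler to lay the members out with no padding. */
#pragma pack(push, 1)
struct FooPacked {
    char name[100];
    double f;
    int bar;
};
#pragma pack(pop)

/* If the pragma was honoured, the members sit back to back.
 * These checks fail at compile time on a compiler that ignores it. */
static_assert(offsetof(struct FooPacked, f) == 100, "f should follow name directly");
static_assert(sizeof(struct FooPacked) == 100 + sizeof(double) + sizeof(int),
              "packed struct should contain no padding");

/* Size of the record as it would be written by fwrite. */
size_t foo_packed_size(void)
{
    return sizeof(struct FooPacked);
}
```

Packing only removes the padding. Byte order, `sizeof(int)` and the floating-point format still depend on the platform, and the misaligned members can be slower (or unsafe to access through ordinary pointers) on some architectures.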


Yes, this sort of foolishness is very common, but that doesn't make it a good idea. You should write each field individually in a specified byte order; that avoids alignment and byte-order problems at the cost of a tiny bit of extra effort. Reading and writing field by field will also make your life easier when you upgrade your software and have to read your old data format, or when the underlying hardware architecture changes.
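As a rough illustration of the upgrade point, here is a field-by-field writer and reader with a leading version byte. Everything about the format is hypothetical: version 1 is imagined to have stored only name and bar, version 2 adds f, and all multi-byte fields are little-endian. The code also assumes `double` is IEEE-754 binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

struct Foo {
    char name[100];
    double f;
    int32_t bar;
};

enum { FOO_V1 = 1, FOO_V2 = 2 };   /* hypothetical versions: v1 had no f field */

/* Write the low nbytes of v, least significant byte first. Returns 0 on success. */
int write_le(FILE *fp, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        if (fputc((int)((v >> (8 * i)) & 0xFF), fp) == EOF)
            return -1;
    return 0;
}

/* Read nbytes, least significant byte first. Returns 0 on success. */
int read_le(FILE *fp, uint64_t *v, int nbytes)
{
    *v = 0;
    for (int i = 0; i < nbytes; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return -1;
        *v |= (uint64_t)c << (8 * i);
    }
    return 0;
}

/* Always writes the newest version. */
int foo_write(FILE *fp, const struct Foo *d)
{
    uint64_t fbits;
    memcpy(&fbits, &d->f, sizeof fbits);
    if (fputc(FOO_V2, fp) == EOF) return -1;
    if (fwrite(d->name, 1, sizeof d->name, fp) != sizeof d->name) return -1;
    if (write_le(fp, fbits, 8) != 0) return -1;
    return write_le(fp, (uint32_t)d->bar, 4);   /* modular conversion: two's complement */
}

/* Reads either version; fields missing from old files get a default. */
int foo_read(FILE *fp, struct Foo *d)
{
    uint64_t v;
    int version = fgetc(fp);
    if (version != FOO_V1 && version != FOO_V2) return -1;
    if (fread(d->name, 1, sizeof d->name, fp) != sizeof d->name) return -1;
    d->f = 0.0;                                  /* default for v1 records */
    if (version == FOO_V2) {
        if (read_le(fp, &v, 8) != 0) return -1;
        memcpy(&d->f, &v, sizeof v);
    }
    if (read_le(fp, &v, 4) != 0) return -1;
    /* Undo the two's-complement wrap with plain arithmetic. */
    d->bar = (v <= 0x7FFFFFFF) ? (int32_t)v
                               : (int32_t)(-(int64_t)(0xFFFFFFFF - v) - 1);
    return 0;
}
```

Because the reader never depends on the in-memory layout of `struct Foo`, the struct can gain, lose or reorder members later without invalidating files already on disk.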
