I've been loading a lot of binary files recently using C/C++, and I'm bothered by how inelegant it can be. Either I get a lot of code that looks like this (I've since moved on):
uint32_t type, k;
uint32_t *variable;
FILE *f;
if (!fread(&type, 4, 1, f))
goto boundsError;
if (!fread(&k, 4, 1, f))
goto boundsError;
variable = malloc(4 * k);
if (!fread(variable, 4 * k, 1, f))
goto boundsError;
Or, I define a local, packed struct so that I can read in constant-sized blocks more easily. It seems to me, however, that such a simple problem (reading a specified file into memory) could be solved more efficiently and more readably. Does anyone have any tips or tricks? I'd like to clarify that I'm not looking for a library or something to handle this; I might be tempted if I were designing my own file and had to change the file spec a lot, but for now I'm just looking for stylistic answers.
Also, some of you might suggest mmap. I love mmap! I use it a lot, but the problem with it is that it leads to nasty code for handling unaligned data types, a problem that doesn't really exist when using stdio. In the end, I'd be writing stdio-like wrapper functions for reading from memory.
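For illustration, the kind of stdio-like wrapper mentioned above could look roughly like the sketch below. The names mem_reader, mem_read and mem_read_u32le are hypothetical, and the file is assumed to store 32-bit values in little-endian order, which the question does not state. Copying through memcpy sidesteps the alignment problem, and the bounds check plays the role of fread's return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a mapped (or otherwise in-memory) file image. */
typedef struct {
    const unsigned char *p;   /* next unread byte */
    const unsigned char *end; /* one past the last byte */
} mem_reader;

/* Copy n bytes out of the buffer; returns 1 on success, 0 if fewer than
   n bytes remain. memcpy makes the read safe at any alignment. */
static int mem_read(mem_reader *r, void *dst, size_t n)
{
    assert(r != NULL && dst != NULL);
    if ((size_t)(r->end - r->p) < n)
        return 0;
    memcpy(dst, r->p, n);
    r->p += n;
    return 1;
}

/* Read a 32-bit value stored little-endian (an assumption about the
   file format), independent of the host's byte order. */
static int mem_read_u32le(mem_reader *r, uint32_t *out)
{
    unsigned char b[4];
    if (!mem_read(r, b, sizeof b))
        return 0;
    *out = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    return 1;
}
```

The cursor can start at any byte offset inside the mapping, so unaligned fields need no special casing at the call site.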
Thanks!
EDIT: I should also clarify that I can't change file formats—there's a binary file that I have to read; I can't request the data in another format.
The most elegant solution I've seen for this problem yet is Sean Barrett's writefv, used in his tiny image-writing library stb_image_write (available here). He only implements a few primitives (and no error handling), but the same approach can be extended to what is basically a binary printf (and for reading, you can do the same to get a binary scanf). Very elegant and tidy! In fact, the whole thing is so simple, I might as well include it here:
static void writefv(FILE *f, const char *fmt, va_list v)
{
while (*fmt) {
switch (*fmt++) {
case ' ': break;
case '1': { unsigned char x = (unsigned char) va_arg(v, int); fputc(x,f); break; }
case '2': { int x = va_arg(v,int); unsigned char b[2];
b[0] = (unsigned char) x; b[1] = (unsigned char) (x>>8);
fwrite(b,2,1,f); break; }
case '4': { stbiw_uint32 x = va_arg(v,int); unsigned char b[4];
b[0]=(unsigned char)x; b[1]=(unsigned char)(x>>8);
b[2]=(unsigned char)(x>>16); b[3]=(unsigned char)(x>>24);
fwrite(b,4,1,f); break; }
default:
assert(0);
return;
}
}
}
and here is how he writes truecolor .BMP files using it:
static int outfile(char const *filename, int rgb_dir, int vdir, int x, int y, int comp, void *data, int alpha, int pad, const char *fmt, ...)
{
FILE *f;
if (y < 0 || x < 0) return 0;
f = fopen(filename, "wb");
if (f) {
va_list v;
va_start(v, fmt);
writefv(f, fmt, v);
va_end(v);
write_pixels(f,rgb_dir,vdir,x,y,comp,data,alpha,pad);
fclose(f);
}
return f != NULL;
}
int stbi_write_bmp(char const *filename, int x, int y, int comp, const void *data)
{
int pad = (-x*3) & 3;
return outfile(filename,-1,-1,x,y,comp,(void *) data,0,pad,
"11 4 22 4" "4 44 22 444444",
'B', 'M', 14+40+(x*3+pad)*y, 0,0, 14+40, // file header
40, x,y, 1,24, 0,0,0,0,0,0); // bitmap header
}
(definition of write_pixels elided since it's pretty tangential here)
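As a rough idea of what the "binary scanf" counterpart might look like, here is a minimal sketch. It is not part of stb_image_write: readfv and readf are hypothetical names, every field is stored through a uint32_t pointer for simplicity, and the byte order is little-endian to mirror writefv. Unlike the original, it reports short reads.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reading counterpart of writefv: each digit in fmt consumes
   that many little-endian bytes and stores the value through the next
   uint32_t* argument. Returns 1 on success, 0 on a short read or an
   unknown format character. */
static int readfv(FILE *f, const char *fmt, va_list v)
{
    assert(f != NULL && fmt != NULL);
    while (*fmt) {
        char c = *fmt++;
        unsigned char b[4];
        uint32_t *out;
        size_t n, i;
        if (c == ' ')
            continue;                      /* spaces are only for readability */
        if (c != '1' && c != '2' && c != '4')
            return 0;
        n = (size_t)(c - '0');
        if (fread(b, 1, n, f) != n)
            return 0;
        out = va_arg(v, uint32_t *);
        *out = 0;
        for (i = 0; i < n; i++)
            *out |= (uint32_t)b[i] << (8 * i);
    }
    return 1;
}

/* Variadic front end, analogous to how outfile wraps writefv. */
static int readf(FILE *f, const char *fmt, ...)
{
    va_list v;
    int ok;
    va_start(v, fmt);
    ok = readfv(f, fmt, v);
    va_end(v);
    return ok;
}
```

A BMP file header could then be read with a call along the lines of readf(f, "11 4 22 4", &b, &m, &size, &r1, &r2, &offset), using the same format string as the writer.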
If you want to de-serialize binary data, one option is to define serialization macros for the structs that you want to use. This is a lot easier in C++ with template functions and streams. (boost::serialization is a non-intrusive serialization library, but if you want to go intrusive, you can make it more elegant)
Simple C macros:
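/* Needs <stdio.h>, <stdint.h>, <string.h>, and <arpa/inet.h> (ntohl). The fread result is not checked. */
#define INT(f,v) \
{ uint32_t _t; fread(&_t, sizeof _t, 1, f); v = (int) ntohl(_t); }
#define FLOAT(f,v) \
{ uint32_t _t; fread(&_t, sizeof _t, 1, f); _t = ntohl(_t); /* type punning */ memcpy(&v, &_t, sizeof(float)); }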
...
Usage:
int a;
float b;
FILE *f = fopen("file", "rb");
INT(f, a);
FLOAT(f, b);
And, yes, serialization code is some of the most boring and brain-dead code to write. If you can, describe your data structures using metadata, and generate the code mechanically instead. There are tools and libs to help with this, or you can roll your own in Perl or Python or PowerShell or whatever.
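If an external generator feels like overkill, one in-language approximation of "describe the structure once, generate the rest" is the X-macro idiom. This is a different technique from the script-based generation suggested above, shown only as a sketch: chunk_header, its fields, and read_le are hypothetical, and little-endian storage is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read n (<= 8) little-endian bytes into *out; 1 on success, 0 on short read. */
static int read_le(FILE *f, size_t n, uint64_t *out)
{
    unsigned char b[8];
    size_t i;
    assert(f != NULL && out != NULL && n <= sizeof b);
    if (fread(b, 1, n, f) != n)
        return 0;
    *out = 0;
    for (i = 0; i < n; i++)
        *out |= (uint64_t)b[i] << (8 * i);
    return 1;
}

/* Hypothetical record layout, stated once as a list of (type, name). */
#define CHUNK_FIELDS(X) \
    X(uint16_t, kind)   \
    X(uint16_t, flags)  \
    X(uint32_t, length)

/* Expand the list into struct members. */
#define AS_MEMBER(type, name) type name;
typedef struct { CHUNK_FIELDS(AS_MEMBER) } chunk_header;

/* Expand the same list into one checked read per field.
   Relies on f, h and tmp being in scope at the expansion site. */
#define AS_READ(type, name) \
    if (!read_le(f, sizeof(type), &tmp)) return 0; \
    h->name = (type)tmp;

static int read_chunk_header(FILE *f, chunk_header *h)
{
    uint64_t tmp;
    CHUNK_FIELDS(AS_READ)
    return 1;
}
```

Adding a field to CHUNK_FIELDS updates both the struct and the reader, which is the property a generated serializer would give you.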
I would make your code look less inelegant by refactoring it a bit, so that each complex data structure is read with a series of calls to readers for its underlying types.
I assume your code is pure C and not C++ because in the latter you would probably throw exceptions rather than using goto statements.
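A minimal sketch of that layering in plain C might look like this. The point and header types are made up for the example, and the values are assumed to be stored little-endian. Each reader returns 1 on success and 0 on failure, so composite readers are just && chains over the primitive ones.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Primitive reader: one 32-bit value, assumed little-endian in the file. */
static int read_u32(FILE *f, uint32_t *out)
{
    unsigned char b[4];
    assert(f != NULL && out != NULL);
    if (fread(b, 1, sizeof b, f) != sizeof b)
        return 0;
    *out = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    return 1;
}

/* Hypothetical structures, used only to show the layering. */
typedef struct { uint32_t x, y; } point;
typedef struct { uint32_t type; point origin; point size; } header;

/* A composite reader is a chain of calls to the readers of its parts;
   && stops at the first failure. */
static int read_point(FILE *f, point *p)
{
    return read_u32(f, &p->x) && read_u32(f, &p->y);
}

static int read_header(FILE *f, header *h)
{
    return read_u32(f, &h->type)
        && read_point(f, &h->origin)
        && read_point(f, &h->size);
}
```

The caller then needs a single check, for example if (!read_header(f, &h)) goto boundsError;, instead of one per field.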
The array-reading part looks like it deserves its own reusable function. Beyond that, if you do actually have C++ available (it isn't completely clear from the question), then hard-coding the size of variables is unnecessary, as the size can be deduced from the pointer.
template<typename T>
bool read( FILE* const f, T* const p, size_t const n = 1 )
{
return n == fread(p, sizeof(T), n, f); // fread takes (ptr, size, count, stream) and returns the number of elements read
}
template<typename T>
bool read( FILE* const f, T& result )
{
return read(f, &result);
}
template<typename Tcount, typename Telement>
bool read_counted_array( FILE* const f, Tcount& n, Telement*& p )
{
if (!read(f, n) || !(p = new Telement[n]))
return false;
if (read(f, p, n))
return true;
delete[] p;
p = 0;
return false;
}
and then
uint32_t type, k;
uint32_t *variable;
FILE *f;
if (read(f, type) &&
read_counted_array(f, k, variable) && ...
) {
//...
}
else
goto boundsError;
Of course, feel free to continue using malloc and free instead of new[] and delete[] if the data is being handed off to code that assumes that malloc was used.
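For the malloc case, a plain C version of the counted-array reader could be sketched as follows. The name read_counted_u32 is hypothetical, it is fixed to uint32_t because C has no templates, and like the code above it reads values in the host's byte order. On failure it leaves *p set to NULL so the caller has nothing to free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a uint32_t count followed by that many uint32_t values (host byte
   order). On success returns 1 and *p points to a malloc'd array that the
   caller must free (NULL when the count is 0). On failure returns 0 and
   *p is NULL. */
static int read_counted_u32(FILE *f, uint32_t *n, uint32_t **p)
{
    assert(f != NULL && n != NULL && p != NULL);
    *p = NULL;
    if (fread(n, sizeof *n, 1, f) != 1)
        return 0;
    if (*n == 0)
        return 1;
    if (*n > SIZE_MAX / sizeof **p)     /* guard the size multiplication */
        return 0;
    *p = malloc(*n * sizeof **p);
    if (*p == NULL)
        return 0;
    if (fread(*p, sizeof **p, *n, f) != *n) {
        free(*p);                       /* short read: release and report */
        *p = NULL;
        return 0;
    }
    return 1;
}
```

Since the count comes straight from the file, a corrupt or hostile value can still request a very large allocation; comparing it against the remaining file size first is worth considering.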
Here's some C99 code I came up with:
- read_values.h, read_values.c
- read_array.h, read_array.c
Your example would read:
#include "read_values.h"
#include "read_array.h"
assert(sizeof (uint32_t) == 4);
uint32_t type, k;
uint32_t *variable;
FILE *f;
_Bool success =
read_values(f, "c4c4", &type, &k) &&
read_array(f, variable, k);
if(!success)
{
/* ... */
}
You might be interested in protocol buffers and other IDL schemes.