short short int in c?_问答_开发者_运维开发者技术经验分享

I'm trying to squeeze as much out of my memory as possible. I have a matrix of 4.9999995e13 ints 开发者_开发问答but they only need to be true or false - basically I only need one bit of storage for each of these ints.

I understand that there are no single bit types in C (maybe someone can explain why, to me), and I also know that if a short short int existed it would be 1 byte, same as char. However all of the logical operations in C return ints (as well as a few other functions).

So my questions are:

Is there some way of making a short short int exist?
If I was to use char instead, would I have performance decrease because of all the casting to int that would have to be done?
Is there another way that I'm missing?

Just in-case it's relevant, I am compiling with GCC for C99.

EDIT I've just seen on this wikipedia page that there is a _Bool type, is this actually standard?

The _Bool type is standard in the most recent version of C, but that's still not what you want, because a _Bool still takes up at least one byte (as does a char, by definition).

No, if you want that many boolean bits you need to pack them into a bitfield or bit array. There is no standard datatype for bitfields in C, so you're also going to have to write your own macros or functions for getting the bit at a particular offset. I also hope that you're going to run this on a 64-bit machine with plenty of RAM, otherwise you're going to run out of memory and fast.

You have about 50 terabits of data. Do you want to fit them all in RAM at once? It woulld be totally insane to use more than one bit of RAM in orrder to keep one bit of information, and even then your computer would have to be about the size of the largest supercomputer on this planet. Forget performance of bit-packing. You will have to worry about totally different things.

What you want is a bitmap (or bit array as Wikipedia calls it).

And there is no such thing as a short short int, that's just a char which is the smallest integer storage class in C.

There might be some performance overhead when using this approach, but not because of implicit casts to ints, but rather because manipulating a bitmap is more tricky than directly manipulating array members.

A small example might help to illustrate:

Using a normal integer matrix:

int mat[8*8]; // assuming row major order
int is_element_set(int x, int y) { 
  return mat[y*8 + x];
}

With a bitmap:

unsigned char mat[8]; // assuming CHAR_BIT == 8
int is_element_set(int x, int y) { 
  return mat[y] & (1 << x);
}

5e13 that's about 5.6 terabytes of storage you would need only to represent your bitfield. There's probably a better way to handle your problem.

Maybe you could use some wise implementation of the bit field structs available in ANSI C.

Something like this:

typedef struct node_t_
{
    char bit0 : 1;
    char bit1 : 1;
    char bit2 : 1;
    char bit3 : 1;
    char bit4 : 1;
    char bit5 : 1;
    char bit6 : 1;
    char bit7 : 1;
} node_t;

Then, you could make some fast functions (maybe macros) to get and set elements in this matrix. I haven't ever implemented something like this, though.

C99 stdbool.h allows the use of bool. However here your problem is that 4.9999995e13/8 would give more or less 6.2500e+12 ($10^9$ are Gbyte, $10^12$ are Tbyte), so you need more than 6 Tbytes of real + virtual memory (to be lucky). This suggests you are doing something else wrong. You need to "scale" your problem in subproblems you can handle using less memory.

As other people have suggested, you should probably use a bitfield.

In addition though, if you're just using true/false values, and one of the values is much less common than the other, consider using an implicit coding. You can accomplish this easily with a map data structure. As you're doing work with graphs, this will save you an enormous amount of memory if your graph is at all sparse. If you combine this with the bit packing techniques above, you might even fit it all in RAM. Have to be pretty clever about the indexing though.

The other thing you could do, if you don't care about taking a performance hit during processing (i.e. if you're more worried about storing it than processing it), is run the structure through a compression algorithm in blocks. There's a C library for bzip2 which might save you 90% or more on something like that. Drawbacks are that this would take a (very!) long time. You might get comparable performance out of a bitwise compressor like Dynamic Markov Compression (DMC) on this, and those are much faster.

I'm trying to squeeze as much out of my memory as possible.

If this were true, then you would not waste 8 bits to store 1 bit worth of data. You'd use a bitfield.

If you know anything about the sort of contents the matrix has, then you can use other optimizations. For example, if you know that the vast majority of the matrix is usually set to zero, then you can store only the x,y pairs of the elements set to one.

If not, then 4.9999995e13 will take about 6 TB of RAM!