I have three integers {a
, b
, c
} that range (say) between the following values:
a
- {1 to 120, in jumps开发者_运维技巧 of 1}
b
- {-100 to 100, in jumps of 5}
c
- {1 to 10, in jumps of 1}
Due to space considerations, I would like to represent these three values using 1-byte ONLY, meaning, a single integer (in the range of -127..128) will represent the results of {a
, b
, c
} and be stored in a binary format to disk.
Later, when I read the binary data, I will know how to 'parse' this 1-byte to get the values of {a
, b
, c
}.
Any idea how to achieve that? (note: if need be, in order to support this design, I can 'compromise' on the ranges; for example, say, a
can be in jumps of 5. b
can also be in jumps of 10 etc)
Just from a numbers point of view we have:
a = 120 values, b = 41 values, c = 10 values
That makes for a total of 49,200 unique values. A byte can only represent 256 values, so you'd need to use at least 16-bits (two bytes) to represent your range.
One way to do so would be through bit shifting.
As an example, you can store four 8-bit values in a 32-bit value, and extract them like so:
#include <iostream>
using namespace std;
int pack32(char *v)
{
return (v[0] << 24) + (v[1] << 16) + (v[2] << 8) + v[3];
}
void unpack32(int a, char *v)
{
v[0] = a >> 24;
v[1] = a >> 16;
v[2] = a >> 8;
v[3] = a;
}
int main()
{
char v[4] = {32, 64, 16, 8};
cout << "Original values: ";
for (int i = 0; i < 4 ; i++)
cout << (int)v[i] << " ";
cout << endl;
int q = pack32(v);
cout << "Packed: " << q << endl;
unpack32(q, v);
cout << "Unpacked: ";
for (int i = 0; i < 4; i++)
cout << (int)v[i] << " ";
return 0;
}
Code relevant to your needs:
unsigned short pack32(unsigned a, char b, unsigned c)
{
// Layout:
// Bits 0 - 5 are reserved for a
// Bits 6 - 12 are reserved for b
// Bits 13 - 15 are reserved for c
// Assumptions:
// a is [2, 120] in steps of 2
// b is [-100, 100] in steps of 5
// c is [1, 10] in steps of 1
// Shift a from [2, 120] to [0, 59]
unsigned a2 = (a - 2) >> 1;
// Shift b from [-100, 100] to [0, 40]
unsigned b2 = b / 5 + 20;
// Shift c from [1, 10] to [0, 9]
unsigned c2 = c - 1;
return a2 + (b2 << 5) + (c2 << 12);
}
a - {1 to 120, in jumps of 1} = 120 values = log2(120) = 6.9 bits
b - {-100 to 100, in jumps of 5} = 41 values = log2(41) = 5.4 bits
c - {1 to 10, in jumps of 1} = 10 values = log2(10) = 3.3 bits
Total = 15.6 bits, so you could pack all these into one 16 bit value, but not into an 8 bit byte.
You'll need to compromise quite a lot on your ranges to get everything into a single byte.
For simplicity, you probably want to store each value in a whole number of bits - so work out how many bits you want for each value. For example, you could use:
- a (3 bits)
- b (3 bits)
- c (2 bits)
That will give you 8 different values for a
, 8 different values for b
, and 4 different values for c
. That's much much less information than you originally had, of course. Once you've chosen a scheme like that, the rest is just a matter of:
- Converting each original value into its "compressed" pattern (e.g. for
a
you may represent 1 as 0, and 120 as 7) - Combining the three compressed values into a single byte (using bit-shifting and bitwise OR)
- Later splitting the single byte into the three compressed values (using bit-shifting and masking)
- Converting each compressed value into an "uncompressed" value which is reasonably close to the original value
Based on Mike's answer, but with the numbers correct:
a = 120 values, b = 41 values, c = 10 values
That makes for a total of 49,200 unique values. A byte can only represent 256 values, so you'd need to use at least 16-bits (two bytes) to represent your range.
Now let's suppose we want to use different bits to represent each of these numbers (i.e. no compression that mingles these somehow):
a
fits comfortably in 7 bits, b
fits comfortably in 6 bits, and c
fits comfortably in 4 bits. (By "fits comfortably", I mean that's the smallest integer number of bits this data can fit in.) That's 17 bits, so without applying some kind of compression you might as well use a separate byte for each value.
Now, let's discuss a way to fit this into one character by changing step sizes in these values.
You can divide these up into two 2-bit values (allowing 4 values each) and one 4-bit value. Or you can divide these up into two 3-bit values (allowing 8 values each) and one 2-bit value. You can decide how to assign these to your variables a
, b
, and c
.
The best way to store these in C is with a struct containing bit fields:
struct myvalues{
unsigned a:3;
signed b:3;
unsigned c:2;
};
//look at your compiler and platform documentation
//to make sure you can pack this properly
Then you can access the fields a
, b
, and c
, by name directly (though you'll have to do some math to convert the values.)
Other languages (Java, C#, etc.) aren't so flexible about how you define types, so you'll need to resort to bit shifts in those languages.
精彩评论