开发者

Representing 3 Integers Using One Byte?

开发者 https://www.devze.com 2023-02-16 14:53 出处:网络
I have three integers {a, b, c} that range (say) between the following values: a - {1 to 120, in jumps开发者_运维技巧 of 1}

I have three integers {a, b, c} that range (say) between the following values:

a - {1 to 120, in jumps开发者_运维技巧 of 1}

b - {-100 to 100, in jumps of 5}

c - {1 to 10, in jumps of 1}

Due to space considerations, I would like to represent these three values using 1-byte ONLY, meaning, a single integer (in the range of -127..128) will represent the results of {a, b, c} and be stored in a binary format to disk.

Later, when I read the binary data, I will know how to 'parse' this 1-byte to get the values of {a, b, c}.

Any idea how to achieve that? (note: if need be, in order to support this design, I can 'compromise' on the ranges; for example, say, a can be in jumps of 5. b can also be in jumps of 10 etc)


Just from a numbers point of view we have:

a = 120 values, b = 41 values, c = 10 values

That makes for a total of 49,200 unique values. A byte can only represent 256 values, so you'd need to use at least 16-bits (two bytes) to represent your range.

One way to do so would be through bit shifting.

As an example, you can store four 8-bit values in a 32-bit value, and extract them like so:

#include <iostream>
using namespace std;


int pack32(char *v)
{
    return (v[0] << 24) + (v[1] << 16) + (v[2] << 8) + v[3];
}

void unpack32(int a, char *v)
{
    v[0] = a >> 24;
    v[1] = a >> 16;
    v[2] = a >> 8;
    v[3] = a;
}

int main()
{
    char v[4] = {32, 64, 16, 8};

    cout << "Original values: ";
    for (int i = 0; i < 4 ; i++)
        cout << (int)v[i] << " ";
    cout << endl;

    int q = pack32(v);
    cout << "Packed: " << q << endl;

    unpack32(q, v);
    cout << "Unpacked: ";
    for (int i = 0; i < 4; i++)
        cout << (int)v[i] << " ";

    return 0;
}

Code relevant to your needs:

unsigned short pack32(unsigned a, char b, unsigned c)
{
    // Layout:
    // Bits 0 - 5 are reserved for a
    // Bits 6 - 12 are reserved for b
    // Bits 13 - 15 are reserved for c

    // Assumptions:
    // a is [2, 120] in steps of 2
    // b is [-100, 100] in steps of 5
    // c is [1, 10] in steps of 1

    // Shift a from [2, 120] to [0, 59]
    unsigned a2 = (a - 2) >> 1;
    // Shift b from [-100, 100] to [0, 40]
    unsigned b2 = b / 5 + 20;
    // Shift c from [1, 10] to [0, 9]
    unsigned c2 = c - 1;

    return a2 + (b2 << 5) + (c2 << 12);
}


a - {1 to 120, in jumps of 1} = 120 values = log2(120) = 6.9 bits

b - {-100 to 100, in jumps of 5} = 41 values = log2(41) = 5.4 bits

c - {1 to 10, in jumps of 1} = 10 values = log2(10) = 3.3 bits

Total = 15.6 bits, so you could pack all these into one 16 bit value, but not into an 8 bit byte.


You'll need to compromise quite a lot on your ranges to get everything into a single byte.

For simplicity, you probably want to store each value in a whole number of bits - so work out how many bits you want for each value. For example, you could use:

  • a (3 bits)
  • b (3 bits)
  • c (2 bits)

That will give you 8 different values for a, 8 different values for b, and 4 different values for c. That's much much less information than you originally had, of course. Once you've chosen a scheme like that, the rest is just a matter of:

  • Converting each original value into its "compressed" pattern (e.g. for a you may represent 1 as 0, and 120 as 7)
  • Combining the three compressed values into a single byte (using bit-shifting and bitwise OR)
  • Later splitting the single byte into the three compressed values (using bit-shifting and masking)
  • Converting each compressed value into an "uncompressed" value which is reasonably close to the original value


Based on Mike's answer, but with the numbers correct:

a = 120 values, b = 41 values, c = 10 values

That makes for a total of 49,200 unique values. A byte can only represent 256 values, so you'd need to use at least 16-bits (two bytes) to represent your range.

Now let's suppose we want to use different bits to represent each of these numbers (i.e. no compression that mingles these somehow):

a fits comfortably in 7 bits, b fits comfortably in 6 bits, and c fits comfortably in 4 bits. (By "fits comfortably", I mean that's the smallest integer number of bits this data can fit in.) That's 17 bits, so without applying some kind of compression you might as well use a separate byte for each value.

Now, let's discuss a way to fit this into one character by changing step sizes in these values.

You can divide these up into two 2-bit values (allowing 4 values each) and one 4-bit value. Or you can divide these up into two 3-bit values (allowing 8 values each) and one 2-bit value. You can decide how to assign these to your variables a, b, and c.

The best way to store these in C is with a struct containing bit fields:

struct myvalues{
  unsigned a:3;
  signed b:3;
  unsigned c:2;
};
//look at your compiler and platform documentation 
//to make sure you can pack this properly

Then you can access the fields a, b, and c, by name directly (though you'll have to do some math to convert the values.)

Other languages (Java, C#, etc.) aren't so flexible about how you define types, so you'll need to resort to bit shifts in those languages.

0

精彩评论

暂无评论...
验证码 换一张
取 消