For a genetic algorithm application, I'm using a whole load of binary strings. Most of the time they literally take the form of 01001010110
, so that they can be mated, mutated and "crossed-over".
For transport and storage however, this se开发者_开发知识库ems wasteful. What's the simplest way to encode this as a shorter string?
I'm guessing this is pretty trivial, but I'm not sure where to start looking.
Update: I actually need to end up with another string: one of the transport requests will be GET requests.
The simplest would be to take each digit and treat it as a bit. Each group of 8 bits can be stored in a byte. Then you can send it as a stream of bytes. You will also need to store the length of the original string so that you can distinguish between "0" and "00".
Here is one way you could write the conversion from string to a byte array:
byte[] convertToBytes(string s)
{
byte[] result = new byte[(s.Length + 7) / 8];
int i = 0;
int j = 0;
foreach (char c in s)
{
result[i] <<= 1;
if (c == '1')
result[i] |= 1;
j++;
if (j == 8)
{
i++;
j = 0;
}
}
return result;
}
Reversing the operation is very similar.
If you need to transmit the data as a string you can base 64 encode the resulting byte array.
You may also want to consider keeping it in this form in memory too. This will be much more efficient than storing it as a string where each digit is stored as a 2 byte character. You are using roughly 16 times more memory than you need to for storing your data. The disadvtange is that it is slightly more difficult to use in this form, so if you have enough memory then what you are currently doing might be just fine.
What about converting it to it's base 10 integer equivalent?
int myBin = Convert.ToInt32("01001010110", 2);
Convert.ToInt32() documentation
I would just store them as an array of bytes and use a helper function to translate between the byte array version and the string version.
Or implement Run length encoding or Huffman coding. Both are fairly easy to implement. RLE is by far the easiest, but will in most cases have worse compression ratio. If your data typically has many consecutive characters of the same value, it could still provide a substantial improvement.
Abe Miessler's answer is a good one, but with caveat mentioned in comments.
If 64 bits is not enough to represent your string, then consider using a BigInt
class
http://www.codeproject.com/KB/cs/BigInt.aspx (you would probably want to add to/fromBinary()
extension methods to it. Alternatively represent this as a ... linked list of bytes.
Either approach has the problem of discarding any leading zeroes, so you would want to store the original length as well.
精彩评论