The downside is that the output has to be encoded as either UTF-8 or ISO-8859-1.
I've tried to use base64_encode(gzdeflate($string, 9)), but the result ends up being longer than the original string.
Can anyone think of a way to do this?
Thanks
Compressed data is binary: it has no character set, it's just a sequence of bytes. base64 increases the size of its input by a factor of about 1.33 (4 output characters for every 3 input bytes), so unless the string compresses to less than about 0.75 of its original size, you're going to lose out.
The bigger question is why would you need to re-encode the compressed data? Is it to display it as "plain text" instead of the random 'garbage' it would be if you output the raw bytes?
base64 encoding adds overhead because you're converting binary to plain text. If your string is short, this overhead will be greater than the gains of the compression. However, this method should work just fine on large strings.
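To see which side of the break-even point a given string lands on, you can simply measure it. This is only a rough sketch: compression_report() is a made-up helper name, and the two sample strings are arbitrary inputs chosen to show a short string and a long, repetitive one.

```php
<?php
// Hypothetical helper (not part of PHP): compares the size of a string
// before compression, after gzdeflate(), and after deflate + base64.
function compression_report(string $input): array
{
    $deflated = gzdeflate($input, 9);
    $encoded  = base64_encode($deflated);

    return [
        'original' => strlen($input),
        'deflated' => strlen($deflated),
        // base64 emits 4 characters per 3 input bytes, padded to a multiple of 4
        'base64'   => strlen($encoded),
        // true only if the encoded result is actually shorter than the input
        'worth_it' => strlen($encoded) < strlen($input),
    ];
}

// Arbitrary sample inputs: one short, one long and repetitive.
$short = 'Hello, world!';
$long  = str_repeat('The quick brown fox jumps over the lazy dog. ', 50);

print_r(compression_report($short));
print_r(compression_report($long));
```

If 'worth_it' comes back false for your typical input, you are better off storing the string uncompressed.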
Well, base64 encoding will obviously eat most of your savings, since it increases the size by a factor of at least 8/6 (slightly more in practice, e.g. because the output is padded to a multiple of four characters).
If by ISO-8859-1 you mean the charset registered with IANA, you can encode binary data with it, since all 256 byte values are defined (though the result will contain control characters). ISO 8859-1 (note the missing hyphen), on the other hand, doesn't define all 256 values, and UTF-8 is out of the question too, because arbitrary byte sequences are not valid UTF-8.
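If your space savings are more than about 12.5%, you could use a 7-bit-in-8-bit encoding and just leave the MSB of every byte 0. That expands the data by a factor of 8/7 (about 14%) and works just fine for UTF-8, though the output will still contain control characters.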
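A minimal sketch of that 7-bit idea, assuming you are happy to write the bit packing yourself. encode7bit() and decode7bit() are made-up names, not built-in PHP functions, and the sample string is arbitrary. Every output byte is in the range 0x00-0x7F, so it is valid as both UTF-8 and ISO-8859-1, but it can include NUL and other control characters.

```php
<?php
// Hypothetical helper: repacks 8-bit bytes into 7-bit groups (MSB always 0).
function encode7bit(string $bytes): string
{
    $out = '';
    $buffer = 0; // bit accumulator
    $bits = 0;   // number of valid bits currently in $buffer
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $buffer = ($buffer << 8) | ord($bytes[$i]);
        $bits += 8;
        while ($bits >= 7) {
            $bits -= 7;
            $out .= chr(($buffer >> $bits) & 0x7F);
        }
        $buffer &= (1 << $bits) - 1; // keep only the leftover bits
    }
    if ($bits > 0) {
        // Pad the final group with zero bits on the right.
        $out .= chr(($buffer << (7 - $bits)) & 0x7F);
    }
    return $out;
}

// Hypothetical helper: reverses encode7bit().
function decode7bit(string $text): string
{
    $out = '';
    $buffer = 0;
    $bits = 0;
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $buffer = ($buffer << 7) | (ord($text[$i]) & 0x7F);
        $bits += 7;
        if ($bits >= 8) {
            $bits -= 8;
            $out .= chr(($buffer >> $bits) & 0xFF);
            $buffer &= (1 << $bits) - 1;
        }
    }
    // Any leftover bits (fewer than 8) are padding and are dropped.
    return $out;
}

// Arbitrary sample input for a round trip.
$original   = str_repeat('The quick brown fox jumps over the lazy dog. ', 50);
$compressed = gzdeflate($original, 9);
$encoded    = encode7bit($compressed);

var_dump(gzinflate(decode7bit($encoded)) === $original); // round-trip check
printf("%d -> %d -> %d bytes\n", strlen($original), strlen($compressed), strlen($encoded));
```

The encoded length is the compressed length times 8/7, rounded up. Whether the control characters are acceptable depends on where the string ends up being stored or sent.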