Is there a better compression library for strings than DotNetZip or LZMA?

Developer https://www.devze.com 2023-03-29 16:48 Source: Internet
I have a string of data a bit over 800 characters that I'm trying to compress down to use on a QR code (I'd like at least 50%, but would probably be happy if I got it to less than seven hundred). Here's an example string I'm trying to compress, containing 841 characters:

+hgoSuJm2ecydQj9mXXzmG6b951L2KIl0k9VGzIEtLztuWO2On9rt7DUlH0lXzG4iJ1yK0fA
97mDyclKSttIZXOxSPBf85LEN4PUUqj65aio5qwZttZSZ64wpnMFg/7Alt1R39IJvTmeYfBm
Tuc1noMMcknlydFocwI8/sk2Sje5MR/nYNX0LPkQhzyi5vFJdrndqAgXYULsYrB3TJDAwvgs
Kw9C5EJnrlqcb21zg17O2gU/C8KY0pz9RPzUl1Sb0rCP8iZCeis4YbQ5tuUppOfnO/X0Mosv
SOQJ/bF9juKW8ocnQvNjsNxGV1gPkWWtiU2Old7Qm7FLDqL6kQKrq356yifs0NiMVGdvAg32
eugewuttCugoZASYOpQdwPu1jMxVO1fzF3zEy5w6tDlcfA2DZwa+un9/k8XZWAO/KVExy68q
UtVRQxsIOKgpl/2tNw5DBAKbykKIkmizbsA2xtzqnYqld4kOdNMJh3YjlqWF9Bt8MZo7a+Q6
jgayr2rjpyIptc599DGtvp68ZNQ64TKNmiMnnyGMo3E+xW34G3RrsYnHGm+xJoLKoOJhacDu
oZke1ycJgQv+Y61WPrvtFOVBxV5rvSzO0+8px5AWN3uCrrw1RmT5N14IVhh6BOtRjsifqIB2
dAKxzBNsvbXm1SzkuyqYiMnp5ivy3m2mPwc9GLsykx0FRIkhCYO8ins9E5ot9QvVnE155MFA
8FVwsP5uNdOF4EzQS2/h2QK3zb5Yq4Nftlo605Dd5vuVN/A7CUN38DaAKBxDKgqDzydfQnZw
R0hTfMHNLgBJKNDSpz2P6almGlUJtXT6IYmzuU2Iaion8ePG

I've already tried the following three libraries:

  1. The built-in .NET GzipStream
  2. DotNetZip, including:
    • GzipStream
    • DeflateStream
  3. The LZMA SDK from 7-zip

I'm running into an issue where the compression is actually making the string longer. My understanding was that DeflateStream had the least overhead, yet it's still adding characters on. Using DotNetZip, I told it to use maximum compression:

Imports System.IO
Imports Ionic.Zlib

Shared Function CompressData(data As Byte()) As Array

    Dim msCompressed As MemoryStream = New MemoryStream

    ' I'm not sure if the last parameter on this next function should be
    ' true (for LeaveOpen), but it doesn't seem to affect it either way.
    Dim deflated As DeflateStream = New DeflateStream(msCompressed, _
        CompressionMode.Compress, CompressionLevel.BestCompression, True)

    ' Write data to compression stream (which is linked to the memorystream)
    deflated.Write(data, 0, data.Length)
    deflated.Flush()
    deflated.Close()

    Return msCompressed.ToArray
End Function

I'm only thinking this is going to get worse as I'm going to have even more data. Is there some better compression algorithm for strings of this length? Does compression normally only work on longer strings? Unfortunately, the data is such that I can't use stand-in characters for pieces of data.

Also, am I able to use alphanumeric encoding for the QR code, or do I have to use binary? I don't think I can, per http://www.qrme.co.uk/qr-code-forum.html?func=view&catid=3&id=324, but I'd like to make sure.

Thanks for your help!


At first glance, it appears that you are trying to take some data and convert it into a QR code with this process:

--> encrypt --> base64 encode --> compress --> make QR code.

I suggest using this process instead:

--> compress --> encrypt --> make QR code.

When you want to both encrypt and compress, pretty much everyone recommends compress-then-encrypt. Encryption works just as well on compressed data as on uncompressed data, but compression usually makes plaintext shorter while making already-encrypted data slightly longer. For more details, see: "Can I compress an encrypted file?", "Compress and then encrypt, or vice-versa?", "Composing Compression and Encryption", "Compress, then encrypt tapes", "Is it better to encrypt a message and then compress it or the other way around? Which provides more security?", "Compressing and Encrypting files on Windows", "Encryption and Compression", "Do encrypted compression containers like zip and 7z compress or encrypt first?", "When compressing and encrypting, should I compress first, or encrypt first?", etc.
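As a rough illustration of the compress-then-encrypt order, here is a minimal sketch. It assumes the built-in System.IO.Compression.DeflateStream (not DotNetZip) and AES in its default CBC mode with a random IV prepended to the output. The class name, the sample text, and the throwaway key are all hypothetical, and a real application would also need proper key management and message authentication.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

static class CompressThenEncrypt
{
    // Step 1: deflate the plaintext while its redundancy is still visible.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, true))
            {
                deflate.Write(data, 0, data.Length);
            } // disposing the DeflateStream writes the final block
            return output.ToArray();
        }
    }

    // Step 2: encrypt the compressed bytes; the random IV is stored in front.
    public static byte[] Encrypt(byte[] data, byte[] key)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV();
            byte[] iv = aes.IV;
            using (var encryptor = aes.CreateEncryptor())
            {
                byte[] cipher = encryptor.TransformFinalBlock(data, 0, data.Length);
                byte[] result = new byte[iv.Length + cipher.Length];
                Buffer.BlockCopy(iv, 0, result, 0, iv.Length);
                Buffer.BlockCopy(cipher, 0, result, iv.Length, cipher.Length);
                return result;
            }
        }
    }

    // Reverse of Encrypt: read the IV from the front, then decrypt the rest.
    public static byte[] Decrypt(byte[] payload, byte[] key)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            byte[] iv = new byte[aes.BlockSize / 8];
            Buffer.BlockCopy(payload, 0, iv, 0, iv.Length);
            aes.IV = iv;
            using (var decryptor = aes.CreateDecryptor())
            {
                return decryptor.TransformFinalBlock(payload, iv.Length, payload.Length - iv.Length);
            }
        }
    }

    // Reverse of Compress.
    public static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            return output.ToArray();
        }
    }

    static void Main()
    {
        // Hypothetical, deliberately repetitive sample input.
        var sb = new StringBuilder();
        for (int i = 0; i < 40; i++) sb.Append("name=example;value=").Append(i).Append(';');
        byte[] plain = Encoding.UTF8.GetBytes(sb.ToString());

        // Throwaway 256-bit key, for demonstration only.
        byte[] key = new byte[32];
        using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(key);

        byte[] payload = Encrypt(Compress(plain), key);
        Console.WriteLine("plaintext bytes: " + plain.Length);
        Console.WriteLine("QR payload bytes: " + payload.Length);
    }
}
```

The resulting byte array would go straight into a binary-mode QR code, with no base-64 step in between.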

"am I able to use alphanumeric encoding for the QR code, or do I have to use binary?"

Most encryption algorithms produce binary output, so it will be simplest to directly convert that to binary-encoded QR code. I suppose you could somehow convert the encrypted data to something that QR alphanumeric coding could handle, but why?

"Is there some better compression algorithm"

For encrypted data, no. It is (almost certainly) impossible to compress well-encrypted data, no matter what algorithm you use.

If you compress-then-encrypt, as recommended, then the effectiveness of various compression algorithms depends on the particular kinds of input data, not on what you do with it after compression.

What kind of data is your input data?

If, hypothetically, your input data is some sort of ASCII text, perhaps you could use one of the compression algorithms mentioned at "Really simple short string compression", "Best compression algorithm for short text strings", "Compression of ASCII strings in C", or "Twitter text compression challenge".

If, on the other hand, your input data is some sort of photograph, perhaps you could use one of the many compression algorithms mentioned at "Twitter image encoding challenge".


This answer is related to Guffa's answer. He said that QR codes can accept binary data, so the restriction must be a limitation of the library you are using.

I looked at the source code of the library. You call the Encode function, right? These are the contents of the Encode function:

public virtual Bitmap Encode(String content, Encoding encoding)
{
    bool[][] matrix = calQrcode(encoding.GetBytes(content));
    SolidBrush brush = new SolidBrush(qrCodeBackgroundColor);
    Bitmap image = new Bitmap( (matrix.Length * qrCodeScale) + 1, (matrix.Length * qrCodeScale) + 1);
    Graphics g = Graphics.FromImage(image);
    g.FillRectangle(brush, new Rectangle(0, 0, image.Width, image.Height));
    brush.Color = qrCodeForegroundColor ;
    for (int i = 0; i < matrix.Length; i++)
    {
        for (int j = 0; j < matrix.Length; j++)
        {
            if (matrix[j][i])
            {
                g.FillRectangle(brush, j * qrCodeScale, i * qrCodeScale, qrCodeScale, qrCodeScale);
            }
        }
    }
    return image;
}

The first line (encoding.GetBytes(content)) converts the string to bytes.

Get the source code, then modify it to add this overload: "public virtual Bitmap Encode(byte[] content)"


The compression works by removing redundancy in the data, but the string seems to contain random/encrypted data, so there is no redundancy to remove.

However, it's data encoded using base-64, so each character only carries six bits of information. If you keep the binary data instead of base-64 encoding it, it's only 631 bytes.
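The size relationship can be checked with a small sketch. It uses 630 random bytes as a stand-in for the real payload (an assumption, since the original binary data is not available here), the built-in System.IO.Compression.DeflateStream, and hypothetical class and method names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

static class Base64Overhead
{
    // Raw Deflate using the built-in .NET stream.
    public static byte[] Deflate(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Random bytes imitate encrypted data: they carry no redundancy.
    public static byte[] RandomBytes(int count)
    {
        byte[] bytes = new byte[count];
        using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(bytes);
        return bytes;
    }

    static void Main()
    {
        byte[] raw = RandomBytes(630);

        // Base-64 emits 4 characters for every 3 bytes (6 bits per character).
        string base64 = Convert.ToBase64String(raw);
        byte[] decoded = Convert.FromBase64String(base64);

        Console.WriteLine("base-64 characters:      " + base64.Length);
        Console.WriteLine("decoded bytes:           " + decoded.Length);
        Console.WriteLine("deflated base-64 text:   " + Deflate(Encoding.ASCII.GetBytes(base64)).Length);
        Console.WriteLine("deflated raw bytes:      " + Deflate(raw).Length);
    }
}
```

Deflating the base-64 text can at best win back the encoding overhead, so decoding to raw bytes and skipping compression is the simpler route for data like this.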


You are comparing different compressors. The Zip family (Deflate) combines LZ77 dictionary matching with statistical Huffman coding, and LZMA pairs a Lempel-Ziv (LZ) dictionary stage with a range coder; both remove redundancy from the input. In other words, compression works by removing superfluous information. Lossless compression works well on text files and some images, and less well on audio, video, and program files. For audio and video there is lossy compression, but that is not an option for program files. Your example string has too much entropy to compress well. The information content of a character, in bits, is -log(p)/log(2), where p is the probability of that character occurring in your text; averaging this over all characters gives the entropy. See also information theory and Shannon's source coding theorem.
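A minimal sketch of that entropy estimate, using the standard Shannon formula H = -Σ p·log2(p) over the observed character frequencies. The class and method names are hypothetical. This order-0 estimate ignores repeated substrings, so it is only a rough guide to what a dictionary coder could achieve.

```csharp
using System;
using System.Linq;

static class ShannonEntropy
{
    // Average information per character, in bits, from observed frequencies.
    public static double BitsPerSymbol(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0.0;

        return text.GroupBy(c => c)
                   .Select(g => (double)g.Count() / text.Length) // p for each distinct character
                   .Sum(p => -p * Math.Log(p, 2));               // -p * log2(p)
    }

    static void Main()
    {
        // Hypothetical inputs: paste the real string here to measure it.
        string repetitive = "aaaaaaaaaaaaaaaabbbbbbbb";
        string varied = "+hgoSuJm2ecydQj9mXXzmG6b";

        Console.WriteLine("repetitive: " + BitsPerSymbol(repetitive).ToString("F2") + " bits/char");
        Console.WriteLine("varied:     " + BitsPerSymbol(varied).ToString("F2") + " bits/char");
    }
}
```

A long base-64 string of encrypted data should come out close to 6 bits per character, which is the most the base-64 alphabet can carry.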
