How would I go about creating a file compressor similar to ZIP archives in a language like C/C++? [closed]

Developer (https://www.devze.com), 2023-02-06 19:04. Source: Internet

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 12 years ago.

So I was thinking about how a .zip archive is structured, and then I thought: how could I create my own archive format?

You would first want to know what kind of data you want to compress. For example, ZIP works well for many things but not so well for audio files, while FLAC works well for audio but would do poorly on text files (provided you could find a way to apply it).

Once you have a compression scheme, you would write out the metadata needed to decompress the information later, followed by the compressed data itself.
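As a rough sketch of "metadata first, then the compressed data", here is one possible layout in C++. The magic bytes `MYAR`, the field order, and the names `Entry`, `write_entry` and `read_entry` are all made up for illustration; this is not ZIP's actual header.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout (not ZIP's):
//   4 bytes  magic "MYAR"
//   1 byte   method id (0 = stored; other values are up to you)
//   4 bytes  original size, little-endian
//   4 bytes  payload size, little-endian
//   N bytes  payload (the compressed data)
struct Entry {
    std::uint8_t method = 0;
    std::uint32_t original_size = 0;
    std::vector<std::uint8_t> payload;
};

static const std::size_t kHeaderSize = 13;

static void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

static std::uint32_t get_u32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Serializes the header followed by the payload.
std::vector<std::uint8_t> write_entry(const Entry& e) {
    std::vector<std::uint8_t> out;
    const char magic[4] = {'M', 'Y', 'A', 'R'};
    for (char c : magic) out.push_back(static_cast<std::uint8_t>(c));
    out.push_back(e.method);
    put_u32(out, e.original_size);
    put_u32(out, static_cast<std::uint32_t>(e.payload.size()));
    out.insert(out.end(), e.payload.begin(), e.payload.end());
    return out;
}

// Parses one entry; returns false on a bad magic or truncated input.
bool read_entry(const std::vector<std::uint8_t>& in, Entry& e) {
    if (in.size() < kHeaderSize || std::memcmp(in.data(), "MYAR", 4) != 0) return false;
    e.method = in[4];
    e.original_size = get_u32(&in[5]);
    const std::size_t n = get_u32(&in[9]);
    if (in.size() - kHeaderSize < n) return false;
    e.payload.assign(in.begin() + static_cast<std::ptrdiff_t>(kHeaderSize),
                     in.begin() + static_cast<std::ptrdiff_t>(kHeaderSize + n));
    return true;
}
```

Storing the sizes up front lets a reader skip an entry or allocate the output buffer without understanding the compression method.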

Perhaps you would research a lossless compression method such as entropy encoding. You might decide that arithmetic coding compresses better than Huffman coding and implement an arithmetic codec. You might also look at dictionary encoding if you are more interested in compressing text.
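To give a feel for entropy coding, here is a minimal sketch of the Huffman side (not arithmetic coding): it counts byte frequencies and works out how many bits each byte's code would get. The names `HuffNode` and `huffman_code_lengths` are mine, and a real codec would still need to assign the actual bit patterns and pack them.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct HuffNode {
    std::uint64_t weight;
    int left;    // child index, or -1 for a leaf
    int right;   // child index, or -1 for a leaf
    int symbol;  // byte value for a leaf, -1 for an internal node
};

// Returns the Huffman code length in bits for each byte value (0 = byte never occurs).
std::vector<int> huffman_code_lengths(const std::string& data) {
    std::vector<std::uint64_t> freq(256, 0);
    for (unsigned char c : data) ++freq[c];

    std::vector<HuffNode> nodes;
    using Item = std::pair<std::uint64_t, int>;  // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (int s = 0; s < 256; ++s) {
        if (freq[s] == 0) continue;
        nodes.push_back({freq[s], -1, -1, s});
        heap.push({freq[s], static_cast<int>(nodes.size()) - 1});
    }

    std::vector<int> lengths(256, 0);
    if (nodes.empty()) return lengths;
    if (nodes.size() == 1) {  // a lone symbol still needs one bit
        lengths[nodes[0].symbol] = 1;
        return lengths;
    }

    // Repeatedly merge the two lightest subtrees until one tree remains.
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();
        Item b = heap.top(); heap.pop();
        nodes.push_back({a.first + b.first, a.second, b.second, -1});
        heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    // Walk the tree from the root; a leaf's depth is its code length.
    std::vector<std::pair<int, int>> todo;  // (node index, depth)
    todo.push_back({heap.top().second, 0});
    while (!todo.empty()) {
        std::pair<int, int> cur = todo.back();
        todo.pop_back();
        const HuffNode& n = nodes[cur.first];
        if (n.symbol >= 0) {
            lengths[n.symbol] = cur.second;
            continue;
        }
        todo.push_back({n.left, cur.second + 1});
        todo.push_back({n.right, cur.second + 1});
    }
    return lengths;
}
```

Frequent bytes end up with short codes and rare bytes with long ones, which is where the saving comes from.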

Edit in response to comment

One would have to include the entropy tables chosen when encoding the data so that it can be decoded later.
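One way to carry the table along, sketched under my own assumptions: store a byte-frequency table as a 2-byte entry count followed by (symbol, count) pairs, so the decoder can rebuild the same codes. The format and the names `FreqTable`, `serialize_table` and `parse_table` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using FreqTable = std::array<std::uint32_t, 256>;

// Layout: 2-byte little-endian entry count, then per entry
// 1 byte symbol + 4 bytes little-endian count. Zero counts are skipped.
std::vector<std::uint8_t> serialize_table(const FreqTable& freq) {
    std::vector<std::uint8_t> out(2, 0);  // entry count, patched below
    std::uint16_t used = 0;
    for (std::size_t s = 0; s < 256; ++s) {
        if (freq[s] == 0) continue;
        ++used;
        out.push_back(static_cast<std::uint8_t>(s));
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<std::uint8_t>(freq[s] >> (8 * i)));
    }
    out[0] = static_cast<std::uint8_t>(used & 0xFF);
    out[1] = static_cast<std::uint8_t>(used >> 8);
    return out;
}

// Rebuilds the table; returns the number of bytes consumed, or 0 if the input is truncated.
std::size_t parse_table(const std::vector<std::uint8_t>& in, FreqTable& freq) {
    freq.fill(0);
    if (in.size() < 2) return 0;
    const std::size_t used = in[0] | (static_cast<std::size_t>(in[1]) << 8);
    const std::size_t need = 2 + used * 5;
    if (used > 256 || in.size() < need) return 0;
    for (std::size_t i = 0; i < used; ++i) {
        const std::uint8_t* p = &in[2 + i * 5];
        freq[p[0]] = static_cast<std::uint32_t>(p[1]) |
                     (static_cast<std::uint32_t>(p[2]) << 8) |
                     (static_cast<std::uint32_t>(p[3]) << 16) |
                     (static_cast<std::uint32_t>(p[4]) << 24);
    }
    return need;
}
```

The return value tells the reader where the table ends, so the compressed data can start right after it.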

Take JPEG, for example. JPEG applies a color space transformation to YCbCr, a discrete cosine transformation (DCT), and quantization, and then uses Huffman coding on the result. The color space metadata is included in the headers (how many bits per color and how many samples per channel, along with the size of the image). The quantization tables are included, along with an index of which table matches which channel, as are the Huffman tables used to encode the DC and AC coefficients. The DCT and the zigzag coefficient pattern are part of the standard. So after dequantization you must de-zigzag the coefficients and apply the inverse DCT (IDCT).

Basically, for JPEG:
- Read the given tables in the header.
- Figure out the entropy-encoded format from the header info about size and color.
- Use the Huffman tables to expand the data segment.
- Dequantize appropriately.
- De-zigzag the coefficients and apply the IDCT.
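The dequantize and de-zigzag steps can be sketched for one 8x8 block as below. This assumes the Huffman stage has already produced 64 coefficients in zigzag order and that the quantization table is in zigzag order too, as it is stored in a JPEG stream. The function names are mine, and the IDCT itself is left out.

```cpp
#include <array>
#include <cstddef>

using Block = std::array<int, 64>;

// Builds the zigzag scan: order[k] is the row-major position (row * 8 + col)
// of the k-th coefficient in the scan.
std::array<int, 64> zigzag_order() {
    std::array<int, 64> order{};
    std::size_t k = 0;
    for (int s = 0; s <= 14; ++s) {        // s = row + col picks one anti-diagonal
        const int lo = s > 7 ? s - 7 : 0;  // smallest valid row on this diagonal
        const int hi = s < 7 ? s : 7;      // largest valid row on this diagonal
        if (s % 2 == 1) {
            for (int r = lo; r <= hi; ++r) order[k++] = r * 8 + (s - r);  // heading down-left
        } else {
            for (int r = hi; r >= lo; --r) order[k++] = r * 8 + (s - r);  // heading up-right
        }
    }
    return order;
}

// Multiplies each coefficient by its quantizer step and moves it to its
// row-major position, ready for an IDCT.
Block dequantize_and_dezigzag(const Block& scanned, const Block& quant) {
    const std::array<int, 64> order = zigzag_order();
    Block natural{};
    for (std::size_t k = 0; k < 64; ++k) {
        natural[static_cast<std::size_t>(order[k])] = scanned[k] * quant[k];
    }
    return natural;
}
```

Because the scan order is fixed by the standard, none of it has to be stored in the file; only the quantization tables do.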

You would have to define your own standard: figure out the minimum information needed to recover the original data, and store it in a way that can be read without knowing the details of what's inside.

I don't know about .zip, but I would imagine it has a couple of dictionary tables and a couple of entropy tables. You would entropy-decode the data segment (whose boundaries must somehow be determined by the standard or by a marker), then apply a reverse dictionary substitution.
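For what it's worth, ZIP's most common method, DEFLATE, pairs LZ77-style back-references with Huffman coding. A minimal sketch of the reverse substitution step, assuming the entropy stage has already been decoded into tokens: the `Token` struct and `expand` are made up for illustration and are not DEFLATE's bit-level format.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One decoded token: either a literal byte, or "copy `length` bytes
// starting `distance` bytes back in the output".
struct Token {
    bool is_match;
    char literal;          // used when !is_match
    std::size_t distance;  // used when is_match
    std::size_t length;    // used when is_match
};

// Expands tokens into `out`; returns false if a match reaches before the start of the output.
bool expand(const std::vector<Token>& tokens, std::string& out) {
    out.clear();
    for (const Token& t : tokens) {
        if (!t.is_match) {
            out.push_back(t.literal);
            continue;
        }
        if (t.distance == 0 || t.distance > out.size()) return false;
        const std::size_t from = out.size() - t.distance;
        // Copy byte by byte: a match may overlap the bytes it is producing.
        for (std::size_t i = 0; i < t.length; ++i) out.push_back(out[from + i]);
    }
    return true;
}
```

For example, the literals `a` and `b` followed by a match with distance 2 and length 4 expand to `ababab`. The "dictionary" here is simply the output produced so far.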

Download the sources of bzip2 and compile them. And then go from there.