So I was thinking about how a .zip archive is structured, and then I thought: how could I create my own archive format?
You would first want to know what you want to compress. For example, zip works great for many things, but not so well for audio files. FLAC works well for audio, but poorly on text files (provided you could find a way to apply it).
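To see how much the input matters, here is a rough sketch that runs the same general-purpose compressor over two kinds of data. It uses Python's standard `zlib` module (DEFLATE, the method zip archives commonly use). The pseudo-random bytes are only my stand-in for hard-to-compress data; real audio is not random, which is exactly the structure a dedicated codec like FLAC exploits.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)

# Repetitive text: plenty of redundancy for DEFLATE to exploit.
text = b"the quick brown fox jumps over the lazy dog. " * 200

# Pseudo-random bytes: an assumed stand-in for noisy, hard-to-compress input.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

print(f"text : {compression_ratio(text):.3f}")
print(f"noise: {compression_ratio(noise):.3f}")
```

The text should shrink to a small fraction of its size, while the noise comes out about as large as it went in.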
Once you had a compression scheme, you would write out the metadata needed to decompress the information later, followed by the compressed data.
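A minimal sketch of that "metadata first, then compressed data" layout might look like the following. The `MYAR` signature, the method id, and the header fields are all made up for illustration; `zlib` stands in for whatever compression scheme you picked.

```python
import struct
import zlib

MAGIC = b"MYAR"                    # hypothetical 4-byte signature
METHOD_ZLIB = 1                    # hypothetical id for "payload is zlib data"
# Little-endian header: magic, method id, original size, compressed size.
HEADER = struct.Struct("<4sBII")

def pack(data: bytes) -> bytes:
    """Return header + compressed payload."""
    payload = zlib.compress(data)
    return HEADER.pack(MAGIC, METHOD_ZLIB, len(data), len(payload)) + payload

def unpack(blob: bytes) -> bytes:
    """Read the header, then use it to recover the original bytes."""
    magic, method, orig_size, comp_size = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a MYAR archive")
    if method != METHOD_ZLIB:
        raise ValueError(f"unknown compression method {method}")
    payload = blob[HEADER.size:HEADER.size + comp_size]
    data = zlib.decompress(payload)
    if len(data) != orig_size:
        raise ValueError("size mismatch, archive is corrupt")
    return data

original = b"some data worth archiving " * 20
assert unpack(pack(original)) == original
```

The point is that the reader only needs the header to know how to treat the bytes that follow. A real format would also store file names, checksums, and so on.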
Perhaps you would research a lossless compression method such as entropy encoding. You might decide that arithmetic coding compresses better than Huffman coding and implement an arithmetic codec. You might also look at dictionary encoding if you are more interested in compressing text.
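To give a feel for dictionary encoding, here is a small sketch of LZW, which is just one well-known dictionary scheme that I picked for the example. It emits a list of integer codes rather than packed bits, so it shows the idea and is not a finished codec.

```python
def lzw_encode(data: bytes) -> list:
    """Replace repeated byte sequences with integer dictionary codes."""
    table = {bytes([i]): i for i in range(256)}   # start with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                   # keep extending the match
        else:
            codes.append(table[current])          # emit longest known sequence
            table[candidate] = len(table)         # learn the new sequence
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes: list) -> bytes:
    """Rebuild the same dictionary on the fly and expand the codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(sample)
assert lzw_decode(codes) == sample
print(len(sample), "bytes ->", len(codes), "codes")
```

Note that LZW needs no table in the file at all, because the decoder rebuilds the dictionary as it goes. That is a design choice you get to make for your own format.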
Edit in response to comment
One would have to include the entropy tables decided upon when encoding the data so it could be later decoded.
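As a sketch of what "include the entropy tables" can mean in practice, the example below Huffman-codes some bytes and writes the symbol frequencies in front of the coded data, so the decoder can rebuild the identical code. The header layout is my own invention for illustration, and the bit handling uses strings for readability rather than speed.

```python
import heapq
import struct
from collections import Counter

def build_codes(freqs: dict) -> dict:
    """Build a Huffman code {symbol: bitstring} from {symbol: count}."""
    if not freqs:
        return {}
    # Heap entries: (weight, smallest symbol in subtree, {symbol: code}).
    # The middle field breaks ties so the result is deterministic.
    heap = [(w, sym, {sym: ""}) for sym, w in sorted(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][1]: "0"}           # single distinct symbol
    while len(heap) > 1:
        w1, t1, left = heapq.heappop(heap)
        w2, t2, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, min(t1, t2), merged))
    return heap[0][2]

def encode(data: bytes) -> bytes:
    freqs = Counter(data)
    codes = build_codes(freqs)
    bits = "".join(codes[b] for b in data)
    padded = bits + "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    body = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    # Header: number of symbols, then one (symbol, count) pair per symbol.
    header = struct.pack("<H", len(freqs))
    for sym, count in sorted(freqs.items()):
        header += struct.pack("<BI", sym, count)
    return header + body

def decode(blob: bytes) -> bytes:
    (n_symbols,) = struct.unpack_from("<H", blob, 0)
    offset, freqs = 2, {}
    for _ in range(n_symbols):
        sym, count = struct.unpack_from("<BI", blob, offset)
        freqs[sym] = count
        offset += 5
    lookup = {code: sym for sym, code in build_codes(freqs).items()}
    total = sum(freqs.values())             # number of symbols to emit
    bits = "".join(f"{byte:08b}" for byte in blob[offset:])
    out, current = bytearray(), ""
    for bit in bits:
        if len(out) == total:               # the rest is padding
            break
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return bytes(out)

message = b"abracadabra"
assert decode(encode(message)) == message
```

Without the frequency table in the header the coded bits would be meaningless, which is the whole point of storing it.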
Take JPEG, for example. JPEG applies a color space transformation to YCbCr, a Discrete Cosine Transformation, quantization, and then Huffman coding of the data. The color space metadata is included in the headers (how many bits per sample and the sampling factors for each channel, along with the size of the image). The quantization tables are included, along with an index of which table matches which channel, as are the Huffman tables used to encode the DC and AC coefficients. The Discrete Cosine Transformation and the zigzag coefficient pattern are part of the standard. So after Huffman decoding you must de-zigzag the coefficients, dequantize them, and then apply the IDCT.
Basically, for JPEG:
- Read the given tables in the header.
- Figure out the entropy-encoded format from the header info about size and color.
- Use the Huffman tables to expand the data segment.
- De-zigzag and dequantize the coefficients.
- Apply the IDCT.
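The last stages of that list can be sketched for a single 8x8 block as follows. This assumes the 64 coefficients have already been Huffman-decoded and that the quantization table is given as an 8x8 grid in natural order. The function names are mine, the IDCT is the naive textbook formula rather than a fast one, and the level shift and color conversion that follow in a real decoder are left out.

```python
import math

N = 8

def zigzag_order(n: int = N) -> list:
    """Return (row, col) pairs in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):              # s = row + col, one diagonal at a time
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def dezigzag(coeffs: list) -> list:
    """Place 64 zigzag-ordered coefficients back into an 8x8 block."""
    block = [[0] * N for _ in range(N)]
    for value, (r, c) in zip(coeffs, zigzag_order()):
        block[r][c] = value
    return block

def dequantize(block: list, qtable: list) -> list:
    """Multiply each coefficient by its quantization step."""
    return [[block[r][c] * qtable[r][c] for c in range(N)] for r in range(N)]

def idct2(block: list) -> list:
    """Naive 8x8 inverse DCT."""
    def alpha(k: int) -> float:
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (alpha(u) * alpha(v) * block[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = total / 4
    return out

def decode_block(coeffs: list, qtable: list) -> list:
    """De-zigzag, dequantize, then IDCT one block."""
    return idct2(dequantize(dezigzag(coeffs), qtable))

# A block with only a DC coefficient decodes to a flat 8x8 patch.
flat_q = [[8] * N for _ in range(N)]
pixels = decode_block([10] + [0] * 63, flat_q)
print(round(pixels[0][0], 6), round(pixels[7][7], 6))
```

Notice that the zigzag order and the IDCT are hard-coded while the quantization table is an input. That mirrors the split between what the standard fixes and what the file has to carry.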
You would have to make your own standard, figure out the minimum information needed to recover the data, and store it in a way that is readable without knowing the details of what's inside.
I don't know about .zip, but I would imagine it has a couple of dictionary tables and a couple of entropy tables. You would entropy-decode the data segment (whose extent must somehow be determined by the standard or by a marker), then apply a reverse dictionary substitution.
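One way to check that guess is to poke at a real archive. The sketch below uses Python's standard `zipfile` module to build a tiny archive in memory and read back the per-entry metadata. The usual zip method is DEFLATE, which does pair dictionary-style matching (LZ77) with Huffman coding, and the entry records both sizes so a reader knows where the data segment ends. The file name and contents here are just examples.

```python
import io
import zipfile

content = "hello hello hello hello\n" * 50

# Write a one-file archive into memory using the DEFLATE method.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("hello.txt", content)

# Read it back and look at the metadata stored for the entry.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("hello.txt")
    restored = archive.read("hello.txt").decode("utf-8")

print("method is DEFLATE:", info.compress_type == zipfile.ZIP_DEFLATED)
print("original size    :", info.file_size)
print("compressed size  :", info.compress_size)
assert restored == content
```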
Download the sources of bzip2 and compile them, then go from there.