I'm having a difficult time understanding how the inflate algorithm works even after reading the RFC and reviewing c and javascript implementations. I compressed a 开发者_StackOverflow社区file with the text "TestingTesting" and got the following result in hex: 0B 49 2D 2E C9 CC 4B 0F 81 50 00
I tried reading the data after 16 and 32-bit endian swaps but after reading the first 3 bits I can't get any further because the next 5 bits don't make sense. What am I doing wrong and how can this be parsed?
References I've used: RFC 1951 Javascript C
The output from the compressor is a stream of bytes. Why are you doing an endian swap?
Looking at the first few bytes, as binary:
0B = 00001011
49 = 01001001
2D = 00101101
2E = 00101110
...
From section 3.1.1 in the RFC:
bits are read from right to left, so the first bit of the header,
BFINAL
, is1
:00001011 ^
numbers are packed LSB first, and we read from right-to-left, so
BTYPE
is01
:00001011 ^^
That's the fixed Huffman code block type, so we expect a Huffman code next. Huffman codes are packed MSB first, so the first code is
10000100
(we continue on to the next byte here):00001011 ^^^^^ 01001001 ^^^
Looking at the table in section 3.2.6,
00110000
to10111111
represent literal bytes 0 - 143, so10000100
(=0x84
) is the literal value0x54
, which is the ASCII code for "T
".Continuing, the next code is
10010101
(=0x95
) which is literal value0x65
which is "e
".
...and so on.
精彩评论