开发者

Struggling with inflate algorithm

开发者 https://www.devze.com 2023-04-04 13:36 出处:网络
I\'m having a difficult time understanding how the inflate algorithm works even after reading the RFC and reviewing c and javascript implementations.I compressed a 开发者_StackOverflow社区file with th

I'm having a difficult time understanding how the inflate algorithm works even after reading the RFC and reviewing c and javascript implementations. I compressed a 开发者_StackOverflow社区file with the text "TestingTesting" and got the following result in hex: 0B 49 2D 2E C9 CC 4B 0F 81 50 00

I tried reading the data after 16 and 32-bit endian swaps but after reading the first 3 bits I can't get any further because the next 5 bits don't make sense. What am I doing wrong and how can this be parsed?

References I've used: RFC 1951 Javascript C


The output from the compressor is a stream of bytes. Why are you doing an endian swap?

Looking at the first few bytes, as binary:

0B  = 00001011
49  = 01001001
2D  = 00101101  
2E  = 00101110
...

From section 3.1.1 in the RFC:

  • bits are read from right to left, so the first bit of the header, BFINAL, is 1:

     00001011
            ^
    
  • numbers are packed LSB first, and we read from right-to-left, so BTYPE is 01:

     00001011
          ^^
    
  • That's the fixed Huffman code block type, so we expect a Huffman code next. Huffman codes are packed MSB first, so the first code is 10000100 (we continue on to the next byte here):

     00001011
     ^^^^^
     01001001
          ^^^
    
  • Looking at the table in section 3.2.6, 00110000 to 10111111 represent literal bytes 0 - 143, so 10000100 (= 0x84) is the literal value 0x54, which is the ASCII code for "T".

  • Continuing, the next code is 10010101 (= 0x95) which is literal value 0x65 which is "e".

...and so on.

0

精彩评论

暂无评论...
验证码 换一张
取 消