Struggling with inflate algorithm_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2023-04-04 13:36 出处：网络

I\'m having a difficult time understanding how the inflate algorithm works even after reading the RFC and reviewing c and javascript implementations.I compressed a 开发者_StackOverflow社区file with th

I'm having a difficult time understanding how the inflate algorithm works even after reading the RFC and reviewing c and javascript implementations. I compressed a 开发者_StackOverflow社区file with the text "TestingTesting" and got the following result in hex: 0B 49 2D 2E C9 CC 4B 0F 81 50 00

I tried reading the data after 16 and 32-bit endian swaps but after reading the first 3 bits I can't get any further because the next 5 bits don't make sense. What am I doing wrong and how can this be parsed?

References I've used: RFC 1951 Javascript C

The output from the compressor is a stream of bytes. Why are you doing an endian swap?

Looking at the first few bytes, as binary:

0B  = 00001011
49  = 01001001
2D  = 00101101  
2E  = 00101110
...

From section 3.1.1 in the RFC:

bits are read from right to left, so the first bit of the header, BFINAL, is 1:
```
 00001011
        ^
```
numbers are packed LSB first, and we read from right-to-left, so BTYPE is 01:
```
 00001011
      ^^
```
That's the fixed Huffman code block type, so we expect a Huffman code next. Huffman codes are packed MSB first, so the first code is 10000100 (we continue on to the next byte here):
```
 00001011
 ^^^^^
 01001001
      ^^^
```
Looking at the table in section 3.2.6, 00110000 to 10111111 represent literal bytes 0 - 143, so 10000100 (= 0x84) is the literal value 0x54, which is the ASCII code for "T".
Continuing, the next code is 10010101 (= 0x95) which is literal value 0x65 which is "e".