I know there are many questions involving Huffman coding, including another one from myself, but I am wondering what would be the best way to actually encode a text file. Decompression seems trivial: traverse the tree, going left on 0 and right on 1, and print the character when you reach a leaf.
But how does one go about compression? Somehow store the bit representation of each character in its node in the tree? Search the tree for the character each time it is encountered and trace the steps? Does it matter which way this is coded?
Thus far, I have a Huffman tree where the leaf nodes do not have a binary value associated with them. My trouble is assigning the binary values to each character in the tree.
Thanks
Well, if you are going to encode the file on a per-character basis, I can't see the problem: keep a hash table of symbols and their counts, construct the tree from it, write the tree at the beginning of the file using whatever convention you want, then apply the new alphabet to the text. Take a look at the approach taken in DEFLATE, which is used to compress PNG images.
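As a rough illustration of the first two steps (counting symbols, then building the tree), here is a minimal Python sketch. The function name `build_huffman_tree` and the node representation (a leaf is a plain one-character string, an internal node is a `(left, right)` tuple) are assumptions made for this example, not something from the question. Writing the tree to the file header is left out, since the convention is up to you.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text):
    """Return the root of a Huffman tree for `text`, or None if it is empty.

    Assumed representation: a leaf is a one-character string,
    an internal node is a (left, right) tuple.
    """
    freq = Counter(text)   # the hash table of symbols and their counts
    tiebreak = count()     # unique counter so equal weights never compare nodes
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in freq.items()]
    heapq.heapify(heap)
    if not heap:
        return None
    # Repeatedly merge the two lightest subtrees until one tree remains.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]


root = build_huffman_tree("abracadabra")
```

With this input the most frequent symbol, 'a', ends up as a direct child of the root, so it gets the shortest code.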
EDIT
It is not really clear what the problem is.
Search the tree for the character each time it is encountered and trace the steps?
Each leaf node in the tree represents a unique symbol. You don't have to search for anything; the Huffman tree can only be constructed once you have already counted each symbol's occurrences, so every symbol in the text is known by then.
So you have already developed an algorithm to construct the tree, and the problem is how to assign the binary values to the nodes? Or where to store these values? The tree itself represents the binary values naturally: a symbol's code is the path from the root to its leaf. You can actually track the codes during tree construction; just keep track of an item's 'path' in the insert operation and store that value inside the node, or create a hash table if you don't want to modify the node entity.
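To make the hash table option concrete, here is a minimal Python sketch that walks a finished tree once and records each leaf's path as its code. Note that it assigns the codes after construction rather than during it, which is a swapped-in variation on the idea above. The name `assign_codes`, the hand-built tree, and the node representation (a leaf is a one-character string, an internal node is a `(left, right)` tuple) are all assumptions for illustration.

```python
def assign_codes(node, prefix="", table=None):
    """Fill `table` with symbol -> bit string by walking the tree once."""
    if table is None:
        table = {}
    if isinstance(node, str):
        # Leaf: the path taken so far is the code for this symbol.
        # A tree with a single symbol has an empty path, so give it one bit.
        table[node] = prefix or "0"
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)    # left edge appends 0
        assign_codes(right, prefix + "1", table)   # right edge appends 1
    return table


# Hypothetical hand-built tree; normally this comes from your construction step.
tree = ("a", (("c", "d"), ("b", "r")))
codes = assign_codes(tree)

# Compression is then one table lookup per character, with no tree search.
encoded = "".join(codes[ch] for ch in "abracadabra")
```

Here `encoded` is a string of '0' and '1' characters for readability; a real compressor would pack those bits into bytes before writing them out.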