If I have a tree structure whose nodes can have zero to many children, with each node holding some data value along with a boolean switch, how do I minimally represent the state of this tree for nodes with a particular switch value?
For example, say that my tree looks something like:
A[0] -> B[1] -> C[1]
|-----> D[1]
|-----> E[1]
Here we have a state where four nodes are checked. Is there a way to represent this state in a concise manner? The naive approach would be to list the four nodes as being checked, but what if node B had 100 children instead of just three?
My current line of thinking is to store each node's ancestors in the data component and describe the checked state in terms of the set of ancestors that minimizes the data required to represent a state. In the tree below, an ancestor of node N is represented as n'. So the above tree would now look something like:
A[0, {a}] -> B[1, {a', b}] -> C[1, {a', b', c}]
|--------------> D[1, {a', b', d}]
|--------------> E[1, {a', b', e}]
Now you can analyze the tree and see that all of node A's descendants are checked, and describe the state simply as "the nodes with data element a' are set to 1", or just [a']. If node D's state switched to 0, then you could describe the tree state as [a' not d].
Are there data structures or algorithms that can be used to solve a problem of this type? Any thoughts on a better approach? Any thoughts on the analysis algorithm?
Thanks
Use a preorder tree traversal starting from the root. If a node is checked, don't traverse its children (this assumes that checking a node means all of its descendants are checked too). For each traversed node, store its checked state (0/1) as one bit in a bitmap (8 bits per byte). Finally, compress the result with zip/bzip or any other compression technique.
To reconstruct the state, first decompress, then walk the tree in the same preorder and set each node from the next stored bit. If the bit says checked, set all of that node's descendants to checked and skip them.
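A minimal sketch of the traversal described above, in Python. The `Node` class and the `encode`/`decode` names are made up for illustration, and the sketch relies on the assumption that a checked node implies a fully checked subtree; if that does not hold for your tree, the skipped bits are lost.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node type: a name, the boolean switch, and child nodes.
    name: str
    checked: bool = False
    children: List["Node"] = field(default_factory=list)


def encode(root: Node) -> List[int]:
    """Preorder walk emitting one bit per visited node.

    Children of a checked node are not visited (assumed to be checked too).
    """
    bits: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        bits.append(1 if node.checked else 0)
        if not node.checked:
            # Reverse so children are popped in left-to-right order.
            stack.extend(reversed(node.children))
    return bits


def decode(root: Node, bits: List[int]) -> None:
    """Replay the same preorder walk and restore the switches in place."""
    remaining = iter(bits)
    stack = [root]
    while stack:
        node = stack.pop()
        if next(remaining):
            # Checked: mark the whole subtree and consume no further bits for it.
            todo = [node]
            while todo:
                current = todo.pop()
                current.checked = True
                todo.extend(current.children)
        else:
            node.checked = False
            stack.extend(reversed(node.children))
```

For the example tree in the question (A unchecked, B and its three children checked), this visits only A and B, so the whole state fits in two bits regardless of how many children B has.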
In general, no technique can always store the checked elements in fewer than n bits of space, where n is the number of elements in the tree. The rationale behind this is that there are 2^n different possible check states, so you need at least 2^n different encodings. There are only 2^n - 1 bit strings shorter than n bits, so at least one encoding must have length n or more.
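The counting step can be spelled out as follows. Summing the number of bit strings of each length k below n (including the empty string) gives

```latex
\[
  \sum_{k=0}^{n-1} 2^{k} \;=\; 2^{n} - 1 \;<\; 2^{n},
\]
```

so by the pigeonhole principle the 2^n possible check states cannot all be assigned distinct encodings shorter than n bits. This bound applies when every combination of switches is allowed; it says nothing about how short the encoding of a typical state can be.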
Given this, if you really want to minimize space usage, I would suggest going with an encoding like the one @yi_H suggests. It uses at most n bits for each encoding, and exactly n when no checked node has descendants to skip. You might be able to compress most of the encodings by applying a standard compression algorithm to the bits, which for practical sets of checked nodes might do quite well, but which degrades gracefully in the worst case.
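As a rough sketch of that last step, the bits can be packed eight to a byte and handed to a general-purpose compressor. The example below uses Python's standard `zlib` module; the helper names and the 4-byte length prefix are illustrative choices, not part of either answer.

```python
import zlib
from typing import List


def pack_bits(bits: List[int]) -> bytes:
    """Pack 0/1 values into bytes, most significant bit first, zero-padded."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def unpack_bits(data: bytes, count: int) -> List[int]:
    """Inverse of pack_bits: recover the first `count` bits."""
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(count)]


def compress_state(bits: List[int]) -> bytes:
    # Prefix the bit count (4 bytes, big-endian) so the padding can be dropped
    # again on decompression, then append the zlib-compressed bitmap.
    return len(bits).to_bytes(4, "big") + zlib.compress(pack_bits(bits), 9)


def decompress_state(blob: bytes) -> List[int]:
    count = int.from_bytes(blob[:4], "big")
    return unpack_bits(zlib.decompress(blob[4:]), count)
```

Highly regular states (long runs of checked or unchecked nodes) should shrink well, while a random-looking state will come out slightly larger than the raw bitmap because of the compressor's framing overhead.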
Hope this helps!