Using a tree to traverse a separate set of elements with the same parent/child relationship

Here's a restatement of the rather cryptic title question:

Suppose we have already built a Prototype tree that contains all the information on the structure of the tree and the generic description of each node. Now we want to create instances of this tree whose elements contain extra unique data. Let's call these Concrete trees.

The only difference between Concrete and Prototype trees is the extra data in the nodes of the Concrete tree. Suppose each node of a Concrete tree has a pointer/link to the corresponding element in the Prototype tree, for generic information about the node, but no parent/child information of its own.

Is it possible to traverse the Concrete tree?

In particular, given a starting node in the Concrete tree and a path through the Prototype tree, is it possible to efficiently get the corresponding node in the Concrete tree? There can be many Concrete trees, so a link back from the Prototype tree is not possible.

Even though I might not need to optimize things to such an extent in my code, this is still an interesting problem!

Thanks in advance!

NOTE: There are no restrictions on the branching factor of the tree: a node can have anywhere between one and hundreds of children.

Extra ramblings/ideas:

The reason I ask is that it seems wasteful to copy the parent/child information each time a new Concrete tree instance is created, since this structure is identical to the Prototype tree. In my particular case, children are identified by string names, so I have to store a string-to-pointer hash at each node. There can be many Concrete tree instances, and duplicating this hash seems like a huge waste of space.

As a first idea, perhaps the path could somehow be hashed into an int or something else that compactly identifies an element (not a string, since that's too big), which would then be used to look up concrete elements in a hash for each Concrete tree?
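A minimal C++ sketch of a collision-free variant of that idea, assuming the prototype tree can give each of its nodes a small integer id as it is built (instead of hashing the path string itself). The names `ProtoNode`, `ProtoTree`, `resolve` and `ConcreteTree` are hypothetical. The child-name hash exists only once, in the prototype; each Concrete tree stores nothing but its extra data, keyed by the prototype id.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical prototype node: structure plus generic info, shared by every concrete tree.
struct ProtoNode {
    int id = 0;                                            // dense id, unique within this prototype
    std::string description;                               // generic per-node information
    std::unordered_map<std::string, ProtoNode*> children;  // child lookup by name, stored once
};

// Owns the prototype nodes and assigns ids 0, 1, 2, ... in creation order.
struct ProtoTree {
    std::vector<std::unique_ptr<ProtoNode>> nodes;

    ProtoNode* addNode(ProtoNode* parent, const std::string& name, std::string desc) {
        assert(parent == nullptr || parent->children.count(name) == 0);  // sibling names must be unique
        nodes.push_back(std::make_unique<ProtoNode>());
        ProtoNode* n = nodes.back().get();
        n->id = static_cast<int>(nodes.size()) - 1;
        n->description = std::move(desc);
        if (parent != nullptr) parent->children[name] = n;
        return n;
    }
};

// Walks a path of child names in the prototype; returns nullptr if any step is missing.
const ProtoNode* resolve(const ProtoNode* start, const std::vector<std::string>& path) {
    for (const auto& name : path) {
        auto it = start->children.find(name);
        if (it == start->children.end()) return nullptr;
        start = it->second;
    }
    return start;
}

// A concrete tree keeps only its extra data, keyed by prototype id: no child hashes.
template <typename Data>
struct ConcreteTree {
    std::unordered_map<int, Data> data;  // could be a std::vector<Data> since ids are dense

    Data* at(const ProtoNode* p) {
        if (p == nullptr) return nullptr;
        auto it = data.find(p->id);
        return it == data.end() ? nullptr : &it->second;
    }
};
```

Finding a concrete node is then two steps: `resolve` walks the path in the shared prototype, and `ConcreteTree::at` looks the resulting id up in one particular Concrete tree. Each Concrete tree still pays one hash entry per node that carries data, which is smaller than a per-node child hash but not free.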

Once created, will the prototype tree ever change (i.e. will nodes ever be inserted or removed)?

If not, you could consider array-backed trees (i.e. child/parent links are represented by array indices, not raw pointers), and use consistent indexing for your concrete trees. That way, it's trivial to map from concrete to prototype, and vice versa.
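A rough sketch of the array-backed layout, assuming the prototype is fully built before any concrete tree is created (`ProtoArrayTree` and `ConcreteArrayTree` are hypothetical names). Links are `int` indices into a `std::vector`, and a concrete tree is a parallel vector of the same length, so concrete node `i` and prototype node `i` always correspond.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Array-backed prototype: node i lives at nodes[i]; links are indices, not pointers.
struct ProtoArrayTree {
    struct Node {
        int parent = -1;                                   // -1 marks the root
        std::string name;
        std::unordered_map<std::string, int> childByName;  // child name -> child index
    };
    std::vector<Node> nodes;

    // Appends a node and returns its index; pass parent = -1 for the root.
    int add(int parent, const std::string& name) {
        const int idx = static_cast<int>(nodes.size());
        Node n;
        n.parent = parent;
        n.name = name;
        nodes.push_back(std::move(n));
        if (parent >= 0) nodes[parent].childByName[name] = idx;
        return idx;
    }

    // Follows a path of child names from index `from`; returns -1 if a step is missing.
    int follow(int from, const std::vector<std::string>& path) const {
        assert(from >= 0 && from < static_cast<int>(nodes.size()));
        for (const auto& name : path) {
            const auto& kids = nodes[from].childByName;
            auto it = kids.find(name);
            if (it == kids.end()) return -1;
            from = it->second;
        }
        return from;
    }
};

// A concrete tree is a parallel array: values[i] belongs to prototype node i.
template <typename Data>
struct ConcreteArrayTree {
    std::vector<Data> values;
    explicit ConcreteArrayTree(const ProtoArrayTree& proto) : values(proto.nodes.size()) {}
    Data& operator[](int i) { return values[static_cast<std::size_t>(i)]; }
};
```

Going from concrete to prototype is just reusing the index (`nodes[i]`), and parent links come for free through `Node::parent`. If the prototype later grew, every concrete array would have to be resized to match, which is why this suggestion depends on the prototype not changing.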

You could have a concrete leaf for each prototype node, but you'd need to do some kind of hashing per tree (as you suggest) to keep different concrete trees separate. At this point you've incurred the same storage cost as a completely separate tree with redundant child/parent pointers. You definitely want a link from the prototype tree to the concrete trees.

I can see this approach being useful if you want structural changes to the prototype tree to affect all linked concrete trees: shuffling nodes would instantly affect every concrete tree. You may incur extra cost, though, since it would be impossible to transmit a single concrete tree without either sending every concrete tree or doing some extract operation to rip that one tree out.
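A sketch of the layout this answer describes, assuming each concrete tree is identified by an `int` id; `ProtoNode::instances`, `lookup` and `extract` are hypothetical names. Every prototype node keeps a hash from tree id to that tree's data, so lookups go through the shared structure, and pulling one tree out means visiting every prototype node.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Per-instance data only; the concrete node has no structure of its own.
struct ConcreteNode {
    int value = 0;
};

// Prototype node: shared structure plus one entry per concrete tree (the per-tree hashing).
struct ProtoNode {
    std::unordered_map<std::string, ProtoNode*> children;  // shared structure
    std::unordered_map<int, ConcreteNode> instances;       // concrete-tree id -> node data
};

// Follows `path` in the prototype, then picks out tree `treeId`'s node; nullptr if absent.
ConcreteNode* lookup(ProtoNode* start, const std::vector<std::string>& path, int treeId) {
    assert(start != nullptr);
    for (const auto& name : path) {
        auto it = start->children.find(name);
        if (it == start->children.end()) return nullptr;
        start = it->second;
    }
    auto inst = start->instances.find(treeId);
    return inst == start->instances.end() ? nullptr : &inst->second;
}

// "Ripping out" one concrete tree: every prototype node must be visited.
void extract(const ProtoNode* node, int treeId, std::vector<int>& out) {
    auto inst = node->instances.find(treeId);
    if (inst != node->instances.end()) out.push_back(inst->second.value);
    for (const auto& kv : node->children) extract(kv.second, treeId, out);
}
```

`extract` illustrates the cost mentioned above: it walks the whole prototype even if the requested tree only has data at a few nodes, and the order of its output follows the unordered child hashes.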

In general you will not be able to encode a path uniquely in an int.
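The counting argument behind this, assuming the int has to encode the path itself (rather than index a table of the nodes that actually exist) and a tree with uniform branching factor b and depth d: there are b^d distinct paths of length d, so once b^d exceeds the 2^32 values of a 32-bit int, two paths must share a code (pigeonhole principle). With a branching factor of 256, which is within the "hundreds of children" the question allows:

```latex
b^{d} \le 2^{32} \ \text{is required}, \qquad
256^{4} = \left(2^{8}\right)^{4} = 2^{32}, \qquad
256^{5} = 2^{40} > 2^{32}
```

so paths five levels deep can no longer all receive distinct 32-bit codes.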

Just store the parent/child relationship in the concrete tree and forget about it. At best it's a single pointer value; at worst it's two pointer values. You would need at least that much to keep links between the prototype tree and the concrete tree anyway.
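One way to reach the two-pointer figure is a left-child/right-sibling layout; that representation is an assumption here, not something the answer specifies, and `ProtoNode`, `ConcreteNode`, `child` and `follow` are hypothetical names. Each concrete node stores a first-child and a next-sibling pointer plus the prototype link the question already assumes, and child names are read from the prototype, so no per-instance hash is kept.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical prototype node; only the name is needed for navigation here.
struct ProtoNode {
    std::string name;
    std::string description;
};

// Concrete node carrying its own structure: two pointers (first child, next sibling),
// plus the prototype link the question already assumes.
struct ConcreteNode {
    const ProtoNode* proto = nullptr;
    ConcreteNode* firstChild = nullptr;
    ConcreteNode* nextSibling = nullptr;
    int value = 0;  // per-instance data
};

// Finds a child by name, reading names from the shared prototype.
// Linear in the number of children; no per-instance hash is stored.
ConcreteNode* child(ConcreteNode* n, const std::string& name) {
    assert(n != nullptr);
    for (ConcreteNode* c = n->firstChild; c != nullptr; c = c->nextSibling)
        if (c->proto->name == name) return c;
    return nullptr;
}

// Follows a path of child names; returns nullptr if any step is missing.
ConcreteNode* follow(ConcreteNode* n, const std::vector<std::string>& path) {
    for (const auto& name : path) {
        if (n == nullptr) return nullptr;
        n = child(n, name);
    }
    return n;
}
```

The trade-off is that `child` scans the sibling list, so with hundreds of children a name lookup is linear rather than a single hash probe.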

It's possible when there's a known relationship between the addresses of nodes in both trees. Basically, this means the nodes have to be fixed-size and allocated all at once. Sure, it's also possible to use a hashtable to map the addresses of first-tree nodes to second-tree nodes, but such a hashtable would need at least 10x more nodes than the first tree, otherwise the mapping would be too slow.

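#include <stdio.h>

typedef unsigned char byte;

// First tree: a binary trie whose nodes carry only structure (child links).
struct Node1 {
  Node1* child[2];
  Node1() { child[0]=child[1]=0; }
};

// Second tree: payload only, no links of its own.
struct Node2 {
  int N;
  Node2() { N=0; }
};

int main( void ) {

  int i,j,k,N = 256;

  // Both arrays are allocated once, with the same size, so p[x] corresponds
  // to q[x]. A full 8-level binary trie over 256 keys needs 511 nodes,
  // which fits in the 2*N = 512 slots.
  Node1* p = new Node1[2*N];
  Node2* q = new Node2[2*N];

  // insert: walk the 8 bits of i from the most significant down,
  // taking new trie nodes from p[] as needed (p[0] is the root)
  for( i=0,k=1; i<N; i++ ) {
    Node1* root = &p[0];
    Node1** r = &root;
    for( j=7;; j-- ) {
      if( r[0]==0 ) r[0]=&p[k++];
      if( j<0 ) break;
      r = &r[0]->child[(i>>j)&1];
    }
    q[r[0]-p].N = byte(i+123);
    // ^^^^^ - mapping from p[] to q[]: same index, no stored link
  }

  // check: walk the same paths again and read the payload through the index
  for( i=N-1; i>=0; i-- ) {
    Node1* r = &p[0];
    for( j=7; j>=0; j-- ) r = r->child[(i>>j)&1];
    if( q[r-p].N != byte(i+123) ) printf( "error!\n" );
  }

  delete[] p;
  delete[] q;
  return 0;
}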

I think you can do what you describe, but I don't believe it constitutes an optimisation (for the kinds of reasons @Dave refers to). The key is to tie the pointers back to the prototype in such a way that they also act as identifiers. In addition, the major traversals through the prototype tree (a breadth-first and a depth-first traversal) would need to be pre-calculated.

The pre-calculated traversals are likely to use a stack or queue, depending on the particular traversal. As the traversals are done, an indexed linked list also needs to be built in traversal order (or, as @Oli suggests, an indexed array). The data in the linked list is the identifier (see below) of each node. Each prototype tree and each prototype node needs an identifier (this could be an address or an arbitrary identifier). Each concrete tree has its own identifier, and each concrete node is given the SAME identifier as its corresponding node in the prototype tree. To follow a partial traversal, you then find the node's identifier in the linked list and use it as the identifier of the concrete node.

In essence, you are creating a link between the prototype and the concrete nodes by using the equivalence of their identifiers as the pointer (a sort of "ghost" pointer). This does require a number of supporting mechanisms, and their overhead means this route is unlikely to be an actual optimisation.
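A minimal sketch of the shared-identifier idea, assuming a non-empty prototype whose node ids are the integers `0..n-1` with the root at `0`, and using a precomputed depth-first (preorder) array in place of the indexed linked list. `Proto` and `visitSubtree` are hypothetical names, and the subtree-size table is an extension of the answer, relying on the fact that a preorder listing keeps each subtree contiguous. A concrete tree is just a hash from identifier to data; the matching identifier plays the role of the "ghost" pointer.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Prototype structure by identifier: children[id] lists the child ids of node `id`.
// Traversal data is computed once and shared by every concrete tree.
// Assumes a non-empty tree whose root has id 0.
struct Proto {
    std::vector<std::vector<int>> children;
    std::vector<int> preorder;     // precomputed depth-first order of ids
    std::vector<int> posOf;        // id -> position in `preorder`
    std::vector<int> subtreeSize;  // id -> number of nodes in its subtree

    explicit Proto(std::vector<std::vector<int>> kids) : children(std::move(kids)) {
        const int n = static_cast<int>(children.size());
        posOf.assign(n, -1);
        subtreeSize.assign(n, 1);
        std::vector<int> stack{0};  // iterative DFS from the root, using an explicit stack
        while (!stack.empty()) {
            const int id = stack.back();
            stack.pop_back();
            posOf[id] = static_cast<int>(preorder.size());
            preorder.push_back(id);
            // push children in reverse so the first child is visited first
            for (auto it = children[id].rbegin(); it != children[id].rend(); ++it)
                stack.push_back(*it);
        }
        // reverse preorder reaches every child before its parent
        for (auto it = preorder.rbegin(); it != preorder.rend(); ++it)
            for (int c : children[*it]) subtreeSize[*it] += subtreeSize[c];
    }
};

// Visits, in depth-first order, the concrete nodes in the subtree of `startId`.
// Concrete nodes carry no links: the shared id is the "ghost" pointer.
template <typename Data, typename Visit>
void visitSubtree(const Proto& p, std::unordered_map<int, Data>& concrete, int startId, Visit visit) {
    assert(startId >= 0 && startId < static_cast<int>(p.posOf.size()));
    const int begin = p.posOf[startId];
    const int end = begin + p.subtreeSize[startId];  // a subtree is contiguous in preorder
    for (int i = begin; i < end; ++i) {
        auto it = concrete.find(p.preorder[i]);
        if (it != concrete.end()) visit(p.preorder[i], it->second);
    }
}
```

A breadth-first order could be precomputed the same way with a queue, although a subtree is not contiguous in that order, so the subtree-size shortcut would not apply there.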