开发者

How to identify orphaned nodes

开发者 https://www.devze.com 2023-01-02 20:28 出处:网络
I have a hierarchy of nodes stored in DB. I select all, store them in an array, then iterate over them and create a nested array in memory.

I have a hierarchy of nodes stored in DB. I select all, store them in an array, then iterate over them and create a nested array in memory.

The input looks like this:

[{name: A}, {name: B}, {name: X, parent: A}, {name: Y, parent: A}, {name: C}]

The output looks like this:

[{name: A, children:[{name: X}, {name: Y}]}开发者_高级运维, {B}, {C}]

There is no limit on how deep the nesting can go.

The problem I have is that if one of the records has an invalid parent reference, it cannot be put in the hierarchy and the script ends in an infinite loop, trying to find the parent.

I bet there's a way to tell when I've fallen into the infinite loop. For the record, when in the loop I realize there's no parent to insert the item into, I push the item at the end of the array because the parent might exists down the line.

I suppose I should be able to realize that I'm cycling the same items over and over again?

Edit 1 - the code This is the important bit:

    $cnt = count($array);
    do {
        $item = array_shift($array);
        if ($this->push($item)) {
            $cnt--;
        } else {
            array_push($array, $item);
        }
    } while ($cnt > 0);

($this->push() is a method that tries to find a parent and, if it succeeds, inserts $item into its hierarchy)


Looks like you are using a queue (remove from the front, add to the back) kind of data structure to store unprocessed nodes. As nodes are inserted into your output data structure they are dropped from the queue. If a node cannot be added to the output (because its 'parent' has not been moved to the output data structure yet) it is re-queued. Eventually the queue should become empty unless there are nodes where the 'parent' does not exist (orphans).

I imagine your algorithim looks something like

 Do While not QueueEmpty()
    node = Dequeue() ' Remove from the front
    if not AddNodeToTree(node) then Queue(node) 'add to the back
 end

Where AddNodeToTree is a function that takes a node, successfully adds it to the output and returns True. Otherwise it returns False causing the node to recycle.

The only thing you should have to do is add a sentinal node to the back of the queue and a flag to indicate that at least one node was consumed from the queue during one complete cycle through it. The above algorithm becomes:

set NodeProcessed to False
Queue(SentinalNode) ' marker to identify cycle through queue
Do while not QueueEmpty()
  node = Dequeue()
  if isSentinalNode(node) then
     if NodeProcessed then 
        Queue(node)
        set NodeProcessed to False
     else
        ' Queue contains only orphans or is empty
     end
  end
  if AddNodeToTree(node) then
     set NodeProcessed to True
  else
     Queue(node)
  end
end

The SentinalNode is a marker that you use to detect looping through the queue.

Your output data structure looks like it contains a 'forest' of trees. That is, it contains several distinct trees. If there is any possibility that a given node may be shared among two or more trees, the above algorithm will not work properly. If a node may appear in at most one tree in the 'forest' then the above should be fine.

0

精彩评论

暂无评论...
验证码 换一张
取 消