I wrote a source-removal algorithm to sort some dependencies between tables in our database, and it turns out 开发者_运维知识库we have a cycle. For simplicity, let's say we have tables A, B, C, and D. The edges are like this:
(A, B)
(B, A)
(B, C)
(C, D)
(D, A)
As you can see, there are two cycles here. One is between A and B and another is between all four of them. Will this type of sort always choke on the largest cycle? Or is that not necessarily the case?
By source-removal I presume you mean at each step removing a node with no incoming edges.
What I think you are asking for is finding the maximal Euler tour of your graph (i.e. a cycle with unique edges, while nodes can be repeated).
Obviously, no vertex in a cycle can be removed (no vertex in the cycle would have zero incoming edges), so this algorithm certainly preserves all cycles (and the biggest), but still, it doesn't help you find it, the remaining edges are not guaranteed to be part of any cycle (I can easily construct an example where the algorithm you describe retains all edges, while the largest cycle is merely of size two, thus not too helpful in finding the latter).
Here is how you can do it instead:
- Perform a Depth-First Search on your graph.
You are interested in recognizing back edges, i.e., in the traversal, an edge which points back to an ancestor (in the DFS tree, which is induced by edges of visiting nodes for the first time) of the visited node. For example, if the DFS stack has nodes [A->B->C->D] and while you explore D you find an edge D->B, that's a back edge. Each back edge defines a cycle.
More importantly, the cycles induced by back-edges are a basic set of cycles of the graph. "A basic set of cycles": you can construct all cycles of the graph just by UNIONing and XORing cycles of the basic set. For example, consider the cycles [A1->A2->A3->A1] and [A2->B1->B2->B3->A2]. You can union them to the cycle: [A1->A2->B1->B2->B3->A2->A3->A1]. Since you want the maximal cycle, you don't need to consider XORs.
- Construct the maximal cycle by UNIONing all basic cycles that intersect at a node. (If you do it carefully this should also have a linear time complexity).
On the other hand, if you required a maximal cycle with no repeating vertex, that's going to be much harder than linear :)
Your source removal algorithm (which I will assume means removing nodes with no dependencies one at a time, like Dimitris) will choke on any cycle. In fact, algorithm will remove all nodes that don't depend on the cycles, and the nodes you have left over will either be part of a cycle or depend on a node that is part of a cycle.
Those cycles are also called strongly connected components, and if you replaced each cycle with a single node you would have a DAG.
精彩评论