I want to generate triangles from points and optional relations between them. Not all points form triangles, but many of them do.
In the initial structure, I've got a database with the following tables:
Nodes(id, value)
Relations(id, nodeA, nodeB, value) Triangles(id, relation1_id, relation2_id, relation3_id)In order to generate triangles from both nodes and relations table, I've used the following query:
INSERT INTO Triangles
SELECT t1.id, t2.id , t3.id,
FROM Relations t1, Relations t2, Relations t3
WHERE t1.id < t2.id AND t3.id > t1.id AND
(
t1.nodeA = t2.nodeA
AND (t3.nodeA = t1.nodeB AND t3.nodeB = t2.nodeB
OR t3.nodeA = t2.nodeB AND t3.nodeB = t1.nodeB)
OR
t1.nodeA = t2.nodeB
AND (t3.nodeA = t1.nodeB AND t3.nodeB = t2.nodeA
OR t3.nodeA = t2.nodeA AND t3.nodeB = t1.nodeB)
)
It's working perfectly on small sized data. (~< 50 points) In some cases however, I've got around 100 points all related to each other which leads to thousands of relations. So when the expected number of triangles is in the hundreds of thousands, or even in 开发者_StackOverflowthe millions, the query might take several hours.
My main problem is not in the select query, while I see it execute in Management Studio, the returned results slow. I received around 2000 rows per minute, which is not acceptable for my case.
As a matter of fact, the size of operations is being added up exponentionally and that is terribly affecting the performance.
I've tried doing it as a LINQ to object from my code, but the performance was even worse.
I've also tried using SqlBulkCopy on a reader from C# on the result, also with no luck.
So the question is... Any ideas or workarounds?
I think you just need this:
INSERT
INTO triangles (relation1_id, relation2_id, relation3_id)
SELECT r12.id, r23.id, r13.id
FROM relations r12
JOIN relations r23
ON r23.nodeA = r12.nodeB
JOIN relations r13
ON r13.nodeA = r12.nodeA
AND r13.nodeB = r23.nodeB
ON r12.nodeA = n1.id
WHERE NOT EXISTS
(
SELECT relation1_id, relation2_id, relation3_id
FROM triangles
INTERSECT
SELECT r12.id, r23.id, r13.id
)
Make the following check constraint:
ALTER TABLE relations ADD CONSTRAINT CHECK (nodeA < nodeB)
This point is that since the relations
are symmetric, you only need to store the relation once per pair and the triangle once per permutation.
The check constraint ensures that the relations are stored once per pair in the fixed order.
The query is designed so that the triangles are also stored in fixed order: 1 - 3 first, 2 - 3 second, 1 - 3 third, where 1, 2 and 3 are determined by the order of nodes the relations link.
Also note that the number of triangles grows as O(n^3)
, not exponentially (O(exp(N))
).
For 100
nodes, there can be at most 100 * 99 * 98 / 6 = 161700
triangles.
The query you are using has some problems:
- "t1.id < t2.id AND t3.id > t1.id" doesn't prevent t2 = t3. Change this to "WHERE t1.id < t2.id AND t3.id > t2.id"
- t1.nodeA might not exist in t2 at all.
Quassnoi's solution looks promising. It has problems as well. First of all, it is selecting node IDs, not relation IDs, into triangles. Second, it isn't checking that a relation between n1 and n3 exists. I'd rewrite the statement as:
INSERT
INTO triangles (relation1_id, relation2_id, relation3_id)
SELECT r12.id, r23.id, r13.id
FROM relations r12
JOIN relations r23
ON r12.nodeB = r23.nodeA
JOIN relations r13
ON r12.nodeA = r13.nodeA AND r23.nodeB = r13.nodeB
WHERE NOT EXISTS
(
SELECT relation1_id, relation2_id, relation3_id
FROM triangles
WHERE relation1_id = r12.id AND relation2_id = r23.id AND relation3_id = r13.id
)
This solution doesn't depend on any constraint on ordering of IDs in the relations table, and doesn't guarantee any order of IDs in the triangles table.
精彩评论