I'm working on a problem in Oracle that I'm struggling to solve 'elegantly'.
I have a data extract with three different identifiers: A, B, C
Each identifier may appear in more than one row, and each row may have one or more of these three identifiers (i.e the column is populated or null).
I want to be able to group all records that have any combination of either A, B or C in common and assign them the same group id.
Extract table showing what the eventual groups should be:
Rownum | A | B | C | End group
1 p NULL NULL 1
2 p r NULL 1
3 q NULL NULL 2
4 NULL r NULL 1
5 NULL NULL s 2
6 q NULL s 2
My original approach was to assign a guid to each row in the extract and create a lookup table for the three identifiers:
GUID | IDENTIFIER | IDENTIFIER TYPE | GROUP | END GROUP
1 p A 1 1
2 p A 1 1
2 r B 2 1
3 q A 3 3
4 r B 2 1
5 s C 4 3
6 q A 3 开发者_如何学C 3
6 s C 4 3
Then group by identifier and assign a group number. The groups, however, need to be combined where possible to provide the view shown in end group.
The only solution I can think of for this problem is to use loops, which I'd rather avoid.
Any ideas would be greatly appreciated.
Niall
This is truly an interesting problem. Still, I think we are missing a definition of a "group". Since in your example (p,null,null)
(row1) and (null,r,null)
(row4) share no common identifier and belong to the same group I'll go with this definition for grouping:
A row belongs to a group if it shares at least one identifier with at least one row of this group.
This means we can "chain" rows. This naturally leads to a hierarchical solution:
SQL> SELECT ID, a, b, c, MIN(grp) grp
2 FROM (SELECT connect_by_root(id) ID,
3 connect_by_root(a) a,
4 connect_by_root(b) b,
5 connect_by_root(c) c,
6 ID grp
7 FROM a
8 CONNECT BY NOCYCLE(PRIOR a = a
9 OR PRIOR b = b
10 OR PRIOR c = c))
11 GROUP BY ID, a, b, c
12 ORDER BY ID;
ID A B C GRP
---------- ---------- ---------- ---------- ----------
1 p 1
2 p r 1
3 q 3
4 r 1
5 s 3
6 q s 3
6 rows selected
You can execute the subquery to understand the construction:
SQL> SELECT connect_by_root(id) ID,
2 connect_by_root(a) a,
3 connect_by_root(b) b,
4 connect_by_root(c) c,
5 substr(sys_connect_by_path(ID, '->'), 3) path,
6 ID grp
7 FROM a
8 CONNECT BY NOCYCLE(a = PRIOR a
9 OR b = PRIOR b
10 OR c = PRIOR c);
ID A B C PATH GRP
---------- ---------- ---------- ---------- -------- ----------
1 p 1 1
1 p 1->2 2
1 p 1->2->4 4
2 p r 2 2
2 p r 2->1 1
2 p r 2->4 4
3 q 3 3
3 q 3->6 6
3 q 3->6->5 5
4 r 4 4
4 r 4->2 2
4 r 4->2->1 1
5 s 5 5
5 s 5->6 6
5 s 5->6->3 3
6 q s 6 6
6 q s 6->3 3
6 q s 6->5 5
18 rows selected
Use merge instead of loop:
Table a(a,b,c,groupId)
Statement:
merge into a
USING (SELECT RANK() OVER(ORDER BY a,b,c) g, ROWID rid FROM a) SOURCE
ON (a.ROWID = SOURCE.rid)
WHEN MATCHED THEN UPDATE SET a.GroupId = SOURCE.g
It is same as:
BEGIN
FOR x IN ( SELECT RANK() OVER(ORDER BY a,b,c) g, ROWID rid FROM a)
LOOP
UPDATE a
SET GroupId = x.g
WHERE a.RowId = x.rid;
END LOOP;
END;
精彩评论