With the following table
CREATE TABLE T1 (
A varchar(2),
B varchar(2)
);
INSERT INTO T1 VALUES
('aa', 'm'), ('aa', 'n'),
('bb', 'n'), ('bb', 'o'),
('cc', 'n'), ('cc', 'o'),
('dd', 'c'), ('dd', 'a'), ('dd', 'r'),
('ee', 'a'), ('ee', 'c'), ('ee', 'r')
A | B
----+----
aa | m
aa | n
bb | n
bb | o
cc | n
cc | o
dd | c
dd | a
dd | r
ee | a
ee | c
ee | r
How can I select and group the values in A that match all the coresponding values in B. For example bb and cc ma开发者_开发百科ke up a group because they both contain 'n' and 'o'.
So the result would be
Group | A
----------
1 | bb
1 | cc
2 | dd
2 | ee
Here's one approach: it first calculates the matching "sets", where a set is a group of two A
's which match. Then it calculate the "head", or the lowest A
for sets in the same group. Using dense_rank
you can number the heads, and then join back on the list of sets to create a list of all set members.
Query at SE Data.
; with groups as
(
select distinct A
from @t
)
, vals as
(
select distinct B
from @t
)
, sets as
(
select g1.A as g1
, g2.A as g2
from groups g1
join groups g2
on g1.A < g2.A
cross join
vals v
left join
@t v1
on g1.A = v1.A
and v.B = v1.B
left join
@t v2
on g2.A = v2.A
and v.B = v2.B
group by
g1.A
, g2.A
having count(case when isnull(v1.B,'') <> isnull(v2.B,'') then 1 end) = 0
)
, heads as
(
select s1.g1
, s1.g2
, head.head
from sets s1
cross apply
(
select min(g1) as head
from sets s2
where s1.g2 = s2.g2
) head
)
select distinct dense_rank() over (order by h.head)
, g.g
from (
select distinct head
from heads
) h
left join
(
select g1 as g
, head
from heads
union all
select g2
, head
from heads
) g
on h.head = g.head
SQL Server 2008 has the EXCEPT
and INTERSECT
functions, which could be used. This isn't in exactly the format that you wanted and I can't speak to performance against large data sets, but perhaps it will give you a starting point.
SELECT DISTINCT
T1.A,
T2.A
FROM
T1 AS T1
INNER JOIN T1 AS T2 ON T2.A > T1.A
WHERE
NOT EXISTS
(
SELECT
B
FROM
T1 AS T3
WHERE
T3.A = T1.A
EXCEPT
SELECT
B
FROM
T1 AS T4
WHERE
T4.A = T2.A
) AND
NOT EXISTS
(
SELECT
B
FROM
T1 AS T3
WHERE
T3.A = T2.A
EXCEPT
SELECT
B
FROM
T1 AS T4
WHERE
T4.A = T1.A
)
Depending on your data, you could also generate some concatenated strings with delimiters and specific ordering within the string and then compare those.
The relational operator you require is division, popularly known as "the supplier who supplies all parts".
Actually, division comes in about eight flavours and the SQL language has implemented none of them directly. However, they can all be recreated using existing SQL constructs: see this article for the more popular ones. Things to consider include: exact division or with remainder; how to handle an empty divisor.
精彩评论